Alibaba (Qwen)

Alibaba Cloud’s Qwen model family is available through Infery and covers text, vision, image generation, audio, video, embeddings, and reranking.

Alibaba is a PRC-based provider. See our Privacy Policy for data transfer details. PRC providers require explicit opt-in.

Text models

Model	Slug	Context	Max output	Stream	Tools	JSON	Vision	Files
Qwen Flash	`qwen-flash`	1M	32K	✓	✓	✓	—	—
Qwen Plus	`qwen-plus`	1M	32K	✓	✓	✓	—	—
Qwen Turbo	`qwen-turbo`	128K	16K	✓	✓	✓	—	—
Qwen Long	`qwen-long`	10M	32K	✓	—	—	—	—
Qwen VL Plus	`qwen-vl-plus`	128K	8K	✓	—	—	✓	PDF, images, video (10 files, 10 MB)
Qwen VL Max	`qwen-vl-max`	128K	8K	✓	—	—	✓	PDF, images, video (10 files, 10 MB)
Qwen3 Max	`qwen3-max`	256K	32K	✓	✓	✓	✓	PDF, images, video (10 files, 10 MB)
Qwen3 Omni Flash	`qwen3-omni-flash`	64K	16K	✓	—	—	✓	PDF, images, audio, video (10 files)
Qwen3 Coder Plus	`qwen3-coder-plus`	1M	64K	✓	✓	—	—	—
Qwen3.5 Flash	`qwen3-5-flash`	1M	64K	✓	✓	✓	✓	PDF, images, video (10 files, 10 MB)
Qwen3.5 Plus	`qwen3-5-plus`	1M	64K	✓	✓	✓	✓	PDF, images, video (10 files, 10 MB)
Qwen3.5 Omni Plus	`qwen3-5-omni-plus`	256K	32K	✓	—	—	✓	PDF, images, audio, video (10 files)
QwQ Plus	`qwq-plus`	128K	8K	✓	—	—	—	—
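
As a minimal sketch of how the table above might be used client-side, the following Python snippet encodes a few of its rows and returns the first model that meets a set of requirements. The `TextModel` class, the `CATALOG` list, and the `pick_model` helper are hypothetical names for illustration and are not part of any Infery SDK. Sizes are treated as approximate token counts with 1K = 1,000, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextModel:
    slug: str
    context: int      # context window (assumed: tokens, 1K = 1,000)
    max_output: int   # max output (assumed: tokens, 1K = 1,000)
    tools: bool
    json_mode: bool
    vision: bool


# A few rows copied from the table above; hypothetical in-code catalog.
CATALOG = [
    TextModel("qwen-turbo", 128_000, 16_000, True, True, False),
    TextModel("qwen-flash", 1_000_000, 32_000, True, True, False),
    TextModel("qwen3-max", 256_000, 32_000, True, True, True),
    TextModel("qwen3-5-flash", 1_000_000, 64_000, True, True, True),
    TextModel("qwen-long", 10_000_000, 32_000, False, False, False),
]


def pick_model(
    min_context: int = 0,
    min_output: int = 0,
    tools: bool = False,
    json_mode: bool = False,
    vision: bool = False,
) -> Optional[TextModel]:
    """Return the first catalog entry that satisfies every requirement, or None."""
    for model in CATALOG:
        if model.context < min_context or model.max_output < min_output:
            continue
        if tools and not model.tools:
            continue
        if json_mode and not model.json_mode:
            continue
        if vision and not model.vision:
            continue
        return model
    return None


if __name__ == "__main__":
    choice = pick_model(min_context=500_000, vision=True)
    print(choice.slug if choice else "no matching model")
```

The catalog order doubles as a preference order here, so listing smaller models first makes the helper return the smallest listed model that fits.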

Embedding models

Model	Slug	Max input
Qwen Text Embedding v3	`qwen-text-embedding-v3`	8K

Image models

Model	Slug	Sizes	Max N	Aspect ratios	Edits
Qwen Image 2.0 Pro	`qwen-image-2-0-pro`	1024² to 1920×1080	4	1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3	✓ (up to 5 images)
Wan 2.7 Image	`wan2-7-image`	1024² to 1920×1080	4	1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3	✓ (up to 5 images)
Z-Image Turbo	`z-image-turbo`	1024², 1280²	4	1:1, 16:9, 9:16, 4:3, 3:4	✓ (1 image)

Audio models

Text-to-speech

Model	Slug	Output formats
CosyVoice V2	`cosyvoice-v2`	mp3, wav, pcm
CosyVoice V3 Plus	`cosyvoice-v3-plus`	mp3, wav, pcm

Speech-to-text

Model	Slug	Response formats	Max file	Inputs
Paraformer V2	`paraformer-v2`	json, text	500 MB	mp3, wav, flac, ogg, webm, mp4
Qwen3 ASR Flash	`qwen3-asr-flash`	json, text	500 MB	mp3, wav, flac, ogg, webm, mp4
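
The limits in the table above can be checked before uploading a file. The sketch below is one possible pre-flight check, not an official client: `check_audio` is a hypothetical helper, the format is inferred from the file extension only, and "500 MB" is read as decimal megabytes. Both are assumptions the table does not confirm.

```python
from pathlib import Path
from typing import List

# Input formats listed for both speech-to-text models in the table above.
ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".webm", ".mp4"}

# Assumption: "500 MB" means 500 * 1,000,000 bytes (the stricter reading).
MAX_BYTES = 500 * 1000 * 1000


def check_audio(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems


if __name__ == "__main__":
    for name, size in [("call.mp3", 12_000_000), ("notes.aac", 1_000)]:
        issues = check_audio(name, size)
        print(name, "ok" if not issues else issues)
```

Checking the extension does not verify the actual container or codec, so the service may still reject a file that passes this check.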

Video models

Model	Slug	Durations	Resolutions	Aspect ratios	Image-to-video
Wan 2.7 Text-to-Video	`wan2-7-t2v`	3, 5, 8, 10s	480p, 720p	16:9, 9:16, 1:1	—
Wan 2.7 Image-to-Video	`wan2-7-i2v`	3, 5, 8, 10s	480p, 720p	16:9, 9:16, 1:1	✓ (1 image)

Rerank models

Model	Slug	Max input
Qwen3 Rerank	`qwen3-rerank`	32K
