Google’s AI models available through Infery span text, embeddings, images, audio (speech synthesis and transcription), video and music.
Text models
| Model | Slug | Context (tokens) | Max output (tokens) | Stream | Tools | JSON | Vision | Files |
|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Computer Use | gemini-2-5-computer-use | 1M | 64K | ✓ | — | — | ✓ | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Flash | gemini-2-5-flash | 1M | 8K | ✓ | ✓ | ✓ | ✓ | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Flash Native Audio | gemini-2-5-flash-native-audio | 1M | 16K | ✓ | — | — | ✓ | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Flash-Lite | gemini-2-5-flash-lite | 1M | 8K | ✓ | ✓ | ✓ | ✓ | PDF, images, audio, video, text, JSON |
| Gemini 2.5 Pro | gemini-2-5-pro | 1M | 64K | ✓ | ✓ | ✓ | ✓ | PDF, images, audio, video, text, JSON |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1M | 64K | ✓ | ✓ | ✓ | ✓ | PDF, images, audio, video, text, JSON |
| Gemini 3.1 Flash Live | gemini-3.1-flash-live-preview | 128K | 64K | — | — | — | — | PDF, images, audio, video, text, JSON |
| Gemini 3.1 Flash-Lite Preview | gemini-3.1-flash-lite-preview | 1M | 64K | ✓ | ✓ | ✓ | ✓ | PDF, images, audio, video, text, JSON |
| Gemini 3.1 Pro Preview | gemini-3.1-pro-preview | 1M | 64K | ✓ | ✓ | ✓ | ✓ | PDF, images, audio, video, text, JSON |
| Gemini Robotics-ER 1.5 | gemini-robotics-er | 1M | 8K | ✓ | — | — | ✓ | PDF, images, audio, video, text, JSON |
Google models support up to 50 files and 1000 PDF pages per request. Accepted formats: images (JPEG, PNG, GIF, WebP, HEIC), audio (WAV, MP3, OGG, FLAC, WebM, AAC, AIFF, MP4), video (MP4, WebM, MOV), text (plain, HTML, CSV, Markdown), and JSON.
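
Because requests over these limits are rejected, it can help to check attachments before sending them. The following is a minimal client-side sketch that assumes only the limits stated in the note above; the extension spellings in `ACCEPTED_EXTENSIONS` and the helper name `check_attachments` are hypothetical, and the service's own validation remains authoritative.

```python
from pathlib import Path

# Limits transcribed from the note above (per request).
MAX_FILES = 50
MAX_PDF_PAGES = 1000

# Assumed file extensions for the formats the note lists by name.
ACCEPTED_EXTENSIONS = {
    ".pdf",
    ".jpg", ".jpeg", ".png", ".gif", ".webp", ".heic",           # images
    ".wav", ".mp3", ".ogg", ".flac", ".webm", ".aac", ".aif", ".aiff", ".mp4",  # audio
    ".mov",                                                      # video (MP4/WebM above)
    ".txt", ".html", ".htm", ".csv", ".md",                      # text
    ".json",
}


def check_attachments(paths: list[str], pdf_pages: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the request looks OK.

    `pdf_pages` maps each PDF path to its page count; counting pages is
    left to the caller.
    """
    problems = []
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files exceeds the limit of {MAX_FILES}")
    for p in paths:
        ext = Path(p).suffix.lower()
        if ext not in ACCEPTED_EXTENSIONS:
            problems.append(f"{p}: unsupported extension {ext or '(none)'}")
    # The page limit applies to the request as a whole, so sum across PDFs.
    total_pages = sum(
        pdf_pages.get(p, 0) for p in paths if Path(p).suffix.lower() == ".pdf"
    )
    if total_pages > MAX_PDF_PAGES:
        problems.append(f"{total_pages} PDF pages exceeds the limit of {MAX_PDF_PAGES}")
    return problems
```

For example, `check_attachments(["report.pdf", "chart.png"], {"report.pdf": 12})` returns an empty list, while a `.docx` file is flagged as unsupported.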
Embedding models
| Model | Slug | Max input (tokens) | Multimodal |
|---|---|---|---|
| Gemini Embedding | gemini-embedding-001 | 8K | — |
| Gemini Embedding 2 | gemini-embedding-2 | 8K | ✓ (images, audio, video, PDF) |
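
Only Gemini Embedding 2 is marked multimodal, so the modalities in a corpus determine which slug can embed it. Here is a minimal routing sketch. It assumes that `gemini-embedding-2` also accepts plain text (the table lists only its non-text modalities); the modality labels and the `pick_embedding_model` helper are hypothetical.

```python
# Modalities each embedding model accepts, transcribed from the table above.
# "text" for gemini-embedding-2 is an assumption.
EMBEDDING_MODELS = {
    "gemini-embedding-001": {"text"},
    "gemini-embedding-2": {"text", "image", "audio", "video", "pdf"},
}


def pick_embedding_model(modalities: set[str]) -> str:
    """Return the first slug (in table order) whose modalities cover the input."""
    for slug, supported in EMBEDDING_MODELS.items():
        if modalities <= supported:
            return slug
    raise ValueError(f"no embedding model accepts {sorted(modalities)}")
```

Call it once with the union of modalities across the whole index rather than per item: vectors produced by different embedding models live in different spaces and cannot be compared with each other.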
Image models
| Model | Slug | Sizes | Max images (N) | Aspect ratios | Formats | Person generation | Edits |
|---|---|---|---|---|---|---|---|
| Imagen 4 | imagen-4 | 1K, 2K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9 | png, jpeg | configurable | ✓ |
| Imagen 4 Fast | imagen-4-fast | — | 4 | 1:1, 3:4, 4:3, 9:16, 16:9 | png, jpeg | configurable | ✓ |
| Imagen 4 Ultra | imagen-4-ultra | 1K, 2K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9 | png, jpeg | configurable | ✓ |
| Nano Banana | gemini-2-5-flash-image | — | — | — | png, jpeg, webp | — | ✓ |
| Nano Banana 2 | gemini-3-1-flash-image | 1K, 2K, 4K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9, 1:2, 2:1, 2:3, 3:2 | png, jpeg, webp | configurable | ✓ |
| Nano Banana Pro | gemini-3-pro-image | 1K, 2K, 4K | 4 | 1:1, 3:4, 4:3, 9:16, 16:9, 1:2, 2:1, 2:3, 3:2 | png, jpeg, webp | configurable | ✓ |
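
The image models differ in sizes, image counts, aspect ratios and output formats, so a request valid for one slug may be rejected by another. A minimal pre-check sketch follows, with values transcribed from the table above. Where the table shows an empty cell (a dash), the sketch stores `None` and skips that check. The parameter names `size`, `n`, `aspect_ratio` and `fmt` are hypothetical and not necessarily Infery's request fields.

```python
from typing import Optional

IMAGEN_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}
GEMINI_RATIOS = IMAGEN_RATIOS | {"1:2", "2:1", "2:3", "3:2"}

# slug -> (sizes, max images, aspect ratios, output formats).
# None marks a cell the table leaves empty; that check is skipped.
IMAGE_MODELS = {
    "imagen-4": ({"1K", "2K"}, 4, IMAGEN_RATIOS, {"png", "jpeg"}),
    "imagen-4-fast": (None, 4, IMAGEN_RATIOS, {"png", "jpeg"}),
    "imagen-4-ultra": ({"1K", "2K"}, 4, IMAGEN_RATIOS, {"png", "jpeg"}),
    "gemini-2-5-flash-image": (None, None, None, {"png", "jpeg", "webp"}),
    "gemini-3-1-flash-image": ({"1K", "2K", "4K"}, 4, GEMINI_RATIOS, {"png", "jpeg", "webp"}),
    "gemini-3-pro-image": ({"1K", "2K", "4K"}, 4, GEMINI_RATIOS, {"png", "jpeg", "webp"}),
}


def check_image_request(
    slug: str,
    size: Optional[str] = None,
    n: int = 1,
    aspect_ratio: Optional[str] = None,
    fmt: str = "png",
) -> list[str]:
    """Return the reasons a request would fall outside the listed limits."""
    sizes, max_n, ratios, formats = IMAGE_MODELS[slug]
    problems = []
    if size is not None and sizes is not None and size not in sizes:
        problems.append(f"size {size} not listed for {slug}")
    if max_n is not None and n > max_n:
        problems.append(f"n={n} exceeds max {max_n} for {slug}")
    if aspect_ratio is not None and ratios is not None and aspect_ratio not in ratios:
        problems.append(f"aspect ratio {aspect_ratio} not listed for {slug}")
    if fmt not in formats:
        problems.append(f"format {fmt} not listed for {slug}")
    return problems
```

For instance, asking `imagen-4` for a `4K` image is flagged, while `gemini-3-pro-image` accepts `4K` and a `2:3` ratio.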
Audio models
Text-to-speech
| Model | Slug | Output formats |
|---|---|---|
| Gemini 2.5 Flash TTS | gemini-2-5-flash-tts | mp3, opus, wav, flac |
| Gemini 2.5 Pro TTS | gemini-2-5-pro-tts | mp3, opus, wav, flac |
| Google Cloud TTS | google-cloud-tts | mp3, opus, wav, flac |
Speech-to-text
| Model | Slug | Response formats | Max file size | Input formats |
|---|---|---|---|---|
| Gemini 2.5 Flash STT | gemini-2-5-flash-stt | json, text, srt, verbose_json, vtt | 25 MB | mp3, wav, ogg, flac, webm, mp4, aac, aiff |
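
Before uploading audio for transcription, the file's format and size can be checked against the row above. A minimal sketch follows. It reads "25 MB" as 25,000,000 bytes (the stricter of the decimal and binary readings, an assumption) and infers the format from the file extension. The `check_transcription` helper and its `"json"` default are hypothetical.

```python
from pathlib import Path

# "25 MB" read as decimal megabytes, the stricter interpretation (assumption).
MAX_STT_BYTES = 25 * 1000 * 1000
STT_INPUTS = {"mp3", "wav", "ogg", "flac", "webm", "mp4", "aac", "aiff"}
STT_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def check_transcription(filename: str, size_bytes: int, response_format: str = "json") -> list[str]:
    """Return problems with a transcription request; empty means it looks OK."""
    problems = []
    # Infer the input format from the extension, e.g. "call.MP3" -> "mp3".
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in STT_INPUTS:
        problems.append(f"{filename}: input format {ext or '(none)'} not listed")
    if size_bytes > MAX_STT_BYTES:
        problems.append(f"{filename}: {size_bytes} bytes exceeds 25 MB")
    if response_format not in STT_RESPONSE_FORMATS:
        problems.append(f"response format {response_format} not listed")
    return problems
```

Files that fail the size check need to be split or re-encoded before upload; subtitle output can be requested with `"srt"` or `"vtt"`.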
Video models
| Model | Slug | Durations | Resolutions | Aspect ratios | Person generation | Image-to-video |
|---|---|---|---|---|---|---|
| Veo 2 | veo-2 | 5, 6, 8s | 720p | 16:9, 9:16 | configurable | ✓ |
| Veo 3 | veo-3 | 4, 6, 8s | 720p, 1080p | 16:9, 9:16 | configurable | ✓ |
| Veo 3 Fast | veo-3-fast | 4, 6, 8s | 720p, 1080p | 16:9, 9:16 | configurable | ✓ |
| Veo 3.1 | veo-3-1 | 4, 6, 8s | 720p, 1080p, 4K | 16:9, 9:16 | configurable | ✓ |
| Veo 3.1 Fast | veo-3-1-fast | 4, 6, 8s | 720p, 1080p, 4K | 16:9, 9:16 | configurable | ✓ |
| Veo 3.1 Lite | veo-3-1-lite | 4, 6, 8s | 720p, 1080p | 16:9, 9:16 | configurable | ✓ (max 2) |
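
Durations in the table are listed as discrete options, so an in-between value such as 5 seconds on Veo 3 is not offered even though it falls inside the 4 to 8 second range. A minimal sketch that checks a request against the table follows. Treating durations as discrete values is an interpretation of the table. The `check_video_request` helper and its parameter names are hypothetical.

```python
# slug -> (allowed durations in seconds, allowed resolutions), from the table above.
VIDEO_MODELS = {
    "veo-2": ({5, 6, 8}, {"720p"}),
    "veo-3": ({4, 6, 8}, {"720p", "1080p"}),
    "veo-3-fast": ({4, 6, 8}, {"720p", "1080p"}),
    "veo-3-1": ({4, 6, 8}, {"720p", "1080p", "4K"}),
    "veo-3-1-fast": ({4, 6, 8}, {"720p", "1080p", "4K"}),
    "veo-3-1-lite": ({4, 6, 8}, {"720p", "1080p"}),
}
ASPECT_RATIOS = {"16:9", "9:16"}  # shared by every Veo model in the table


def check_video_request(slug: str, seconds: int, resolution: str, aspect_ratio: str = "16:9") -> list[str]:
    """Return the reasons a request falls outside what the table lists."""
    durations, resolutions = VIDEO_MODELS[slug]
    problems = []
    if seconds not in durations:
        problems.append(f"{seconds}s not offered by {slug}; choose from {sorted(durations)}")
    if resolution not in resolutions:
        problems.append(f"{resolution} not offered by {slug}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"aspect ratio {aspect_ratio} not offered by {slug}")
    return problems
```

For example, an 8-second 4K clip passes for `veo-3-1`, whereas `veo-2` rejects both a 4-second duration and 1080p.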
Music models
| Model | Slug | Max duration | Formats | Image input |
|---|---|---|---|---|
| Lyria 3 Clip | lyria-3-clip | 30s | mp3 | ✓ (up to 10 images) |
| Lyria 3 Pro | lyria-3-pro | 240s | mp3, wav | ✓ (up to 10 images) |
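
The two Lyria models differ in maximum duration and output formats, which is enough to choose between them programmatically. The sketch below picks the first model in table order that fits the requested length and format. That ordering is arbitrary and says nothing about price or quality. The `pick_music_model` helper and its parameters are hypothetical.

```python
# Limits transcribed from the table above.
MUSIC_MODELS = {
    "lyria-3-clip": {"max_seconds": 30, "formats": {"mp3"}},
    "lyria-3-pro": {"max_seconds": 240, "formats": {"mp3", "wav"}},
}
MAX_REFERENCE_IMAGES = 10  # both models accept up to 10 images


def pick_music_model(seconds: int, fmt: str = "mp3", images: int = 0) -> str:
    """Return the first slug (in table order) that supports the request."""
    if images > MAX_REFERENCE_IMAGES:
        raise ValueError(f"{images} images exceeds the limit of {MAX_REFERENCE_IMAGES}")
    for slug, spec in MUSIC_MODELS.items():
        if seconds <= spec["max_seconds"] and fmt in spec["formats"]:
            return slug
    raise ValueError(f"no Lyria model offers {seconds}s of {fmt}")
```

A 20-second MP3 resolves to `lyria-3-clip`, while asking for WAV output or anything longer than 30 seconds (up to 240) resolves to `lyria-3-pro`.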