API overview

The Infery API is OpenAI-compatible at the wire level. Point the official OpenAI SDK at our gateway and 99 % of your code keeps working.

Base URL

https://api.infery.ai

Versioning

All endpoints are prefixed with /v1. We follow OpenAI’s surface closely so POST /v1/chat/completions, POST /v1/embeddings, POST /v1/images/generations, POST /v1/audio/speech, POST /v1/audio/transcriptions and POST /v1/files all work as you’d expect.

Authentication

Authorization: Bearer inf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Get a key from Settings → API Keys. See Authentication.

Compatibility

Works today (drop-in)

Chat completions (streaming + non-streaming, tool calls, JSON mode, vision, multimodal content blocks)
Embeddings
Images: generations + edits
Audio: TTS + STT (STT accepts both multipart and base64 JSON)
Files API: upload, list, retrieve, delete, download content
Referencing file_id in chat completion content blocks

Gotchas

Audio input on Anthropic models — returns 400 audio_not_supported_by_model (Claude doesn’t support audio). Transcribe first with /v1/audio/transcriptions and pass the text.
Video generation is async — response returns a job_id, poll GET /v1/videos/generations/:job_id for status.
response_format=url on image gen returns a provider-hosted URL (ephemeral, ~1 hour). For persistent URLs, upload the base64 result via POST /v1/files.

Request IDs

Every response includes an x-request-id header. Include this in support tickets for quick lookup.

Useful response headers

Header	Meaning
`x-request-id`	UUID of the request in our logs
`x-credits-used`	Cost of this request in credits
`x-model-used`	Final model that served the request (may differ from requested if fallback fired)
`x-fallback-from`	Original requested model if a fallback ran
`x-fallback-depth`	How many fallbacks were tried before success
`x-storage-used-bytes` / `x-storage-limit-bytes`	On `/v1/files` — current and max storage

Rate limits

Per API key: configurable, typically 30–400 req/min depending on plan and key preset
Global safety net: 5 k req / 10 min per IP (DoS protection)

See Rate limits.

Explore endpoints

Chat completions

POST /v1/chat/completions

Embeddings

POST /v1/embeddings

Images

POST /v1/images/generations

Audio

/v1/audio/speech, /v1/audio/transcriptions

Videos

POST /v1/videos/generations (async)

Music

POST /v1/music/generations

Files

POST /v1/files, file_id resolver

Models

GET /v1/models

Overview

Chat Completions

Embeddings

Images

Audio

Video

Music

Files

Models

Base URL

Versioning

Authentication

Compatibility

Request IDs

Useful response headers

Rate limits

Explore endpoints

Chat completions

Embeddings

Images

Audio

Videos

Music

Files

Models

Overview

Chat Completions

Embeddings

Images

Audio

Video

Music

Files

Models

​Base URL

​Versioning

​Authentication

​Compatibility

​Request IDs

​Useful response headers

​Rate limits

​Explore endpoints

Chat completions

Embeddings

Images

Audio

Videos

Music

Files

Models

Base URL

Versioning

Authentication

Compatibility

Request IDs

Useful response headers

Rate limits

Explore endpoints