Skip to main content
The Infery API is OpenAI-compatible at the wire level. Point the official OpenAI SDK at our gateway and 99 % of your code keeps working.

Base URL

https://api.infery.ai

Versioning

All endpoints are prefixed with /v1. We follow OpenAI’s surface closely so POST /v1/chat/completions, POST /v1/embeddings, POST /v1/images/generations, POST /v1/audio/speech, POST /v1/audio/transcriptions and POST /v1/files all work as you’d expect.

Authentication

Authorization: Bearer inf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Get a key from Settings → API Keys. See Authentication.

Compatibility

Works today (drop-in)
  • Chat completions (streaming + non-streaming, tool calls, JSON mode, vision, multimodal content blocks)
  • Embeddings
  • Images: generations + edits
  • Audio: TTS + STT (STT accepts both multipart and base64 JSON)
  • Files API: upload, list, retrieve, delete, download content
  • Referencing file_id in chat completion content blocks
Gotchas
  • Audio input on Anthropic models — returns 400 audio_not_supported_by_model (Claude doesn’t support audio). Transcribe first with /v1/audio/transcriptions and pass the text.
  • Video generation is async — response returns a job_id, poll GET /v1/videos/generations/:job_id for status.
  • response_format=url on image gen returns a provider-hosted URL (ephemeral, ~1 hour). For persistent URLs, upload the base64 result via POST /v1/files.

Request IDs

Every response includes an x-request-id header. Include this in support tickets for quick lookup.

Useful response headers

HeaderMeaning
x-request-idUUID of the request in our logs
x-credits-usedCost of this request in credits
x-model-usedFinal model that served the request (may differ from requested if fallback fired)
x-fallback-fromOriginal requested model if a fallback ran
x-fallback-depthHow many fallbacks were tried before success
x-storage-used-bytes / x-storage-limit-bytesOn /v1/files — current and max storage

Rate limits

  • Per API key: configurable, typically 30–400 req/min depending on plan and key preset
  • Global safety net: 5 k req / 10 min per IP (DoS protection)
See Rate limits.

Explore endpoints

Chat completions

POST /v1/chat/completions

Embeddings

POST /v1/embeddings

Images

POST /v1/images/generations

Audio

/v1/audio/speech, /v1/audio/transcriptions

Videos

POST /v1/videos/generations (async)

Music

POST /v1/music/generations

Files

POST /v1/files, file_id resolver

Models

GET /v1/models