## The change

## What stays identical

- Endpoint paths: `/v1/chat/completions`, `/v1/embeddings`, `/v1/images/generations`, `/v1/audio/*`, `/v1/files`
- Request bodies (messages, tools, response_format, streaming)
- Response shapes (`id`, `choices`, `usage`, `system_fingerprint`)
- SSE streaming format, including the final `data: [DONE]`
- Tool calling, JSON mode, structured outputs, vision, PDF
- Idempotency keys
- Error envelope (`{ "error": { "type", "code", "message" } }`)
## What’s added

| Feature | How |
|---|---|
| Cost per request | `x-credits-used` response header, plus a `credits_used` SSE chunk before `[DONE]` |
| Multi-provider models | Use any model slug from `GET /v1/models` — Anthropic, Google, xAI, OSS — with the OpenAI SDK |
| Fallback routing | Configure in the dashboard; the `x-model-used` / `x-fallback-from` headers tell you which model served the request |
| Usage analytics | Per-key, per-model, per-member breakdowns |
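The cost signal arrives two ways: a response header on non-streaming calls, and an extra SSE chunk on streaming ones. A minimal sketch of extracting it from either shape (the helper names are ours, and we assume the `credits_used` chunk carries a top-level `credits_used` field; the header and chunk names are from the table above):

```python
import json

def credits_from_headers(headers):
    """Read the per-request cost from the x-credits-used response header."""
    value = headers.get("x-credits-used")
    return float(value) if value is not None else None

def credits_from_sse(lines):
    """Scan raw SSE 'data:' lines for the credits_used chunk sent before [DONE]."""
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "credits_used" in chunk:
            return chunk["credits_used"]
    return None
```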
## What changes

- **Model slugs.** OpenAI models keep their names (`gpt-4o`, `gpt-4o-mini`, `text-embedding-3-large`). For Anthropic/Google/xAI, use the slug from `GET /v1/models` — for example `claude-sonnet-4-5`, `gemini-2-5-flash`, `grok-4`.
- **Auth.** Use an Infery API key (`inf_...`) — your OpenAI key is not valid here. Create one in Settings → API Keys.
- **Rate limits.** Per-workspace, not per-OpenAI-org. See Rate limits.
- **Billing.** A single Infery invoice covers every provider. Your OpenAI billing relationship ends.
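In practice, the three changes above collapse into two constructor arguments on the standard OpenAI SDK client. A sketch, assuming `openai-python` v1-style `OpenAI(...)` kwargs; the helper name is ours:

```python
import os

def infery_client_kwargs(env=os.environ):
    """Kwargs for openai.OpenAI(): swap the key and base URL, nothing else."""
    return {
        "api_key": env["INFERY_API_KEY"],      # an inf_... key, not an OpenAI key
        "base_url": "https://api.infery.ai/v1",
    }

# Usage (request shape is unchanged; only the model slug is new):
#   from openai import OpenAI
#   client = OpenAI(**infery_client_kwargs())
#   client.chat.completions.create(
#       model="claude-sonnet-4-5",
#       messages=[{"role": "user", "content": "hi"}],
#   )
```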
## Checklist

- Create an Infery API key
- Replace `OPENAI_API_KEY` with `INFERY_API_KEY` in your env config
- Set `base_url` / `baseURL` to `https://api.infery.ai/v1`
- Run your test suite — nothing else should change
- (Optional) Set up a fallback chain for production resilience
- (Optional) Add `x-credits-used` to your request logging
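The two optional items both come down to reading response headers. One way to fold the routing and cost headers into a single structured log record (the function is a hypothetical helper; the header names `x-model-used`, `x-fallback-from`, and `x-credits-used` are from this guide):

```python
def request_log_fields(headers):
    """Pick the Infery routing/cost headers worth attaching to a request log line."""
    fields = {}
    for header, key in [
        ("x-model-used", "model_used"),       # which model actually served the request
        ("x-fallback-from", "fallback_from"), # present only when a fallback fired
        ("x-credits-used", "credits_used"),   # per-request cost
    ]:
        if header in headers:
            fields[key] = headers[header]
    return fields
```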
## Things to watch

- Org-level OpenAI features (project keys, fine-tunes, batch API) aren’t 1:1 yet — `batch` is on the roadmap.
- System fingerprints are passed through from upstream when present, so determinism guarantees match the underlying provider.
- If your code parses error messages by string, switch to `error.code` — it’s stable; messages are not.
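Because the error envelope keeps the `{ "error": { "type", "code", "message" } }` shape, branching on the code instead of the message text is a small change. A sketch over a decoded JSON error body; the code strings in `RETRYABLE` are illustrative, not a documented list:

```python
RETRYABLE = {"rate_limit_exceeded", "overloaded"}  # example codes; check your provider's list

def should_retry(error_body):
    """Decide on retry from the stable error.code, never from the message text."""
    code = error_body.get("error", {}).get("code")
    return code in RETRYABLE
```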

