POST /v1/chat/completions — OpenAI-compatible text and multimodal chat.
Supports inline media and file_id references (Files API).
"stream": true. Response is Server-Sent Events (text/event-stream). Each chunk is data: {...}\n\n; the stream ends with data: [DONE]\n\n.
The final chunk before [DONE] carries usage info and the Infery-specific credits_used field. Standard OpenAI clients ignore top-level fields other than choices, so credits_used is a non-breaking extension.
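A minimal sketch of parsing that SSE framing on the client side. The frames here are hand-written samples in the documented shape rather than output from a live request:

```python
import json

def parse_sse_stream(lines):
    """Collect 'data: {...}' chunks until the 'data: [DONE]' sentinel."""
    chunks = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunks.append(json.loads(payload))
    return chunks

# Sample frames in the documented shape (values are illustrative).
frames = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hi"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [], "usage": {"total_tokens": 12}, "credits_used": 3}',
    'data: [DONE]',
]

chunks = parse_sse_stream(frames)
text = "".join(c["choices"][0]["delta"].get("content", "")
               for c in chunks if c["choices"])
final = chunks[-1]  # final chunk before [DONE]: carries usage and credits_used
print(text, final["usage"]["total_tokens"], final["credits_used"])
```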
content supports the following part types:
- text — plain text
- image_url — HTTP URL or base64 data: URI
- input_audio — inline base64 audio with format (wav/mp3/pcm16/webm)
- file — inline data + mime_type, or a file_id reference (Files API)

The gateway resolves each file_id to bytes on the server, injects them into the provider call, and returns a clear 400 if the id doesn’t exist or is out of your workspace.
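For illustration, a request body mixing several of the part types above. The model name, URL, and file id are placeholders, not values from this document:

```python
import json

# Placeholder file id as returned by the Files API; the image URL is fake.
body = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the image and summarise the file."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "file",
                 "file": {"file_id": "file_abc123"}},  # resolved server-side
            ],
        }
    ],
}
print(json.dumps(body, indent=2))
```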
Tool calling uses the OpenAI format: tools, tool_choice, function schemas, and tool role messages. Every chat-capable model on Infery that supports tools honours the same format.
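A sketch of that request shape, with a hypothetical get_weather tool (the tool and model name are illustrative, not from this document):

```python
# Standard OpenAI function-calling schema; the tool itself is made up.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call a tool
}
print(request["tools"][0]["function"]["name"])
```

If the model calls the tool, you append the result as a tool role message and send a follow-up request, exactly as with the OpenAI API.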
Models with supportsVision: true accept images directly. For PDFs, models with supportsPdf: true read them natively. Others get an automatic PDF-to-image conversion on the gateway (plus text extraction) — you pay a small extra fee per page (see billing), no code changes required.
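The supportsVision / supportsPdf flags come from the model metadata described above; this helper is only a sketch of the routing rule, not a real client function:

```python
def gateway_converts_pdf(model_meta: dict) -> bool:
    """Sketch: models without supportsPdf get the gateway's automatic
    PDF-to-image conversion (plus text extraction); others read natively."""
    return not model_meta.get("supportsPdf", False)

# A supportsPdf model reads the PDF natively; a vision-only model is converted.
print(gateway_converts_pdf({"supportsPdf": True}),
      gateway_converts_pdf({"supportsVision": True}))
```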
Supported request parameters: temperature, top_p, presence_penalty, frequency_penalty, max_tokens, stop, seed, stream, tools, tool_choice, response_format. Plus model-specific:
- top_k — Gemini and some OSS models

Response headers:
- x-request-id
- x-credits-used
- x-model-used (when fallback fires)
- x-fallback-from, x-fallback-depth

Request headers:
- Authorization — API key in format: Bearer inf_***
- x-request-id — Optional request ID for tracking

Request body fields:
- model — Model ID, e.g. "gpt-4o"
- temperature — 0 <= x <= 2
- top_p — 0 <= x <= 1
- top_k — Top-K sampling (Google Gemini)
- presence_penalty — -2 <= x <= 2 — Presence penalty (OpenAI, Google Gemini)
- frequency_penalty — -2 <= x <= 2 — Frequency penalty (OpenAI, Google Gemini)
- seed — Seed for deterministic output (OpenAI, Google Gemini)
Chat completion result. When stream=true, returns SSE (text/event-stream) where each data chunk is a chat.completion.chunk; the final chunk before [DONE] carries usage and credits_used.