What we ship in production. Breaking changes are flagged BREAKING; everything else is additive.

2026-04

  • Files API — OpenAI-compatible POST /v1/files plus file_id references in chat completions. Workspace-scoped, MIME-sniffed, idempotent. See Files API.
  • Playground file access scope — Playground now shares the same workspace file pool as the API; uploads from either side are referenceable from both.
  • Billing role — new Billing workspace role: invoices, payment methods and Playground access, no API key management. See Members and roles.
  • Promo site — fresh marketing site at infery.ai with full legal suite (Privacy, Terms, AUP, Subprocessors).
  • Video polling mirror enqueue fix — generated videos no longer occasionally get stuck in processing state when the upstream provider returns asynchronously.
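As a rough illustration of the Files API item above, here is a sketch of a chat-completions body that references an uploaded file by file_id. The content-part shape follows the OpenAI Files convention the changelog points at; the exact schema, model name, and file_id are assumptions — check the Files API reference before relying on them.

```python
import json

def chat_payload_with_file(file_id: str, question: str) -> str:
    """Build a chat-completions request body referencing a workspace file.

    Schema sketch only: field names follow the OpenAI file-part convention
    and may differ from Infery's actual Files API contract.
    """
    body = {
        "model": "gpt-4o",  # placeholder model name
        "messages": [
            {
                "role": "user",
                "content": [
                    # Reference a previously uploaded workspace file by id.
                    {"type": "file", "file": {"file_id": file_id}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }
    return json.dumps(body)

payload = chat_payload_with_file("file_abc123", "Summarise this document.")
```

Because Playground and the API share the same workspace file pool, a file_id from a Playground upload should be usable in the same way.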

2026-03

  • Fallback chains — configure per-source-model fallback ladders in Settings → Fallbacks. Headers x-model-used / x-fallback-from / x-fallback-depth on every response. See Fallback chains.
  • Music generation — POST /v1/music/generations with Suno and Udio backends.
  • Budget alerts — email + in-app notifications at 50/75/90% of plan + auto-pause on exhaustion.
  • OpenAI SDK extras — credits_used in the streaming usage chunk, ignored by upstream SDKs but readable by raw parsers.
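The fallback headers above can be read off any response object with dict-like headers. A minimal sketch (the header names are from the changelog; the sample values are made up):

```python
def fallback_info(headers: dict) -> dict:
    """Summarise which model actually served a request, per the
    x-model-used / x-fallback-from / x-fallback-depth headers."""
    return {
        "model_used": headers.get("x-model-used"),
        "fallback_from": headers.get("x-fallback-from"),  # None if no fallback fired
        "depth": int(headers.get("x-fallback-depth", "0")),
    }

# Example: a request that fell back one rung down the ladder.
info = fallback_info({
    "x-model-used": "claude-sonnet-4",
    "x-fallback-from": "gpt-4o",
    "x-fallback-depth": "1",
})
```

A depth of 0 (or absent headers) means the originally requested model answered.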

2026-02

  • Streaming for all chat models — including Anthropic and Google, normalised to OpenAI SSE format.
  • Vision + PDF — automatic PDF-to-image conversion on the gateway for models without native PDF support.
  • Quotas and presets — workspace-level monthly token caps, per-key rate-limit profiles. See Quotas.
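Since every chat model now streams in the normalised OpenAI SSE format, one consumer loop covers all providers. A self-contained sketch (chunk shape follows the standard OpenAI streaming schema; the sample lines are illustrative):

```python
import json

def iter_deltas(sse_lines):
    """Yield text deltas from "data: {...}" SSE lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_deltas(sample))
```

The same loop works whether the upstream model is OpenAI, Anthropic, or Google, since the gateway normalises the wire format.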

2026-01

  • Public launch.
  • OpenAI-compatible chat, embeddings, image, audio, video endpoints.
  • OpenAI, Anthropic, Google, xAI, DeepSeek, Qwen providers.
  • Playground with chat, image, video, audio modalities.
  • Subscription plans + topups via Stripe.
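Given the OpenAI-compatible endpoints above, a request can be built with nothing but the standard library. A sketch — the base URL, model name, and key are assumptions, and the actual send is left commented out:

```python
import json
import urllib.request

API_KEY = "sk-..."  # placeholder; use your real Infery key

# Assumed base URL; confirm the actual host in the API reference.
req = urllib.request.Request(
    "https://api.infery.ai/v1/chat/completions",
    data=json.dumps({
        "model": "deepseek-chat",  # any routed provider model
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # would perform the actual call
```

Because the shape is OpenAI-compatible, the official OpenAI SDKs should also work by pointing their base URL at the gateway.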

What’s next (tentative)

These are not commitments — see the roadmap for the canonical list.
  • Batch API — POST /v1/batches for cheap async bulk inference (50% discount on most models).
  • Workspace-level PRC opt-in — replace the support-email gate for Qwen/DeepSeek with a self-serve toggle.
  • Fine-tuning passthrough — submit + monitor OpenAI / Google fine-tune jobs through the Infery key.
  • Realtime API — websocket bidirectional audio for voice agents.
  • EU region — primary processing in europe-west4 with full data residency.

Staying informed

  • RSS of this changelog: https://docs.infery.ai/reference/changelog/rss.xml
  • Email on every minor or major release: opt in at Settings → Notifications → Product updates
  • Status page: status.infery.ai