Use cases
- Primary provider outage — OpenAI 429s during peak hours → fall back to Google Gemini Flash
- Cost-tier progression — try
gpt-4o, on rate-limit step down togpt-4o-mini, thengemini-flash - Capability routing — use a PDF-native model first; if unavailable, use a vision model with our PDF-to-image preprocessor
- Regional — EU customers fall back from an OpenAI model to a Google one hosted in EU
Setting up a chain
Settings → Fallbacks → New chain (or edit an existing one). A chain is attached to a source model slug and lists fallback models in priority order:When fallbacks fire
The gateway steps through fallbacks when the primary (or prior fallback) fails with one of:429 rate_limit_exceeded503 service_unavailable502 bad_gateway- Provider-specific errors tagged as retryable
- Network timeout / connection reset
4xx errors from your code (bad prompt, invalid params, auth failure, quota exceeded).
Transparent to the caller
Client code doesn’t change. The response comes back in OpenAI format as normal, plus an extra header:Cost accounting
The final model that served the request is billed. If fallback to a cheaper model succeeds, you pay the cheaper price. Rate-limit attempts don’t incur cost.Disabling per call
Add headerx-disable-fallback: true on an individual request to force primary-only behaviour (useful for testing which primary is actually up).
