The two failure modes
| Code | Meaning | Retry? |
|---|---|---|
| 429 `rate_limit_exceeded` | Your key burst above its RPM, or hit the daily token cap | Yes — honour `Retry-After` |
| 502 / 503 / 504 | Upstream provider transient | Yes — exponential backoff |
| 400 / 401 / 403 / 422 | Caller error | No — fix the request |
| 402 `insufficient_credits` | Out of balance | No — top up |
Always honour Retry-After
Every 429 from us includes a `Retry-After` header: the number of seconds to wait before retrying.
Exponential backoff with jitter (for 5xx)
For upstream transients without `Retry-After`, use exponential backoff with jitter. The jitter term (`+ random()`) is critical: without it, every client retries at the same instant and you get a stampede.
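A sketch of jittered backoff (the base, cap, and attempt count are illustrative choices, not documented values):

```python
import random
import time

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with jitter: base * 2**attempt + random(), capped."""
    return min(cap, base * 2 ** attempt + random.random())

def retry_transient(send_request, max_attempts: int = 5):
    """Retry 502/503/504 with jittered backoff; `send_request` is a stand-in."""
    for attempt in range(max_attempts):
        response = send_request()
        if response.status not in (502, 503, 504):
            return response
        time.sleep(backoff_delay(attempt))
    return response  # still failing: surface the last response to the caller
```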
Don’t retry 4xx
400/401/403/422 are deterministic — retrying just wastes RPM and money. Fix the request, then resubmit. Common offenders:
- Wrong model slug → check `GET /v1/models`
- Missing required parameter → check the endpoint reference
- Image too large → resize before resending
- Malformed JSON → fix the producer
SDK-level retries
The OpenAI SDK retries 429s and 5xx automatically, honouring `Retry-After`. For most apps this is enough; you don’t need a custom loop. But:
- It retries on a single call. Bursty workloads still need a queue.
- It applies to streaming too — the first chunk is what matters.
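If the default retry count is too low for you, raise it at client construction (a configuration sketch assuming the `openai` Python package; the base URL and key are placeholders):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # your gateway URL (placeholder)
    api_key="sk-...",                       # placeholder
    max_retries=5,                          # SDK default is 2
)
```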
Stay under the limit on purpose
Reactive retry is the floor; proactive limiting is the ceiling. Run a token bucket on your side, capped at ~80% of the key’s RPM.
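A minimal token bucket, sized here for a hypothetical 100 RPM key (class name and numbers are illustrative):

```python
import time

class TokenBucket:
    """Client-side limiter: refills at `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for one token to accrue.
            time.sleep((1 - self.tokens) / self.rate)

# For a key allowing 100 RPM, cap at ~80%: 80 requests/min = 80/60 per second.
limiter = TokenBucket(rate=0.8 * 100 / 60, capacity=5)
```

Call `limiter.acquire()` before each request; bursts up to `capacity` pass immediately, and sustained load is smoothed to the configured rate.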
Fallback chains: the better answer for production
Per-call retries help, but fallback chains help more. Configure a chain once, and a 429 on `gpt-4o` is invisible to your code: the gateway routes to `gpt-4o-mini` and returns its response instead.
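Conceptually, the gateway is doing something like this on your behalf (a client-side sketch; `call_model`, the chain, and the exception type are hypothetical names, not the gateway's actual config):

```python
class RateLimited(Exception):
    """Stand-in for a 429 from a single model."""

def call_with_fallbacks(call_model, chain):
    """Try each model in order; fall through to the next on a 429."""
    last_error = None
    for model in chain:
        try:
            return call_model(model)
        except RateLimited as exc:
            last_error = exc  # this model is throttled; try the next one
    raise last_error

# e.g. call_with_fallbacks(call_model, ["gpt-4o", "gpt-4o-mini"])
```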
Per-environment keys
Don’t share one key across dev/staging/prod. Reasons:
- Dev experiments shouldn’t drain prod’s RPM budget
- Leaked dev keys have lower blast radius if scoped to a low-RPM preset
- Per-env analytics are clearer
Daily token caps
Some plans cap total tokens per day in addition to RPM. Hitting the cap returns 429 too, but `Retry-After` will be the seconds until midnight UTC, not a few seconds. Don’t blindly sleep that long; instead:
- Reduce request volume
- Switch to a cheaper model
- Top up to a higher plan
The error body carries `code: "rate_limit_exceeded"` and `message: "Daily token cap reached"`, so you can distinguish the daily cap from RPM 429s.
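Those two fields make the distinction mechanical (a sketch; the `code` and `message` values come from the text above, but the surrounding body shape is an assumption):

```python
def is_daily_cap(error_body: dict) -> bool:
    """True when a 429 is the daily token cap rather than an RPM burst.

    Assumes the error fields live under an "error" key or at the top level.
    """
    err = error_body.get("error", error_body)
    return (err.get("code") == "rate_limit_exceeded"
            and err.get("message") == "Daily token cap reached")
```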
When backoff fails
If you back off twice and still 429:
- Look at Settings → Usage → By key — is one key dominating?
- Check Settings → API Keys → preset — is the preset lower than you remember?
- Did you ship a loop without rate limiting? Look at request volume in the last hour.
- Open a ticket — sometimes it’s our problem and we want to know.

