## Per-API-key limits
Primary rate-limiting is per API key, implemented as a sliding window in Redis. The limit varies by plan and any attached quota preset:

| Plan | Default RPM | Configurable max (preset) |
|---|---|---|
| Free | 10 | 10 |
| Starter | 30 | 60 |
| Growth | 60 | 120 |
| Pro | 120 | 240 |
| Business | 200 | 400 |
| Scale | 400 | 800 |
| Enterprise | Custom | Custom |
## Exceeded response
Requests over the limit receive an HTTP 429. The `Retry-After` header tells you how many seconds until your window frees up. Honour it.
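Parsing the header is simple but easy to get subtly wrong. A minimal sketch (the `retry_after_seconds` helper is illustrative, not part of any SDK) that handles the delay-in-seconds form of `Retry-After`:

```python
def retry_after_seconds(headers, default=1.0):
    """Return how many seconds to wait before retrying a 429.

    Reads the delay-in-seconds form of Retry-After; falls back to
    `default` when the header is missing or not a plain number.
    (Retry-After may also carry an HTTP-date, not handled here.)
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        # Never return a negative delay, even for a malformed value
        return max(0.0, float(value))
    except ValueError:
        return default
```

In a retry loop you would sleep for `retry_after_seconds(response.headers)` before re-issuing the request.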
## Daily token budgets
In addition to RPM, some plans cap total tokens per day (`rateLimitTpd` in your quota preset). Hitting this cap returns the same 429 with a different message.
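If you want to avoid hitting the daily cap in the first place, you can track spend client-side. A minimal sketch, assuming you count tokens from your own responses (the `DailyTokenBudget` class and its cap value are hypothetical, not part of the API):

```python
from datetime import date


class DailyTokenBudget:
    """Client-side tracker that refuses work once a daily token cap
    (mirroring your rateLimitTpd preset) would be exceeded."""

    def __init__(self, tokens_per_day):
        self.cap = tokens_per_day
        self.day = date.today()
        self.used = 0

    def try_spend(self, tokens, today=None):
        """Record `tokens` if the budget allows; return True on success.

        `today` is injectable for testing; defaults to the real date.
        """
        today = today or date.today()
        if today != self.day:  # new day: reset the counter
            self.day = today
            self.used = 0
        if self.used + tokens > self.cap:
            return False  # would exceed the daily cap
        self.used += tokens
        return True
```

Check `try_spend` before each request and queue (or drop) work it rejects, rather than burning RPM on requests that will 429 anyway.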
## Global safety net
We enforce a 5 000 req / 10 min per IP ceiling at the edge to stop scraping. This almost never trips for real users; only poorly configured crawlers hit it.

## Best practices
- Always respect `Retry-After`. Don't hammer.
- Use exponential backoff with jitter. 1 s → 2 s → 4 s → 8 s (+ random 0–500 ms) is fine.
- Consider fallback chains. If 429s on `gpt-4o` matter, set a fallback to `gpt-4o-mini` or `gemini-flash`; the gateway handles the retry for you.
- Use separate keys per environment. Dev traffic on a 30 RPM preset, prod on 400. Never mix them.
- Queue on your side too. For batch workloads, implement a local rate limiter that stays below your key's RPM; that way short bursts don't get rejected.
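The backoff recipe above can be sketched as follows (the `backoff_delays` helper is illustrative; the injectable `rng` exists only to make the schedule deterministic under test):

```python
import random


def backoff_delays(attempts=4, base=1.0, jitter_ms=500, rng=random.random):
    """Exponential backoff schedule with jitter.

    With the defaults this yields the recipe from the list above:
    1 s, 2 s, 4 s, 8 s, each plus a random 0-500 ms.
    """
    return [
        base * (2 ** attempt) + rng() * (jitter_ms / 1000.0)
        for attempt in range(attempts)
    ]
```

A retry loop would walk this schedule, sleeping for each delay and stopping as soon as a request comes back without a 429.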
## Rate limits on specific endpoints
- `/contact/sales` — 5 req / hour per IP (spam protection)
- `/public/plans` and `/public/models` — 30 req / min per IP
- `/v1/files` upload — per-key RPM applies; additionally serialised by workspace (one upload at a time)

