Hi Vercel team,
I’m currently using Vercel AI Gateway on the pay-as-you-go plan, and I’d like to better understand the production rate limits.
Could you clarify the following?
- What are the current limits for pay-as-you-go accounts regarding:
  - Requests per minute (RPM)
  - Tokens per minute (TPM)
- Are these limits applied globally at the account level, or separately per provider/model (OpenAI, Anthropic, Google, etc.)?
- Do rate limits automatically increase over time based on account usage or spending, similar to OpenAI usage tiers?
- If an application requires consistently high throughput (for example, serving hundreds of concurrent users), what is the best way to avoid or minimize rate limit issues in production?
Thanks!