Question About AI Gateway Production Rate Limits

Hi Vercel team,

I’m currently using Vercel AI Gateway on the pay-as-you-go plan, and I’d like to better understand the production rate limits.

Could you clarify the following?

  • What are the current limits for pay-as-you-go accounts regarding:

    • Requests per minute (RPM)
    • Tokens per minute (TPM)

  • Are these limits applied globally at the account level, or separately per provider/model (OpenAI, Anthropic, Google, etc.)?

  • Do rate limits automatically increase over time based on account usage or spending, similar to OpenAI's usage tiers?

  • If an application requires consistently high throughput (for example, serving hundreds of concurrent users), what is the best way to avoid or minimize rate-limit errors in production?

Thanks!

Also curious about this. Would love to know how the limits work in real production usage.