I think there may be an issue with how AI Gateway is billing for google/gemini-3-pro-preview, or I am misunderstanding how the pricing is applied.
According to the published pricing for this model:
Input tokens: $2.00 per million
Output tokens: $12.00 per million
One of my recent calls through Vercel AI Gateway shows the following usage and charge:
Provider: Google Vertex, model: google/gemini-3-pro-preview
Input tokens: 1.1K
Output tokens: 182
Total charge: $0.016054
If I calculate this manually:
Input cost: 1,100 ÷ 1,000,000 × $2.00 = $0.0022
Output cost: 182 ÷ 1,000,000 × $12.00 = $0.002184
Total expected cost is therefore about $0.004384.
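To make the arithmetic above reproducible, here is a minimal sketch of the expected-cost calculation using the published per-million-token rates (the helper name `expectedCost` is my own, not a gateway API):

```typescript
// Published google/gemini-3-pro-preview rates (USD per 1M tokens).
const INPUT_RATE = 2.0;
const OUTPUT_RATE = 12.0;

// Expected charge from input/output token counts alone.
function expectedCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_RATE
       + (outputTokens / 1_000_000) * OUTPUT_RATE;
}

console.log(expectedCost(1100, 182)); // ≈ 0.004384
```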
However I was actually billed $0.016054 for that call, which is roughly 3.6x higher than the expected amount.
I have repeated this with several different requests and each time the charge shown by AI Gateway is around 4x what I would expect from the published per token pricing.
Is this wrong, or am I misunderstanding how the pricing is calculated in this case?
In Vercel AI Gateway, one of my calls is shown as costing $0.0057.
Using the published Gemini rates, if I calculate cost from promptTokenCount plus candidatesTokenCount I get about $0.00038, so the charge is roughly 15x higher than expected!
However, if I treat:
input tokens as promptTokenCount (36)
output tokens as candidatesTokenCount (26) + thoughtsTokenCount (443) = 469
Total tokens: 36 + 469 = 505
and then apply the same rates, the result is exactly $0.0057.
So it looks like AI Gateway is also billing thoughtsTokenCount as output tokens, which explains why all my Gemini charges are coming out at a much higher amount than my own calculations.
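The reconstruction above can be sketched in code. This assumes the field names from Gemini's usageMetadata response (promptTokenCount, candidatesTokenCount, thoughtsTokenCount); the `gatewayCost` helper is illustrative, not an actual gateway function:

```typescript
// Token counts as reported in Gemini's usageMetadata.
interface UsageMetadata {
  promptTokenCount: number;
  candidatesTokenCount: number;
  thoughtsTokenCount?: number;
}

// Published google/gemini-3-pro-preview rates (USD per 1M tokens).
const INPUT_RATE = 2.0;
const OUTPUT_RATE = 12.0;

// Cost if thinking tokens are billed at the output rate,
// which is what the gateway charge appears to reflect.
function gatewayCost(u: UsageMetadata): number {
  const outputTokens = u.candidatesTokenCount + (u.thoughtsTokenCount ?? 0);
  return (u.promptTokenCount / 1_000_000) * INPUT_RATE
       + (outputTokens / 1_000_000) * OUTPUT_RATE;
}

// The call above: 36 prompt, 26 candidates, 443 thinking tokens.
console.log(gatewayCost({
  promptTokenCount: 36,
  candidatesTokenCount: 26,
  thoughtsTokenCount: 443,
})); // ≈ 0.0057, matching the gateway's reported charge
```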
Hi @jmcharks, off the top of my head I'd say your finding is correct. I think it would be the same if you made the request directly to the Gemini API.
Nevertheless, I’ve asked our team to confirm if that’s the correct understanding.
Confirmation: reasoning tokens are billed at the output token rate. We don't add any markup on the provider's model pricing, so costs will always be the same as if you were making the request directly to the provider's API.