
[Help](/c/help/9)

# Groq AI Gateway Pricing

58 views · 3 likes · 9 posts


Kopter (@kop7er) · 2026-02-11

Hey everyone, 

On the AI Gateway, the pricing for Groq caching seems to be missing. Does it still apply? Another thing: the pricing tables seem to be broken. I can't scroll horizontally; I have to drag-select to see the hidden columns.

Example for OpenAI GPT-120B.

Vercel:
![image|690x67, 75%](upload://kyZ0kmqYLRzXQcPGdwHIhdfH3Aq.png)

Groq:
![image|690x169, 75%](upload://bhIVOruWOI2r3LkljVgXCr1eYcp.png)

Thank you for the help.


jerilynzheng (@jerilynzheng) · 2026-02-11 · ♥ 2

Hi @kop7er - thanks for flagging this! Caching still applies; we just added the pricing, so it should be visible in the UI now. Horizontal scroll is coming as well - stay tuned.


Kopter (@kop7er) · 2026-02-11

Thank you @jerilynzheng. I have a few more Groq problems for you, except these are actually causing issues in production. I tried opening a ticket, but I don't have a paid account yet, as I'm testing the platform first.

All of these issues occurred when trying `GPT-OSS-120B` and `GPT-OSS-20B`.

* A good chunk of the time I get a 503 Service Unavailable, only with Groq. It seems to be fixed when I add my own key, which defeats the whole purpose of the Gateway.

* The reasoning token count is always 0. Groq's API returns the correct count, so it isn't being extracted correctly by the AI Gateway.

* The `reasoning_effort` option isn't being passed down to Groq's API. I noticed I always got a ton of reasoning even when setting it to `low`. I sent the exact same request directly to their API and got the expected amount. Then I tried again with no effort configured at all, letting it default, and the output matched 1:1 what I get through the AI Gateway.

Groq Reasoning Docs: https://console.groq.com/docs/reasoning
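
For reference, a minimal repro body, assuming Groq's OpenAI-compatible `/chat/completions` request shape (the prompt here is just a placeholder):

```json
{
  "model": "openai/gpt-oss-120b",
  "messages": [
    { "role": "user", "content": "Explain prompt caching in one sentence." }
  ],
  "reasoning_effort": "low"
}
```

Sent directly to Groq, this produces the short reasoning you'd expect from `low`; sent through the AI Gateway, the output is identical to leaving `reasoning_effort` out entirely.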


Kopter (@kop7er) · 2026-02-11

Sorry @jerilynzheng for continuing to post new issues here, but all of these are happening right now, and this is the most direct channel I have.

Because Groq kept failing, I added GPT-4.1-Mini as a fallback. Even though I keep sending the same prompts, nothing caches at all on OpenAI, while it does on Groq.

![image|690x363](upload://8Opi1Q6lasHaDI2pyc7wpLDcCTa.png)

![image|572x500](upload://aCyXODCwCMCTd3zc2yJiat3gkJw.png)


*You can also see here the `Reasoning` always being 0.*


Kopter (@kop7er) · 2026-02-12

Marking this as not solved: even though the pricing was updated in the UI, the actual usage pricing doesn't take it into account.

Request example:
* Prompt Tokens: **1271**
* Cached Tokens: **1024**
* Completion Tokens: **1665**

| **Category** | **Tokens** | **Rate (per 1M)** | **Calculation** | **Cost** |
| --- | --- | --- | --- | --- |
| **Cached Input** | 1,024 | $0.07 | 1,024 / 1M × $0.07 | **$0.00007168** |
| **Uncached Input** | 247 | $0.15 | 247 / 1M × $0.15 | **$0.00003705** |
| **Output** | 1,665 | $0.60 | 1,665 / 1M × $0.60 | **$0.00099900** |

Expected Cost: **$0.00110773**
Actual Cost: **$0.00126645**

I'm not sure whether this is because reasoning tokens aren't being counted correctly.

For some reason, it's even cheaper to run them uncached:
![image|690x86](upload://c5V0rBlI8RwldENtFvpIHr9vMgv.png)
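
To spell out the arithmetic, here's a small sketch of the expected-cost calculation, with the rates taken from the pricing table above:

```python
# Per-1M-token rates from Groq's pricing table for GPT-OSS-120B.
RATE_CACHED_INPUT = 0.07
RATE_UNCACHED_INPUT = 0.15
RATE_OUTPUT = 0.60

# Token counts reported for the example request.
prompt_tokens = 1271
cached_tokens = 1024
completion_tokens = 1665

# Uncached input is whatever part of the prompt wasn't served from cache.
uncached_tokens = prompt_tokens - cached_tokens  # 247

expected = (
    cached_tokens / 1e6 * RATE_CACHED_INPUT
    + uncached_tokens / 1e6 * RATE_UNCACHED_INPUT
    + completion_tokens / 1e6 * RATE_OUTPUT
)
print(f"${expected:.8f}")  # $0.00110773
```

The billed $0.00126645 is roughly what you'd get if none of the 1,024 cached tokens were discounted, which is why it looks like the cache rate isn't being applied.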


Pauline P. Narvas (@pawlean) · 2026-02-12

Thanks! The team is looking into this :)


jerilynzheng (@jerilynzheng) · 2026-02-12 · ♥ 1

Hi! Groq can be an overloaded provider, so if you want to use the AI Gateway, it's better not to restrict yourself to any specific provider in case they have capacity issues for a specific model. We have fallbacks and multiple providers for this exact reason. What are you using to try out these models - AI SDK or something else? I can help more on the reasoning side if we get more details there.

On the caching side, some providers require a minimum prompt size to trigger caching - I believe this is the case with OpenAI, whose prompt caching only kicks in for prompts of roughly 1,024 tokens or more.
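
As a quick way to check whether a given request actually hit the cache, you can inspect the `usage` block of the response. This sketch assumes the OpenAI-style response shape (`prompt_tokens_details.cached_tokens`); other providers may report caching differently:

```python
# Pull the cached-token count out of an OpenAI-style usage payload.
# Returns 0 when the provider reports no cache details at all.
def cached_tokens(usage: dict) -> int:
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0)

# Example usage payload, mirroring the token counts reported above.
usage = {
    "prompt_tokens": 1271,
    "completion_tokens": 1665,
    "prompt_tokens_details": {"cached_tokens": 1024},
}
print(cached_tokens(usage))  # 1024
```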


Kopter (@kop7er) · 2026-02-12

Hey, about `reasoning_effort`: when you take the exact same body and request a completion, you get two very different amounts of reasoning. Using Groq's API directly, it returns the expected amount when set to `low`, about 15 tokens. Using the AI Gateway with the exact same body, you get something like 500 tokens, and the reasoning text is the same as if you hadn't set an effort at all.

With the AI Gateway, at least on Groq and with the GPT OSS models, setting `reasoning_effort` does not change anything at all.


Kopter (@kop7er) · 2026-02-23

Hey @pawlean @jerilynzheng , sorry for asking, but is there any update on these issues? 

The `reasoning_effort` parameter is also an issue on other providers, like Amazon Bedrock, when using the GPT OSS models.

Thank you.