On the AI Gateway, the pricing for Groq caching seems to be missing. Does caching still apply? Another thing: the pricing tables seem to be broken. I can't scroll horizontally, so I have to drag-select instead.
Hi @kop7er - thanks for flagging this! Caching will still apply, we just added the pricing so it should be visible now in the UI. Horizontal scroll coming as well - stay tuned.
Thank you @jerilynzheng. I have a few more Groq problems for you, except these ones are actually causing problems in production. I tried opening a ticket, but I don't have a paid account yet since I'm testing the platform first.
All of these issues occurred when trying GPT-OSS-120B and GPT-OSS-20B.
A good chunk of the time I get a 503 Service Unavailable, and only with Groq. It seems to be fixed when I add my own key, which defeats the whole purpose.
The reasoning token count is always 0. Their API returns the correct count, so it's not being extracted correctly by the AI Gateway.
The reasoning_effort option isn't being passed down to Groq's API properly. I noticed that I always got a ton of reasoning when setting it to low. I tried the exact same request directly against their API and got the expected amount. Then I tried once more with no effort configured at all, letting it default, and got a 1:1 match with what I get through the AI Gateway.
Sorry @jerilynzheng for keeping on posting new issues here, but all of these are currently happening, and this is the most direct way I have.
Because Groq kept failing, I added GPT-4.1-Mini as a fallback. I keep sending the same prompts, yet nothing caches at all on OpenAI, while it does on Groq.
Hi! Groq is an overloaded provider, so if you want to use the AI Gateway, it's better not to restrict to any specific provider in case they have capacity issues for a specific model. We have fallbacks and multiple providers for this exact reason. What are you using to try out these models - the AI SDK or something else? I can help more on the reasoning side if we get more details there.
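For reference, the "don't pin a provider" idea can be sketched like this - a minimal, hypothetical helper where the `callModel` signature and model ids are illustrative, not a real Gateway API:

```typescript
// Hypothetical sketch: try gateway model ids in order and fall back
// when a provider errors (e.g. a 503 from an overloaded provider).
// callModel stands in for whatever client you use (AI SDK, fetch, ...).
type CallModel = (modelId: string) => Promise<string>;

async function completeWithFallback(
  models: string[],
  callModel: CallModel,
): Promise<string> {
  let lastError: unknown = new Error('no models given');
  for (const modelId of models) {
    try {
      return await callModel(modelId); // first success wins
    } catch (err) {
      lastError = err; // remember the failure and try the next model
    }
  }
  throw lastError;
}
```

With the Gateway itself you'd get similar behavior by simply not restricting providers, since it can route around capacity issues on its own.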
On the caching side, some providers require a minimum prompt size to trigger caching - I believe this is the case with OpenAI.
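If it helps to check: OpenAI's automatic prompt caching is documented to kick in only for prompts of roughly 1024+ tokens, and cache hits show up in the usage object of the response. A small sketch of reading that field, assuming the OpenAI-style chat-completions usage shape:

```typescript
// Sketch: read cached prompt tokens from an OpenAI-style usage object.
// Short prompts won't meet the caching minimum, so this stays 0 for them.
interface Usage {
  prompt_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

function cachedPromptTokens(usage: Usage): number {
  return usage.prompt_tokens_details?.cached_tokens ?? 0;
}
```

If this stays 0 for a long, repeated prompt, that would be worth digging into further.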
Hey, about reasoning_effort: when you take the exact same body and request a completion, you get two very different amounts of reasoning. Groq's API returns the expected amount when it's set to low, about 15 tokens. The AI Gateway, with the exact same body, returns something like 500 tokens, and the reasoning text is the same as if you hadn't set an effort at all.
With the AI Gateway, at least on Groq with the GPT OSS models, setting reasoning_effort doesn't change anything at all.
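To make the comparison concrete, this is roughly the body I'm sending to both endpoints. The model id and prompt are just examples; reasoning_effort is the standard OpenAI-compatible chat-completions field:

```typescript
// Sketch of the A/B test: build one OpenAI-compatible body and send it
// both directly to Groq and through the AI Gateway, then compare
// reasoning token counts in the two responses.
interface ChatBody {
  model: string;
  messages: { role: 'user' | 'system' | 'assistant'; content: string }[];
  reasoning_effort?: 'low' | 'medium' | 'high';
}

function buildBody(
  model: string,
  prompt: string,
  effort?: ChatBody['reasoning_effort'],
): ChatBody {
  const body: ChatBody = {
    model,
    messages: [{ role: 'user', content: prompt }],
  };
  if (effort) body.reasoning_effort = effort; // should pass through to the provider unchanged
  return body;
}

// Hypothetical usage - the same body, two endpoints:
// await fetch('https://api.groq.com/openai/v1/chat/completions', {
//   method: 'POST',
//   headers: { Authorization: `Bearer ${key}`, 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildBody('openai/gpt-oss-120b', 'Hello', 'low')),
// });
```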