Can't disable reasoning for zai/glm-5.1

I’m experiencing two related issues with zai/glm-5.1 through AI Gateway:

  1. Reasoning cannot be disabled. The documented controls (reasoning: { "effort": "none" }, reasoning: { "enabled": false }) are accepted but not honored by 5 of the 6 providers serving this model. Only baseten actually disables thinking.

Context: we’re benchmarking GLM-5.1 for a latency-critical voice agent, so ~100–250 hidden reasoning tokens before the first content token (2.5–10s of added TTFT/TTFAT) make the model unusable.

Issue 1: reasoning disable not forwarded/honored

provider     reasoning option      total      out_tokens   reasoning_tokens
baseten      {"enabled": false}    449 ms     1            0
baseten      {"effort": "none"}    423 ms     1            0
togetherai   {"enabled": false}    3995 ms    176          158
togetherai   {"effort": "none"}    5711 ms    102          85
fireworks    {"enabled": false}    4444 ms    182          180
fireworks    {"effort": "none"}    2841 ms    189          178
deepinfra    {"enabled": false}    3863 ms    274          265
deepinfra    {"effort": "none"}    2552 ms    124          108
novita       {"enabled": false}    6290 ms    156          153
novita       {"effort": "none"}    9876 ms    188          185
zai          {"enabled": false}    4951 ms    159          156
zai          {"effort": "none"}    4731 ms    164          161

Notes:

  • Per the advanced configuration docs, effort: "none" “disables reasoning”. It doesn’t for this model on 5/6 providers.
  • {"enabled": false} is worse than a no-op: it hides the reasoning text from the stream but the tokens are still generated and billed (reasoning_chars=0 while reasoning_tokens=150+). That’s easy to misread as “thinking disabled” while you still pay the full latency and cost.
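
To make that last case harder to miss in a benchmark, a small check along these lines might help. It is only a sketch: the `RunUsage` shape and the `classifyReasoning` name are hypothetical, with fields that mirror the reasoning_tokens and reasoning_chars values mentioned above rather than any documented Gateway response type.

```typescript
// Hypothetical shape: mirrors the reasoning_tokens / reasoning_chars values
// discussed above, not a documented Gateway response type.
interface RunUsage {
  reasoningTokens: number; // reasoning tokens reported in the usage data
  reasoningChars: number;  // characters of reasoning text seen in the stream
}

type ReasoningStatus = "disabled" | "hidden-but-billed" | "visible";

// Classify a single run by comparing billed reasoning tokens with the
// reasoning text that was actually streamed back.
function classifyReasoning(run: RunUsage): ReasoningStatus {
  if (run.reasoningTokens === 0) {
    return "disabled"; // no reasoning tokens were generated at all
  }
  // Tokens were generated; the only question is whether the text was shown.
  return run.reasoningChars === 0 ? "hidden-but-billed" : "visible";
}
```

A run with reasoningTokens: 158 and reasoningChars: 0 would be flagged as "hidden-but-billed" instead of passing as "disabled".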

Hi,
While this isn't about GLM per se, I find that setting it through providerOptions works better for most of the models I have used.
You can find the reasoning config for each model in its official documentation (for GLM: Thinking Mode - Overview - Z.AI DEVELOPER DOCUMENT), and you may need to check whether different providers use a different config (e.g. Bedrock).

"providerOptions": {
    "gateway": {
        "only": ["zai", "novita"],
        "sort": "ttft"
    },
    "zai": {"thinking": {"type": "disabled"}},
    "novita": {"thinking": {"type": "disabled"}}
}

Though that does mean you need to set them for each provider that you use.
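
If the list of providers grows, the per-provider entries could be generated rather than written by hand. This is a rough sketch: `buildProviderOptions` is a hypothetical helper, and it assumes every provider you pass accepts the same `thinking: { type: "disabled" }` key, which the snippet above only shows for zai and novita. Check each provider's docs before relying on it.

```typescript
// Hypothetical helper: builds a providerOptions object in the same shape as
// the snippet above. ASSUMPTION: every listed provider accepts the same
// `thinking: { type: "disabled" }` key; verify this per provider.
function buildProviderOptions(providers: string[]): Record<string, unknown> {
  const options: Record<string, unknown> = {
    // Restrict routing to the listed providers, sorted by time to first token.
    gateway: { only: [...providers], sort: "ttft" },
  };
  for (const name of providers) {
    // One entry per provider, keyed by the provider name.
    options[name] = { thinking: { type: "disabled" } };
  }
  return options;
}

// Produces the same object as the hand-written snippet above.
const providerOptions = buildProviderOptions(["zai", "novita"]);
```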

Hi Hugopod,

Your table makes this look less like a client-side syntax issue and more like provider-specific behavior behind the Gateway.

One thing I’d do for the benchmark is pin the provider instead of letting Gateway route across all providers, since you already found that Baseten is the only one honoring the disable option:

providerOptions: {
  gateway: {
    only: ["baseten"]
  },
  baseten: {
    thinking: { type: "disabled" }
  }
}

Then run the same prompt a few times and compare:

- first token latency
- total output tokens
- reasoning_tokens
- provider metadata / selected provider
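
For the comparison itself, something like the following could aggregate repeated runs per provider. It is a minimal sketch: `RunResult`, `ProviderSummary`, `median` and `summarize` are hypothetical names, and you would fill the fields from whatever your benchmark harness records, not from a specific Gateway response format.

```typescript
// Hypothetical record of one benchmark run; fill it from your own harness.
interface RunResult {
  provider: string;        // provider that actually served the request
  ttftMs: number;          // first token latency in milliseconds
  outputTokens: number;    // total output tokens
  reasoningTokens: number; // reasoning tokens reported for the run
}

interface ProviderSummary {
  runs: number;
  medianTtftMs: number;
  maxReasoningTokens: number; // 0 only if every run skipped reasoning
}

// Median of a non-empty list (average of the two middle values if even).
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group runs by provider and summarize latency and reasoning usage.
function summarize(results: RunResult[]): Map<string, ProviderSummary> {
  const byProvider = new Map<string, RunResult[]>();
  for (const r of results) {
    const list = byProvider.get(r.provider) ?? [];
    list.push(r);
    byProvider.set(r.provider, list);
  }
  const summary = new Map<string, ProviderSummary>();
  byProvider.forEach((runs, provider) => {
    summary.set(provider, {
      runs: runs.length,
      medianTtftMs: median(runs.map((r) => r.ttftMs)),
      maxReasoningTokens: Math.max(...runs.map((r) => r.reasoningTokens)),
    });
  });
  return summary;
}
```

Using the maximum of reasoningTokens rather than the average means a provider only counts as clean if every single run came back with zero.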

If Baseten consistently returns reasoning_tokens: 0 and the others do not, then I’d avoid fallback routing for this latency-critical path unless each fallback provider has its own verified native “thinking disabled” option.

I’d also be careful with reasoning: { effort: "none" } as a portable assumption here. In the AI SDK docs, none is only supported for specific OpenAI GPT-5.1 models, so for GLM via multiple providers I’d expect provider-specific options to be more reliable than a single normalized field.

For this specific case, the strongest support report would be exactly what you already collected: same model, same prompt, same Gateway request shape, pinned provider, and reasoning_tokens still generated despite a disable option.