streamObject Not Truly Streaming - All Chunks Arrive Nearly Instantly

Hi, I’ve been implementing real-time streaming with Vercel AI SDK’s streamObject and noticed that despite iterating over partialObjectStream, all chunks arrive in a burst only after the model finishes generating the complete response.

import { streamObject } from 'ai';
import { z } from 'zod';

// streamObject returns its result synchronously; the stream is consumed below
const result = streamObject({
    model: MODELS.chat.claude,
    schema: z.object({
        content: z.string(),
        confidence: z.number().min(0).max(1),
    }),
    messages,
    temperature: 0.05,
});

for await (const partialObject of result.partialObjectStream) {
    // Expected: gradual streaming
    // Reality: all chunks arrive at once after ~8 seconds
}

When I added timing instrumentation I found:

  • Time to first chunk: 7–8 seconds
  • Total stream duration after the first chunk: ~100 ms
  • 200+ chunks delivered in rapid succession
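
The instrumented version of the loop above was nothing fancy, roughly this (variable names are just for illustration):

const start = Date.now();
let firstChunkAt: number | null = null;
let chunkCount = 0;

for await (const partialObject of result.partialObjectStream) {
    if (firstChunkAt === null) {
        firstChunkAt = Date.now();
        console.log(`first chunk after ${firstChunkAt - start}ms`);
    }
    chunkCount++;
}

console.log(`${chunkCount} chunks, finished ${Date.now() - (firstChunkAt ?? start)}ms after first chunk`);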

Essentially, the model appears to generate the entire JSON response before the SDK starts parsing and emitting partial objects. This defeats the purpose of streaming for user experience.

Is this expected behavior when using a structured schema with one large string field? Has anyone achieved true incremental streaming with streamObject, where partial content arrives progressively? I am considering switching to streamText and handling the structure myself, breaking the schema down into smaller fields, or using output: 'no-schema' mode and parsing manually (rough sketch of the streamText option below). Any solutions?
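
For reference, the streamText fallback I have in mind would look roughly like this; the incremental-parse step is only a placeholder, not a working implementation:

import { streamText } from 'ai';

// Same model and messages as in the streamObject example above
const fallback = streamText({
    model: MODELS.chat.claude,
    messages,
    temperature: 0.05,
});

let buffer = '';
for await (const textDelta of fallback.textStream) {
    buffer += textDelta;
    // TODO: incrementally parse/repair the partial JSON in `buffer`
    // and push updates to the client over SSE
}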

Environment: AI SDK v5, Claude Sonnet 4.5, Node.js backend streaming to the frontend over SSE.

Hi, sorry for jumping in, but since you are using Claude: do you know whether the AI SDK supports Anthropic's 1-hour prompt caching via OpenRouter? I tried what the docs said:

messages: [
  {
    role: 'system',
    content:
      'You are a podcast summary assistant. You are detail-oriented and critical about the content.',
  },
  {
    role: 'user',
    content: [
      {
        type: 'text',
        text: 'Given the text body below:',
      },
      {
        type: 'text',
        text: `<LARGE BODY OF TEXT>`,
        providerOptions: {
          openrouter: {
            cacheControl: { type: 'ephemeral', ttl: '1h' },
          },
        },
      },
      {
        type: 'text',
        text: 'List the speakers?',
      },
    ],
  },
],

It works, but the cache write cost was 1.25× the input price rather than 2×, which I think means it is hitting the 5-minute cache, not the 1-hour cache.
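
The multipliers I am comparing against are Anthropic's published cache-write prices (roughly 1.25× base input for the 5-minute TTL, 2× for the 1-hour TTL), so my sanity check is basically this (purely illustrative helper, not part of any SDK):

// Map the observed cache-write cost multiplier to Anthropic's published tiers:
// 5-minute writes ≈ 1.25× base input price, 1-hour writes ≈ 2× base input price.
function inferCacheTier(observedMultiplier: number): '5m' | '1h' | 'unknown' {
    if (Math.abs(observedMultiplier - 1.25) < 0.05) return '5m';
    if (Math.abs(observedMultiplier - 2) < 0.05) return '1h';
    return 'unknown';
}

console.log(inferCacheTier(1.25)); // '5m' → the ttl: '1h' option was likely ignored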