Hi, I’ve been implementing real-time streaming with the Vercel AI SDK’s streamObject and noticed that, despite iterating over partialObjectStream, all chunks arrive in a single burst only after the model has finished generating the complete response.
import { streamObject } from 'ai';

const result = streamObject({
  model: MODELS.chat.claude,
  schema: z.object({
    content: z.string(),
    confidence: z.number().min(0).max(1),
  }),
  messages,
  temperature: 0.05,
});

for await (const partialObject of result.partialObjectStream) {
  // Expected: partial objects arriving gradually as the model generates
  // Reality: all chunks arrive at once after ~8 seconds
}
When I added timing instrumentation I found:
- Time to first chunk was 7-8 seconds
- Total stream duration was 100ms after first chunk
- 200+ chunks delivered in rapid succession
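For reference, the instrumentation was roughly the following sketch: a generic wrapper over any async iterable (such as result.partialObjectStream) that records time to first chunk, duration after the first chunk, and chunk count. The helper and the simulated stream below are illustrative, not SDK APIs.

```typescript
// Measures time-to-first-chunk, post-first-chunk duration, and chunk count
// for any async iterable. Names here are illustrative.
async function measureStream<T>(stream: AsyncIterable<T>) {
  const start = Date.now();
  let firstChunkAt: number | null = null;
  let chunks = 0;
  for await (const _chunk of stream) {
    if (firstChunkAt === null) firstChunkAt = Date.now();
    chunks++;
  }
  const end = Date.now();
  return {
    timeToFirstChunkMs: (firstChunkAt ?? end) - start,
    totalAfterFirstMs: firstChunkAt === null ? 0 : end - firstChunkAt,
    chunks,
  };
}

// Simulated stream standing in for partialObjectStream: one long pause,
// then all partial objects arrive in rapid succession.
async function* fakeStream() {
  await new Promise((resolve) => setTimeout(resolve, 50)); // model "thinking"
  for (let i = 0; i < 5; i++) {
    yield { content: "x".repeat(i + 1) };
  }
}

measureStream(fakeStream()).then((m) => console.log(m));
```

Against the real partialObjectStream, this is where the 7-8 s time-to-first-chunk and ~100 ms burst numbers above came from.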
Essentially, the model appears to generate the entire JSON response before the SDK starts parsing and emitting partial objects. This defeats the purpose of streaming for user experience.
Is this expected behavior when using structured schemas with one large string field? Has anyone achieved true incremental streaming with streamObject, where partial content arrives progressively? I am considering switching to streamText and handling the structure myself, breaking the schema down into smaller fields, or using output: 'no-schema' mode and parsing manually. Any solutions?
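To make the streamText fallback concrete, here is a sketch of the "handle the structure myself" part: accumulate raw text deltas into a buffer and, on every chunk, extract whatever portion of the "content" string value has arrived so far from the incomplete JSON. extractPartialContent is a hypothetical helper of my own, not an SDK API, and for simplicity it assumes the field value contains no escaped quotes.

```typescript
// Hypothetical helper: pull the (possibly incomplete) value of the
// "content" field out of a partial JSON buffer. Assumes no escaped
// quotes inside the string value.
function extractPartialContent(buffer: string): string {
  const key = '"content":';
  const keyIdx = buffer.indexOf(key);
  if (keyIdx === -1) return "";
  const openQuote = buffer.indexOf('"', keyIdx + key.length);
  if (openQuote === -1) return "";
  const closeQuote = buffer.indexOf('"', openQuote + 1);
  // If the closing quote hasn't arrived yet, everything after the
  // opening quote is partial content.
  return closeQuote === -1
    ? buffer.slice(openQuote + 1)
    : buffer.slice(openQuote + 1, closeQuote);
}

// Simulated usage: feed text deltas in as they arrive and re-extract.
const deltas = ['{"content":"Hel', 'lo wor', 'ld","confidence":0.9}'];
let buffer = "";
for (const delta of deltas) {
  buffer += delta;
  console.log(extractPartialContent(buffer));
}
```

With streamText, the loop above would consume result.textStream instead of a hard-coded delta array, and each extracted partial string could be forwarded over SSE immediately, so the frontend renders text as it is generated rather than waiting for the full JSON object.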
Environment: AI SDK v5, Claude Sonnet 4.5, Node.js backend with SSE to the frontend.