I’m building a chat application using:

- Next.js 15 (App Router)
- AI SDK v5 (the `ai` package)
- OpenAI GPT models via `@ai-sdk/openai`
- `streamText` for streaming responses
- The `useChat` hook on the frontend with `DefaultChatTransport`
## The Problem

When conversations get very long (150+ messages), we need to:

- Summarize older messages to fit within context limits
- Keep the conversation visually intact for the user (they still see all messages)
- Only send a summary plus the recent messages to the LLM (see the sketch after this list)
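To make the target shape concrete, here is a minimal sketch of the split we want; names like `buildModelContext` are illustrative, not from the SDK:

```ts
import type { UIMessage } from "ai";

// Illustrative helper: the UI keeps rendering `allMessages` unchanged,
// while the model only receives an optional summary plus the recent tail.
function buildModelContext(
  allMessages: UIMessage[],
  summary: string | null,
  keepLast = 20,
) {
  return {
    summaryBlock: summary ? `[SUMMARY]\n${summary}` : null,
    recentMessages: allMessages.slice(-keepLast),
  };
}
```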
## What I’ve Tried

I found that the `prepareStep` callback on `generateText`/`streamText` is the recommended place to implement this (Discussion #8192). The pattern would be:
```ts
const result = streamText({
  model: openai("gpt-4"),
  system: systemPrompt,
  messages: allMessages,
  prepareStep: async ({ messages }) => {
    if (messages.length >= 150) {
      // Summarize everything except the most recent 20 messages
      const summary = await generateSummary(messages.slice(0, -20));
      return {
        system: `${systemPrompt}\n\n[SUMMARY]\n${summary}`,
        messages: messages.slice(-20), // Keep last 20
      };
    }
    return {};
  },
});
```
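`generateSummary` isn’t an SDK function; in this sketch it’s assumed to be a small helper built on `generateText`, with the model and prompt as placeholders:

```ts
import { generateText, type ModelMessage } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical helper, not part of the SDK: condenses older messages
// into a short running summary via a separate, cheaper model call.
async function generateSummary(messages: ModelMessage[]): Promise<string> {
  const { text } = await generateText({
    model: openai("gpt-4o-mini"), // placeholder model choice
    system:
      "Summarize this conversation so far. Preserve names, decisions, " +
      "open questions, and any constraints the user has stated.",
    prompt: messages
      .map((m) =>
        `${m.role}: ${typeof m.content === "string" ? m.content : JSON.stringify(m.content)}`,
      )
      .join("\n"),
  });
  return text;
}
```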
## The Issue

The problem is that the compaction would re-trigger on every subsequent request, because:

- The frontend sends all messages (for visual consistency)
- `prepareStep` sees 150+ messages → compacts
- On the next request, the frontend still sends all messages → compacts again
- This repeats forever, wasting API calls on summarization
## My Proposed Solution

Track compaction state on the frontend:

```ts
// Frontend state
const [compactionState, setCompactionState] = useState({
  summary: null as string | null, // the generated summary
  compactedUpToIndex: 0,          // how many messages are already summarized
});

// Extra fields sent to the API alongside the messages
body: {
  messages: allMessages,
  conversationSummary: compactionState.summary,
  compactedUpToIndex: compactionState.compactedUpToIndex,
}
```
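Since the transport is usually constructed once, the request body needs to read fresh state. A sketch assuming `DefaultChatTransport`’s `prepareSendMessagesRequest` option (verify the exact option name against the v5 docs), with a ref to dodge stale closures:

```ts
import { useRef } from "react";
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

// Inside the chat component: mirror the latest compaction state into a ref
// so the transport callback never reads a stale closure.
const compactionRef = useRef(compactionState);
compactionRef.current = compactionState;

const { messages, sendMessage } = useChat({
  transport: new DefaultChatTransport({
    api: "/api/chat",
    // Assumption: prepareSendMessagesRequest lets us shape the outgoing body.
    prepareSendMessagesRequest: ({ id, messages }) => ({
      body: {
        id,
        messages,
        conversationSummary: compactionRef.current.summary,
        compactedUpToIndex: compactionRef.current.compactedUpToIndex,
      },
    }),
  }),
});
```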
Then in the API route:

```ts
prepareStep: async ({ messages }) => {
  // If the client already sent a summary, reuse it instead of re-compacting
  if (conversationSummary) {
    return {
      system: `${systemPrompt}\n\n[SUMMARY]\n${conversationSummary}`,
      messages: messages.slice(compactedUpToIndex),
    };
  }

  // Otherwise compact once 150+ uncompacted messages have accumulated
  if (messages.length - compactedUpToIndex >= 150) {
    const toSummarize = messages.slice(0, -20);
    const summary = await generateSummary(toSummarize);

    // Stream the new summary back so the frontend can store it
    // (written as a custom data part; assumes `writer` comes from
    // createUIMessageStream — see the sketch below)
    writer.write({
      type: "data-compaction",
      data: { summary, compactedUpToIndex: messages.length - 20 },
    });

    return {
      system: `${systemPrompt}\n\n[SUMMARY]\n${summary}`,
      messages: messages.slice(-20),
    };
  }

  return {};
}
```
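For reference, `writer` isn’t available inside `streamText` by itself; in this sketch it’s assumed to come from `createUIMessageStream` in the route handler, with the frontend picking the data part up via `useChat`’s `onData` callback:

```ts
// app/api/chat/route.ts (sketch)
import {
  convertToModelMessages,
  createUIMessageStream,
  createUIMessageStreamResponse,
  streamText,
} from "ai";
import { openai } from "@ai-sdk/openai";

export async function POST(req: Request) {
  const { messages, conversationSummary, compactedUpToIndex } = await req.json();

  const stream = createUIMessageStream({
    execute: ({ writer }) => {
      const result = streamText({
        model: openai("gpt-4"),
        messages: convertToModelMessages(messages),
        prepareStep: async ({ messages }) => {
          // ...compaction logic from above, closing over `writer`...
          return {};
        },
      });
      // Forward the model stream alongside any custom data parts
      writer.merge(result.toUIMessageStream());
    },
  });

  return createUIMessageStreamResponse({ stream });
}
```

```ts
// Frontend: persist the summary whenever the server emits the data part
useChat({
  onData: (part) => {
    if (part.type === "data-compaction") {
      setCompactionState(
        part.data as { summary: string; compactedUpToIndex: number },
      );
    }
  },
});
```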
## Questions

- Is this the right approach? Or is there a built-in way to handle this in AI SDK v5?
- Does `prepareStep` work with `streamText`? The docs mainly show `generateText` examples.
- How do others handle this? Is there a common pattern for:
  - Keeping UI messages intact
  - Only sending compacted context to the LLM
  - Avoiding re-compaction on every request
- Alternative: should the frontend slice messages before sending? Instead of sending all messages and letting `prepareStep` handle it, should the frontend only send `messages.slice(compactedUpToIndex)` along with the summary? (A sketch of this variant follows.)
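On that last question, the client-side slicing variant could reuse the same transport hook (a sketch; `compactionRef` is the ref from the earlier snippet):

```ts
// Frontend sends only the uncompacted tail plus the stored summary.
// The UI still renders the full `messages` array from useChat,
// so nothing changes visually.
prepareSendMessagesRequest: ({ id, messages }) => ({
  body: {
    id,
    conversationSummary: compactionRef.current.summary,
    messages: messages.slice(compactionRef.current.compactedUpToIndex),
  },
}),
```

With this shape, the server-side `prepareStep` shrinks to injecting the summary into the system prompt, and the re-compaction guard lives entirely on the client, at the cost of the server never seeing the full history.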
## Environment

```json
{
  "ai": "^5.x",
  "@ai-sdk/openai": "^1.x",
  "next": "15.5.2"
}
```
Any guidance would be appreciated!