I’m building a chat application using:

- Next.js 15 (App Router)
- AI SDK v5 (the `ai` package)
- OpenAI GPT models via `@ai-sdk/openai`
- `streamText` for streaming responses
- The `useChat` hook on the frontend with `DefaultChatTransport`
## The Problem

When conversations get very long (150+ messages), we need to:

- Summarize older messages to fit within context limits
- Keep the conversation visually intact for the user (they still see all messages)
- Only send a summary plus the recent messages to the LLM (see the sketch after this list)
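To make the target shape concrete, here is a minimal sketch of the split we want; names like `buildModelContext` are illustrative, not from the SDK:

```ts
import type { UIMessage } from "ai";

// Illustrative helper: the UI keeps rendering `allMessages` unchanged,
// while the model only receives an optional summary plus the recent tail.
function buildModelContext(
  allMessages: UIMessage[],
  summary: string | null,
  keepLast = 20,
) {
  return {
    summaryBlock: summary ? `[SUMMARY]\n${summary}` : null,
    recentMessages: allMessages.slice(-keepLast),
  };
}
```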
## What I’ve Tried

I found that the `prepareStep` callback on `generateText`/`streamText` is the recommended place to implement this (Discussion #8192). The pattern would be:
```ts
const result = streamText({
  model: openai("gpt-4"),
  system: systemPrompt,
  messages: allMessages,
  prepareStep: async ({ messages }) => {
    if (messages.length >= 150) {
      // Summarize everything except the most recent 20 messages
      const summary = await generateSummary(messages.slice(0, -20));
      return {
        system: `${systemPrompt}\n\n[SUMMARY]\n${summary}`,
        messages: messages.slice(-20), // Keep last 20
      };
    }
    return {};
  },
});
```
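`generateSummary` isn’t an SDK function; in this sketch it’s assumed to be a small helper built on `generateText`, with the model and prompt as placeholders:

```ts
import { generateText, type ModelMessage } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical helper, not part of the SDK: condenses older messages
// into a short running summary via a separate, cheaper model call.
async function generateSummary(messages: ModelMessage[]): Promise<string> {
  const { text } = await generateText({
    model: openai("gpt-4o-mini"), // placeholder model choice
    system:
      "Summarize this conversation so far. Preserve names, decisions, " +
      "open questions, and any constraints the user has stated.",
    prompt: messages
      .map((m) =>
        `${m.role}: ${typeof m.content === "string" ? m.content : JSON.stringify(m.content)}`,
      )
      .join("\n"),
  });
  return text;
}
```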
## The Issue

The problem is that the compaction would re-trigger on every subsequent request, because:

- The frontend sends all messages (for visual consistency)
- `prepareStep` sees 150+ messages → compacts
- On the next request, the frontend still sends all messages → compacts again
- This repeats forever, wasting API calls on summarization
## My Proposed Solution

Track compaction state on the frontend:

```ts
// Frontend state
const [compactionState, setCompactionState] = useState({
  summary: null as string | null, // the generated summary
  compactedUpToIndex: 0,          // how many messages are already summarized
});

// Extra fields sent to the API alongside the messages
body: {
  messages: allMessages,
  conversationSummary: compactionState.summary,
  compactedUpToIndex: compactionState.compactedUpToIndex,
}
```
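Since the transport is usually constructed once, the request body needs to read fresh state. A sketch assuming `DefaultChatTransport`’s `prepareSendMessagesRequest` option (verify the exact option name against the v5 docs), with a ref to dodge stale closures:

```ts
import { useRef } from "react";
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

// Inside the chat component: mirror the latest compaction state into a ref
// so the transport callback never reads a stale closure.
const compactionRef = useRef(compactionState);
compactionRef.current = compactionState;

const { messages, sendMessage } = useChat({
  transport: new DefaultChatTransport({
    api: "/api/chat",
    // Assumption: prepareSendMessagesRequest lets us shape the outgoing body.
    prepareSendMessagesRequest: ({ id, messages }) => ({
      body: {
        id,
        messages,
        conversationSummary: compactionRef.current.summary,
        compactedUpToIndex: compactionRef.current.compactedUpToIndex,
      },
    }),
  }),
});
```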
Then in the API route:

```ts
prepareStep: async ({ messages }) => {
  // If the client already sent a summary, reuse it instead of re-compacting
  if (conversationSummary) {
    return {
      system: `${systemPrompt}\n\n[SUMMARY]\n${conversationSummary}`,
      messages: messages.slice(compactedUpToIndex),
    };
  }

  // Otherwise compact once 150+ uncompacted messages have accumulated
  if (messages.length - compactedUpToIndex >= 150) {
    const toSummarize = messages.slice(0, -20);
    const summary = await generateSummary(toSummarize);

    // Stream the new summary back so the frontend can store it
    // (written as a custom data part; assumes `writer` comes from
    // createUIMessageStream — see the sketch below)
    writer.write({
      type: "data-compaction",
      data: { summary, compactedUpToIndex: messages.length - 20 },
    });

    return {
      system: `${systemPrompt}\n\n[SUMMARY]\n${summary}`,
      messages: messages.slice(-20),
    };
  }

  return {};
}
```
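For reference, `writer` isn’t available inside `streamText` by itself; in this sketch it’s assumed to come from `createUIMessageStream` in the route handler, with the frontend picking the data part up via `useChat`’s `onData` callback:

```ts
// app/api/chat/route.ts (sketch)
import {
  convertToModelMessages,
  createUIMessageStream,
  createUIMessageStreamResponse,
  streamText,
} from "ai";
import { openai } from "@ai-sdk/openai";

export async function POST(req: Request) {
  const { messages, conversationSummary, compactedUpToIndex } = await req.json();

  const stream = createUIMessageStream({
    execute: ({ writer }) => {
      const result = streamText({
        model: openai("gpt-4"),
        messages: convertToModelMessages(messages),
        prepareStep: async ({ messages }) => {
          // ...compaction logic from above, closing over `writer`...
          return {};
        },
      });
      // Forward the model stream alongside any custom data parts
      writer.merge(result.toUIMessageStream());
    },
  });

  return createUIMessageStreamResponse({ stream });
}
```

```ts
// Frontend: persist the summary whenever the server emits the data part
useChat({
  onData: (part) => {
    if (part.type === "data-compaction") {
      setCompactionState(
        part.data as { summary: string; compactedUpToIndex: number },
      );
    }
  },
});
```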
## Questions

- Is this the right approach? Or is there a built-in way to handle this in AI SDK v5?
- Does `prepareStep` work with `streamText`? The docs mainly show `generateText` examples.
- How do others handle this? Is there a common pattern for:
  - Keeping UI messages intact
  - Only sending compacted context to the LLM
  - Avoiding re-compaction on every request
- Alternative: should the frontend slice messages before sending? Instead of sending all messages and letting `prepareStep` handle it, should the frontend only send `messages.slice(compactedUpToIndex)` along with the summary? (A sketch of this variant follows.)
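On that last question, the client-side slicing variant could reuse the same transport hook (a sketch; `compactionRef` is the ref from the earlier snippet):

```ts
// Frontend sends only the uncompacted tail plus the stored summary.
// The UI still renders the full `messages` array from useChat,
// so nothing changes visually.
prepareSendMessagesRequest: ({ id, messages }) => ({
  body: {
    id,
    conversationSummary: compactionRef.current.summary,
    messages: messages.slice(compactionRef.current.compactedUpToIndex),
  },
}),
```

With this shape, the server-side `prepareStep` shrinks to injecting the summary into the system prompt, and the re-compaction guard lives entirely on the client, at the cost of the server never seeing the full history.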
## Environment

```json
{
  "ai": "^5.x",
  "@ai-sdk/openai": "^1.x",
  "next": "15.5.2"
}
```
Any guidance would be appreciated!