How to implement conversation compaction with AI SDK v5

I’m building a chat application using:

  • Next.js 15 (App Router)

  • AI SDK v5 (ai package)

  • OpenAI GPT models via @ai-sdk/openai

  • streamText for streaming responses

  • useChat hook on the frontend with DefaultChatTransport

The Problem

When conversations get very long (150+ messages), we need to:

  1. Summarize older messages to fit within context limits

  2. Keep the conversation visually intact for the user (they still see all messages)

  3. Only send a summary + recent messages to the LLM

What I’ve Tried

I found that the prepareStep callback in generateText/streamText is the recommended place to implement this (Discussion #8192).

The pattern would be:

const result = streamText({
  model: openai("gpt-4"),
  system: systemPrompt,
  messages: allMessages,
  prepareStep: async ({ messages }) => {
    if (messages.length >= 150) {
      const summary = await generateSummary(messages.slice(0, -20));
      return {
        system: `${systemPrompt}\n\n[SUMMARY]\n${summary}`,
        messages: messages.slice(-20), // Keep last 20
      };
    }
    return {};
  },
});
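For reference, generateSummary isn’t an SDK function; it’s a helper I’d implement with a separate generateText call, roughly like this (the model choice and prompt are just placeholders):

// Hypothetical helper, not part of the AI SDK:
// condenses older messages into a short summary with a one-off call.
import { generateText, type ModelMessage } from "ai";
import { openai } from "@ai-sdk/openai";

async function generateSummary(messages: ModelMessage[]): Promise<string> {
  const transcript = messages
    .map((m) => `${m.role}: ${typeof m.content === "string" ? m.content : JSON.stringify(m.content)}`)
    .join("\n");

  const { text } = await generateText({
    model: openai("gpt-4"),
    system: "Summarize the conversation below. Preserve key facts, decisions, and open questions.",
    prompt: transcript,
  });

  return text;
}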

The Issue

The problem is that the compaction would re-trigger on every subsequent request because:

  1. Frontend sends all messages (for visual consistency)

  2. prepareStep sees 150+ messages → compacts

  3. Next request: frontend still sends all messages → compacts again

  4. This repeats forever, wasting API calls on summarization

My Proposed Solution

Track compaction state on the frontend:

// Frontend state
const [compactionState, setCompactionState] = useState({
  summary: null,           // The generated summary
  compactedUpToIndex: 0,   // How many messages are summarized
});

// Send to API
body: {
  messages: allMessages,
  conversationSummary: compactionState.summary,
  compactedUpToIndex: compactionState.compactedUpToIndex,
}
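With DefaultChatTransport, I’d attach these extra fields per request through the sendMessage options (the body there gets merged into the POST body):

// Sketch: send the compaction state along with each user message
sendMessage(
  { text: input },
  {
    body: {
      conversationSummary: compactionState.summary,
      compactedUpToIndex: compactionState.compactedUpToIndex,
    },
  },
);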

Then in the API:

prepareStep: async ({ messages }) => {
  // If we already have a summary, use it
  if (conversationSummary) {
    return {
      system: `${systemPrompt}\n\n[SUMMARY]\n${conversationSummary}`,
      messages: messages.slice(compactedUpToIndex),
    };
  }

  // Check if we need to compact (every 150 messages)
  if (messages.length >= 150 && messages.length - compactedUpToIndex >= 150) {
    const toSummarize = messages.slice(0, -20);
    const summary = await generateSummary(toSummarize);

    // Stream the new summary back to the frontend to store
    // (`writer` comes from createUIMessageStream; see the sketch below)
    writer.write({
      type: "data-compaction",
      data: { summary, compactedUpToIndex: messages.length - 20 },
    });

    return {
      system: `${systemPrompt}\n\n[SUMMARY]\n${summary}`,
      messages: messages.slice(-20),
    };
  }

  return {};
}
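For writer to be in scope inside prepareStep, I’m assuming the route wraps everything with createUIMessageStream, roughly like this:

// Sketch: wiring `writer` up in the route handler (AI SDK v5 APIs)
import {
  convertToModelMessages,
  createUIMessageStream,
  createUIMessageStreamResponse,
  streamText,
} from "ai";
import { openai } from "@ai-sdk/openai";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = createUIMessageStream({
    execute: async ({ writer }) => {
      const result = streamText({
        model: openai("gpt-4"),
        messages: convertToModelMessages(messages),
        prepareStep: async ({ messages }) => {
          // ...the compaction logic from above; `writer` is in scope
          // here, so writer.write({ type: "data-compaction", ... }) works
          return {};
        },
      });
      // Forward the model output to the client alongside any data parts
      writer.merge(result.toUIMessageStream());
    },
  });

  return createUIMessageStreamResponse({ stream });
}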

Questions

  1. Is this the right approach? Or is there a built-in way to handle this in AI SDK v5?

  2. Does prepareStep work with streamText? The docs mainly show generateText examples.

  3. How do others handle this? Is there a common pattern for:

    • Keeping UI messages intact

    • Only sending compacted context to LLM

    • Avoiding re-compaction on every request

  4. Alternative: Should the frontend slice messages before sending? Instead of sending all messages and letting prepareStep handle it, should the frontend only send messages.slice(compactedUpToIndex) along with the summary?

Environment

{
  "ai": "^5.x",
  "@ai-sdk/openai": "^1.x",
  "next": "15.5.2"
}

Any guidance would be appreciated! 🙏

Your approach is solid! You’re on the right track with using prepareStep and tracking compaction state on the frontend. Let me address your questions:

1. Is this the right approach?

Yes, this is a good approach. There’s no built-in conversation compaction in AI SDK v5, so managing it yourself is the way to go. Your solution of tracking compaction state on the frontend is smart.

2. Does prepareStep work with streamText?

Yes, prepareStep works with both generateText and streamText. Here’s how it looks with streamText:

// app/api/chat/route.ts
import { streamText, convertToModelMessages } from 'ai'
import { openai } from '@ai-sdk/openai'

export async function POST(req: Request) {
  const { messages, conversationSummary, compactedUpToIndex } = await req.json()

  const result = streamText({
    model: openai('gpt-4'),
    messages: convertToModelMessages(messages),
    prepareStep: async ({ messages }) => {
      // Your compaction logic here
      if (conversationSummary) {
        return {
          system: `${systemPrompt}\n\n[SUMMARY]\n${conversationSummary}`,
          messages: messages.slice(compactedUpToIndex),
        }
      }
      if (messages.length >= 150) {
        // Generate summary and handle compaction
        const summary = await generateSummary(messages.slice(0, -20))
        return {
          system: `${systemPrompt}\n\n[SUMMARY]\n${summary}`,
          messages: messages.slice(-20),
        }
      }
      return {}
    },
  })

  // v5 renamed toDataStreamResponse to toUIMessageStreamResponse
  return result.toUIMessageStreamResponse()
}

3. Alternative approach - Frontend slicing

Actually, I’d recommend a simpler approach: let the frontend handle the slicing. This avoids the complexity of streaming compaction data back:

// Frontend component
'use client'

import { useState } from 'react'
import { useChat } from '@ai-sdk/react'
import { DefaultChatTransport } from 'ai'

const [compactionState, setCompactionState] = useState({
  summary: null,
  compactedUpToIndex: 0,
})

const { messages, sendMessage } = useChat({
  transport: new DefaultChatTransport({
    api: '/api/chat',
    // Send only the un-summarized tail of the conversation, plus the summary.
    // (The transport is created once; in a real app read compactionState
    // through a ref here to avoid a stale closure.)
    prepareSendMessagesRequest: ({ messages }) => ({
      body: {
        messages: compactionState.summary
          ? messages.slice(compactionState.compactedUpToIndex)
          : messages,
        conversationSummary: compactionState.summary,
      },
    }),
  }),
  onFinish: async ({ message }) => {
    // Check if we need to compact after this response
    if (messages.length >= 150 && !compactionState.summary) {
      const summaryResponse = await fetch('/api/summarize', {
        method: 'POST',
        body: JSON.stringify({ messages: messages.slice(0, -20) }),
      })
      const { summary } = await summaryResponse.json()
      setCompactionState({
        summary,
        compactedUpToIndex: messages.length - 20,
      })
    }
  },
})

// Send new user input with: sendMessage({ text: input })

Then your API route becomes much simpler:

// app/api/chat/route.ts
import { streamText, convertToModelMessages } from 'ai'
import { openai } from '@ai-sdk/openai'

export async function POST(req: Request) {
  const { messages, conversationSummary } = await req.json()

  // systemPrompt is assumed to be defined elsewhere
  const systemMessage = conversationSummary
    ? `${systemPrompt}\n\n[SUMMARY]\n${conversationSummary}`
    : systemPrompt

  const result = streamText({
    model: openai('gpt-4'),
    system: systemMessage,
    messages: convertToModelMessages(messages),
  })

  return result.toUIMessageStreamResponse()
}
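The /api/summarize endpoint referenced in the frontend code isn’t shown anywhere, so here’s a minimal sketch (essentially the summarization call exposed as a route; the path and prompt are assumptions):

// app/api/summarize/route.ts (hypothetical endpoint assumed above)
import { generateText, convertToModelMessages } from 'ai'
import { openai } from '@ai-sdk/openai'

export async function POST(req: Request) {
  const { messages } = await req.json()

  const { text: summary } = await generateText({
    model: openai('gpt-4'),
    system: 'Summarize the conversation. Preserve key facts, decisions, and open questions.',
    messages: convertToModelMessages(messages),
  })

  return Response.json({ summary })
}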

4. Common patterns

This is a common pattern. Developers typically handle it by:

  • Keeping full conversation history in UI state
  • Tracking compaction metadata separately
  • Only sending relevant context to the LLM
  • Periodically summarizing older messages

Your approach avoids re-compaction and maintains UI consistency, which is exactly what you want.
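One refinement worth considering: trigger compaction on estimated token usage rather than a fixed message count, since 150 messages can be wildly different sizes. A rough sketch (the 4-chars-per-token ratio and the 100k threshold are just assumptions):

// Rough token estimate (~4 characters per token for English text)
function estimateTokens(messages: { content: unknown }[]): number {
  const chars = messages.reduce(
    (sum, m) =>
      sum +
      (typeof m.content === 'string'
        ? m.content.length
        : JSON.stringify(m.content).length),
    0,
  )
  return Math.ceil(chars / 4)
}

// Compact when the prompt approaches the model's context window
const shouldCompact = estimateTokens(messages) > 100_000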

Hey @noe-4414! I just wanted to check in and see if you’re still working on the conversation compaction issue. Have you found a solution, or do you still need a hand? Excited to help!

Hello Pauline! Thanks for the follow-up! I have implemented the compaction based on what you said 🙂 It would be great to have this natively in the SDK if you can push for it, haha. Our project is chatify.fr (soon to be renamed miria AI); we have 4,000 users, do 120K ARR, and were funded by Antler 4 days ago. We love the AI SDK. Next I want to implement an agentic chat system similar to Manus or Claude, and I’m going through the SDK docs.

Best regards
