I'm seeing VERY strange behavior where AI streaming works locally but gets "cut off" on Vercel (production)
I have tried both the edge runtime and the nodejs runtime; both have the same issue
I don’t see any errors in the Vercel logs
I remember this happened in the past because I forgot to specify the runtime in my route.ts; back then I solved it by specifying the edge runtime
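In route.ts that's just a route segment config export, e.g. (the maxDuration line is optional and depends on your plan):

```ts
// app/api/chat/agentic/route.ts
export const runtime = "edge"; // or "nodejs"
export const maxDuration = 300; // optional; shows up as the 5m timeout in Vercel logs
```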
But all of a sudden the issue reappeared out of the blue, even though I haven't changed any code or deployed anything since 2 July 2025
I get the sense that something changed on the Vercel platform since that date and caused this issue. Here's my AgentExecutor setup:
```ts
import { AgentExecutor } from "langchain/agents";

// ...

return new AgentExecutor({
  agent,
  tools,
  returnIntermediateSteps: true,
  verbose: false,
  maxIterations: 15, // Reduced from 500 to prevent timeout issues
  handleParsingErrors: true,
  earlyStoppingMethod: "generate", // Stop when the agent generates a final answer
});
```
Has anyone experienced a similar issue? Any advice on how to solve it? I'm super frustrated; I already asked Vercel Support and they were useless (I'm on Vercel Pro)
I can confirm your functions are not timing out, and we aren't seeing any errors from the AI APIs you call (including LangChain)
If you check your Logs or Observability tabs, you can filter to the /api/chat/agentic route to see requests; each one shows both the runtime and the timeout (e.g. 3.5s / 5m), which helps confirm your settings are taking effect
I'm checking internally to see if there are any changes on our end that could have affected this for you. If you can narrow down a smaller window for when the issue started happening, it would help me dig deeper in the right place
Unfortunately, from Vercel's end this currently looks like a simple request that ends successfully. Since LangChain is the last step before the stream is returned, I'd double-check that LangChain is returning the full token count, and add some logging between what LangChain returns and what your function returns. If you can prove your API is returning data that gets truncated before it hits the browser, that would strongly point toward an issue on our end; right now it looks more like the AgentExecutor is producing the wrong output
From a technical point of view, LangChain's AgentExecutor returns a generator that can be consumed as a ReadableStream. Your API route returns this stream, so the error is in one of two places:
1. Your API route produces the whole stream (as it does locally), but in prod Vercel cuts it off after two lines; or
2. The AgentExecutor works correctly locally, but in prod it only generates two lines, which are then correctly returned to the browser.
Since there are no timeouts or errors, I suspect option 2 is the issue.
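For context, that plumbing usually looks something like the sketch below (illustrative only; `toByteStream` and the chunk serialization are placeholders, not your actual code). LangChain JS runnables expose `.stream()`, which yields an async iterable you can adapt into a web `ReadableStream`:

```ts
import { AgentExecutor } from "langchain/agents";

// Adapt the AgentExecutor's async-iterable stream into a byte stream
// suitable for a Response body.
export function toByteStream(
  executor: AgentExecutor,
  input: Record<string, unknown>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      // Each chunk is an intermediate step or a piece of the final output.
      for await (const chunk of await executor.stream(input)) {
        controller.enqueue(encoder.encode(JSON.stringify(chunk) + "\n"));
      }
      controller.close();
    },
  });
}
```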
You should be able to log the output server-side like this, and verify it in your prod environment:
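(A minimal sketch; `buildAgentByteStream` stands in for however your route currently builds the stream.)

```ts
// Tee the stream: one branch goes to the browser, the other is logged
// on the server so you can compare the two in the Vercel logs.
export async function POST(req: Request) {
  const { input } = await req.json();
  const stream = buildAgentByteStream(input); // placeholder for your existing logic
  const [toBrowser, toLogs] = stream.tee();

  // Drain the logging branch without blocking the response.
  (async () => {
    const reader = toLogs.getReader();
    const decoder = new TextDecoder();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      console.log("[agent chunk]", decoder.decode(value, { stream: true }));
    }
    console.log("[agent stream] complete");
  })();

  return new Response(toBrowser, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```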
If you see the whole stream appearing in the logs but only two lines make it to the browser, then that would strongly point to an issue on our end
If you only see the two lines here, then that points to a LangChain issue. In that case I’d check their dashboard for any logs/hints, and add error handler callbacks to your AgentExecutor instance to see what’s truncating the output
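For the callbacks, another minimal sketch (the handler names come from LangChain JS's callback interface; `agent` and `tools` are your existing ones):

```ts
import { AgentExecutor } from "langchain/agents";

// Plain-object callback handlers are accepted in the callbacks array;
// any error that truncates the run should surface here.
const executor = new AgentExecutor({
  agent, // your existing agent
  tools, // your existing tools
  callbacks: [
    {
      handleChainError(err: Error) {
        console.error("[chain error]", err);
      },
      handleLLMError(err: Error) {
        console.error("[llm error]", err);
      },
      handleToolError(err: Error) {
        console.error("[tool error]", err);
      },
    },
  ],
});
```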
If you don’t want to push those logs to prod, you can make a new branch and try it in a Preview environment. If for some reason it works in Preview on Vercel but NOT prod, then it’s usually an issue with either environment variables or devDependencies, but you can debug further in that direction once it’s narrowed down
The stream chunks don't even seem to show up in the Vercel logs. I do see them in my localhost logs.
I noticed that in LangSmith it does appear to be returning the stream (see below)