Hey everyone
I’m building a chat interface with the Vercel AI SDK, where users can select from different models (Anthropic, OpenAI, etc.).
I’ve also added several custom tools to extend the chat’s functionality — the LLM can invoke these tools to perform operations based on the user’s prompt.
The issue I’m facing is with the Anthropic (Claude) API:
When a large or complex prompt (with tool schemas included) is sent, I often get a 429 error (rate limit, typically tokens-per-minute), and the SDK stops processing abruptly — no partial or fallback response is returned.
I have a few questions around this:
Minimizing input tokens:
- Is there a recommended way to reduce token usage when tool schemas are large?
- Can I cache or reuse tool definitions/schemas between messages so they don’t have to be re-sent on every request?
- Does the Vercel AI SDK support any internal caching or compression mechanism for such scenarios?
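On the caching question: Anthropic itself offers prompt caching (marking a long, stable prefix such as tool definitions with `cache_control: { type: 'ephemeral' }`), which reduces the billed input tokens for repeated prefixes, and recent versions of the `@ai-sdk/anthropic` provider expose this through provider options, so it’s worth checking the provider docs for your SDK version. Independent of caching, you can also shrink the payload by only sending the tools that look relevant to the current prompt. A minimal sketch (the tool names and keyword map here are hypothetical, not part of the SDK):

```typescript
// Hypothetical keyword routing: only send the tools that look relevant
// to the current prompt, instead of every schema on every request.
const toolKeywords: Record<string, string[]> = {
  searchDocs: ['search', 'find', 'docs'],
  createTicket: ['ticket', 'issue', 'bug'],
  runReport: ['report', 'summary'],
};

// Returns the names of tools whose trigger keywords appear in the prompt.
function selectToolNames(prompt: string): string[] {
  const lower = prompt.toLowerCase();
  return Object.entries(toolKeywords)
    .filter(([, keywords]) => keywords.some((k) => lower.includes(k)))
    .map(([name]) => name);
}
```

You would then pass only the selected subset of tool definitions to the model call, which keeps the schema overhead proportional to what the prompt actually needs.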
Error handling:
- When Claude returns a 429 or other API errors, I see the errors in the console, but not in my code.
- How can I capture these API errors programmatically in the SDK so I can show a proper UI error state or add retry logic?
- I’ve tried wrapping the stream call in a try/catch, but it doesn’t seem to surface the raw error message that appears in the console.
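On why try/catch misses it: with `streamText`, errors that occur after streaming has started are delivered as part of the stream rather than thrown from the call itself, so recent AI SDK versions expose an `onError` callback on `streamText`, and the SDK’s `APICallError` type carries a `statusCode`; check the docs for the version you’re on. Once you have the error object, a small classifier can map it to a UI state. The helper below is entirely hypothetical and only relies on a numeric `statusCode`/`status` field:

```typescript
type UiErrorState = {
  kind: 'rate-limit' | 'auth' | 'server' | 'unknown';
  message: string;
};

// Hypothetical mapper from a provider/SDK error to a renderable UI state.
// It only inspects a numeric statusCode/status field on the error object.
function classifyApiError(error: unknown): UiErrorState {
  const status =
    (error as { statusCode?: number })?.statusCode ??
    (error as { status?: number })?.status;
  if (status === 429) {
    return { kind: 'rate-limit', message: 'Rate limited by the provider. Please retry shortly.' };
  }
  if (status === 401 || status === 403) {
    return { kind: 'auth', message: 'Authentication with the provider failed.' };
  }
  if (typeof status === 'number' && status >= 500) {
    return { kind: 'server', message: 'The provider had an internal error. Try again.' };
  }
  return {
    kind: 'unknown',
    message: error instanceof Error ? error.message : 'Unknown error',
  };
}
```

Calling something like this from the stream’s error callback lets the UI render a retry button for rate limits while treating auth failures differently.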
Any suggestions, patterns, or examples on:
- Handling Anthropic rate limits / token overflows
- Reusing tool schemas efficiently
- Catching API errors at runtime

would be super helpful.
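For the rate-limit point, one common pattern (sketched here with hypothetical names, not an official SDK feature) is to wrap the model call in an exponential-backoff helper that retries only on 429:

```typescript
// Hypothetical backoff wrapper: retries only on HTTP 429, waiting
// baseDelayMs, then 2x, 4x, ... between attempts.
async function withRetryOn429<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { statusCode?: number })?.statusCode;
      if (status !== 429 || attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Usage would look like `await withRetryOn429(() => generateText({ model, messages, tools }))`, assuming the thrown error exposes a `statusCode` the way AI SDK API-call errors do; also note Anthropic’s 429 responses include a `retry-after` header you could honor instead of a fixed backoff.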
Thanks in advance!