I’m using the AI SDK and have a bunch of messages of type ModelMessage. I want a rough estimate of the token usage for these messages (which contain images, tool calls/results, etc.).
I’m sending them to Anthropic, but that doesn’t matter as again: I want a rough token usage estimate.
I saw that Anthropic has a method to count tokens, but the messages need to have a specific type, which is different from ModelMessage.
So what’s the best way to estimate the token count from a list of ModelMessage[]?
There is an in-progress PR for converting to AnthropicMessages
On the whole I think we’re pretty far away from a robust solution for this problem though
For calculating the input tokens of text-only prompts, you can pass the text to the tokenizer for your given provider. Anthropic and OpenAI tokenize text differently, but as an estimate I think you’ll be pretty close either way. This needs to include the system prompt, and due to prompt caching you can consider it a pessimistic estimate, especially if you often send prompts with similar beginnings.
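As a rough illustration (not using a real tokenizer), here’s a sketch that walks AI SDK-style message content and applies the common ~4-characters-per-token heuristic. The `MessageLike` shape and `estimateTextTokens` name are made up for this example; a real provider tokenizer will give tighter numbers:

```typescript
// Hypothetical sketch: count text characters across messages, then apply
// the rough ~4 chars/token heuristic. Tool calls/results are serialized to
// JSON since that's roughly what goes over the wire.

type TextPart = { type: "text"; text: string };
type MessageLike = {
  role: string;
  content: string | Array<TextPart | { type: string; [k: string]: unknown }>;
};

const CHARS_PER_TOKEN = 4; // heuristic only, not provider-exact

function estimateTextTokens(messages: MessageLike[], system = ""): number {
  let chars = system.length; // don't forget the system prompt
  for (const msg of messages) {
    if (typeof msg.content === "string") {
      chars += msg.content.length;
    } else {
      for (const part of msg.content) {
        if (part.type === "text") chars += (part as TextPart).text.length;
        else if (part.type === "tool-call" || part.type === "tool-result")
          chars += JSON.stringify(part).length;
      }
    }
  }
  return Math.ceil(chars / CHARS_PER_TOKEN);
}
```

Swapping the heuristic for an actual BPE tokenizer (e.g. a tiktoken port) only changes the `chars / CHARS_PER_TOKEN` step; the message-walking logic stays the same.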
Images are a different story, as they aren’t tokenized like text. Some providers charge flat rates per image, while IIRC Claude charges a base rate times the megapixels of your image. That may lead you to think you can estimate the image input cost by multiplying the dimensions, but that will also be a pessimistic estimate, as Anthropic resizes images before processing.
Output tokens are pretty much impossible to predict deterministically, especially when tool calls are allowed. The model may call the same tool multiple times or not at all. Some models output a lot of reasoning tokens before they reply and others don’t.
The best way to predict token usage for an app, then, is statistical methods based on a dataset you compile of past inputs and the reported output tokens that were spent, once you have a history.
Or you forgo tool calls, limit steps, and explicitly use single-prompt generations, where you can base your estimate on the maximum output tokens possible.
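The statistical approach above can be as simple as budgeting against a high percentile of your logged output token counts (e.g. the `usage` reported by past generations), since tool-call loops create a long tail that makes the mean misleading. All names here are illustrative:

```typescript
// Sketch: pick a high percentile of historical output token usage as the
// budget for future requests. `history` is a list of reported output token
// counts you've logged yourself.

function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

function outputTokenBudget(history: number[], p = 95): number {
  if (history.length === 0) return 0; // no data yet: nothing to base an estimate on
  return percentile(history, p);
}
```

You’d likely want to segment the history per prompt/feature rather than app-wide, since output length distributions differ a lot between tasks.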
When you say we’re far away from the solution: wouldn’t the solution simply be to export your convertToAnthropicMessagePrompt? You already have that function in your code, just gotta make it available to the public & update docs.
Then, we can just use that function to convert AI SDK messages & use e.g. Anthropic’s token counter.
In the meantime, I’ll probably just use a text tokenizer, and for images I’ll just use a rough estimate of 1,600 tokens each.
Yeah, that will let you use Anthropic’s input estimation, but there’s still no reliable way to estimate output tokens, and you’ll need a different solution if you use any other model provider.
Hopefully one day AI Gateway can have this built in and then users don’t need to worry about it