Add audio input to Gemini API

omkargarde · December 31, 2025, 2:39pm

https://ai.google.dev/gemini-api/docs/audio#javascript I want to add audio input for analysis: basically, I give an answer and the AI tells me how correct the answer was, or something along those lines. I have not been able to find documentation on how to do this.

I am not able to upload audio to Gemini.

TanStack Start, AI SDK

pawlean · January 5, 2026, 5:51pm

Hi @omkargarde!

The documentation you linked is for the native Google Generative AI SDK, but since you are using the Vercel AI SDK, the syntax is actually much simpler. You don’t need to manually handle inlineData.

Instead, you pass the audio as a file part within the messages array.

import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

export async function analyzeAudio(audioFile: File) {
  // Convert the File object to an ArrayBuffer
  const audioBuffer = await audioFile.arrayBuffer();

  const { text } = await generateText({
    model: google('gemini-1.5-flash'), // or 'gemini-1.5-pro'
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Analyze this answer:' },
          {
            type: 'file',
            data: audioBuffer, // The SDK handles the base64 conversion for you
            mediaType: audioFile.type, // IMPORTANT: Use 'mediaType' (not 'mimeType')
          },
        ],
      },
    ],
  });

  return text;
}

FYI: Gemini 1.5 Flash is usually a better fit (faster and cheaper) than 1.5 Pro for analyzing short audio answers, unless you need extremely deep pedagogical feedback!
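
For the "grade my spoken answer" use case from the original question, it may help to build the user message in a small helper so the prompt and the audio part stay together. The following is only a sketch: `buildGradingMessage`, the local types, and the prompt wording are hypothetical names made up for illustration, and the part shape simply mirrors the `type: 'file'` / `mediaType` fields used in the reply above. It does not call the SDK itself.

```typescript
// Hypothetical local types that mirror the message shape used in the reply above.
type TextPart = { type: 'text'; text: string };
type FilePart = { type: 'file'; data: Uint8Array; mediaType: string };
type UserMessage = { role: 'user'; content: Array<TextPart | FilePart> };

// Build one user message that asks the model to grade a recorded answer.
// The prompt wording is an assumption; adjust it to your own rubric.
function buildGradingMessage(
  question: string,
  audio: Uint8Array,
  mediaType: string,
): UserMessage {
  // Fail early on obviously wrong input instead of sending it to the model.
  if (!mediaType.startsWith('audio/')) {
    throw new Error(`Expected an audio media type, got "${mediaType}"`);
  }
  if (audio.byteLength === 0) {
    throw new Error('Audio recording is empty');
  }

  return {
    role: 'user',
    content: [
      {
        type: 'text',
        text:
          `Question: ${question}\n` +
          'The attached audio is my spoken answer. ' +
          'Tell me how correct it is and what is missing.',
      },
      { type: 'file', data: audio, mediaType },
    ],
  };
}
```

Assuming the message shape matches what `generateText` accepts in your SDK version, the result could be passed as `messages: [buildGradingMessage(question, new Uint8Array(await audioFile.arrayBuffer()), audioFile.type)]`.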

pawlean · January 23, 2026, 6:02pm

Hey there, @omkargarde! Just checking in to see if you’re still looking for help with adding audio input to the Gemini API. Did you find a solution, or do you need more guidance? Excited to help!
