Add audio input to Gemini API

https://ai.google.dev/gemini-api/docs/audio#javascript I want to add audio input for analysis. Basically, I record an answer and the AI tells me how correct the answer was, or something along those lines. I have not been able to find documentation on how to do this.

Not able to upload audio to Gemini.

TanStack Start, AI SDK

Hi @omkargarde!

The documentation you linked is for the native Google Generative AI SDK, but since you are using the Vercel AI SDK, the syntax is actually much simpler. You don’t need to manually handle inlineData.

Instead, you pass the audio as a file part within the messages array.

import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

export async function analyzeAudio(audioFile: File) {
  // Convert the File object to an ArrayBuffer
  const audioBuffer = await audioFile.arrayBuffer();

  const { text } = await generateText({
    model: google('gemini-1.5-flash'), // or 'gemini-1.5-pro'
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Analyze this answer:' },
          {
            type: 'file',
            data: audioBuffer, // The SDK handles the base64 conversion for you
            mediaType: audioFile.type, // AI SDK 5 uses 'mediaType'; AI SDK 4 named this field 'mimeType'
          },
        ],
      },
    ],
  });

  return text;
}
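
One thing that can trip this up is the value of audioFile.type: browser recordings often report something like 'audio/webm;codecs=opus', and the field can be empty for some uploads. Below is a minimal sketch of a pre-check you could run before calling generateText. The helper names (resolveAudioMediaType, assertInlineSize) are hypothetical, the list of supported types is taken from the Gemini audio docs linked above and may change, and the alias mappings and the 20 MB inline cap are assumptions to verify against the current docs.

```typescript
// Audio MIME types listed in the Gemini audio docs (assumption: check the docs for the current list).
const SUPPORTED_AUDIO_TYPES = new Set<string>([
  'audio/wav', 'audio/mp3', 'audio/aiff', 'audio/aac', 'audio/ogg', 'audio/flac',
]);

// Assumption: common browser-reported names mapped onto the names used in the Gemini docs.
const TYPE_ALIASES: Record<string, string> = {
  'audio/mpeg': 'audio/mp3',
  'audio/x-wav': 'audio/wav',
};

// Hypothetical fallback used only when the browser reports no type at all.
const EXTENSION_TO_TYPE: Record<string, string> = {
  wav: 'audio/wav', mp3: 'audio/mp3', aiff: 'audio/aiff',
  aac: 'audio/aac', ogg: 'audio/ogg', flac: 'audio/flac',
};

// Assumed cap for inline (non File API) requests.
const MAX_INLINE_BYTES = 20 * 1024 * 1024;

function resolveAudioMediaType(fileName: string, reportedType: string): string {
  // Drop parameters such as ";codecs=opus" and normalize case.
  const base = (reportedType.split(';')[0] ?? '').trim().toLowerCase();
  const aliased = TYPE_ALIASES[base] ?? base;
  if (SUPPORTED_AUDIO_TYPES.has(aliased)) return aliased;

  // Only guess from the extension when no type was reported.
  if (aliased === '') {
    const dot = fileName.lastIndexOf('.');
    const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : '';
    const guessed = EXTENSION_TO_TYPE[ext];
    if (guessed !== undefined) return guessed;
  }
  throw new Error(`Unsupported audio type "${reportedType}" for file "${fileName}"`);
}

function assertInlineSize(byteLength: number): void {
  if (byteLength > MAX_INLINE_BYTES) {
    throw new Error(`Audio is ${byteLength} bytes; the assumed inline limit is ${MAX_INLINE_BYTES}`);
  }
}
```

In analyzeAudio you could then call assertInlineSize(audioBuffer.byteLength) and pass resolveAudioMediaType(audioFile.name, audioFile.type) as mediaType instead of audioFile.type. If the recording is in a format that is not on the list (for example WebM from MediaRecorder), it would need to be converted before upload.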

FYI, Gemini 1.5 Flash is usually a better fit than 1.5 Pro for analyzing short audio answers because it is faster and cheaper, unless you need very detailed pedagogical feedback!

Hey there, @omkargarde! Just checking in to see if you’re still looking for help with adding audio input to the Gemini API. Did you find a solution, or do you need more guidance? Excited to help!