Vercel Community

[AI SDK](/c/ai-sdk/62)

# Add audio input to Gemini API


Omkargarde (@omkargarde) · 2025-12-31

https://ai.google.dev/gemini-api/docs/audio#javascript

I want to send audio to Gemini for analysis: basically, I give an answer and the AI tells me how correct it was, or something along those lines. I have not been able to find documentation on how to do this.

I am not able to upload audio to Gemini.

Stack: TanStack Start, AI SDK


Pauline P. Narvas (@pawlean) · 2026-01-05

Hi @omkargarde!

The documentation you linked is for the **native Google Generative AI SDK**, but since you are using the **Vercel AI SDK**, the syntax is actually much simpler. You don't need to manually handle `inlineData`.

Instead, you pass the audio as a **`file`** part in the `content` array of a user message inside `messages`.

```typescript
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

export async function analyzeAudio(audioFile: File) {
  // Convert the File object to an ArrayBuffer
  const audioBuffer = await audioFile.arrayBuffer();

  const { text } = await generateText({
    model: google('gemini-1.5-flash'), // or 'gemini-1.5-pro'
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Analyze this answer:' },
          {
            type: 'file',
            data: audioBuffer, // The SDK handles the base64 conversion for you
            mediaType: audioFile.type, // IMPORTANT: AI SDK 5 uses 'mediaType' (AI SDK 4 used 'mimeType')
          },
        ],
      },
    ],
  });

  return text;
}
```
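
If it helps, here is a minimal sketch of the same message shape with a small guard in front of it, so an empty upload or a non-audio file fails before a model call is made. `buildAudioMessage` is a hypothetical helper and the types are local stand-ins written for this example, not the SDK's own exports; the `audio/` prefix check is an assumption about what you want to accept, not a rule taken from the SDK or from Gemini.

```typescript
// Local stand-ins for the content parts used in the snippet above
// (assumption: they mirror the SDK's `text` and `file` parts).
type TextPart = { type: 'text'; text: string };
type FilePart = { type: 'file'; data: ArrayBuffer | Uint8Array; mediaType: string };
type UserMessage = { role: 'user'; content: Array<TextPart | FilePart> };

// Hypothetical helper: builds the user message for one audio answer.
function buildAudioMessage(
  prompt: string,
  audio: ArrayBuffer | Uint8Array,
  mediaType: string,
): UserMessage {
  // Reject empty payloads early.
  if (audio.byteLength === 0) {
    throw new Error('Audio payload is empty');
  }
  // `File.type` can be an empty string when the browser cannot detect it,
  // so this check also catches a missing media type.
  if (!mediaType.startsWith('audio/')) {
    throw new Error(`Expected an audio/* media type, got "${mediaType}"`);
  }
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'file', data: audio, mediaType },
    ],
  };
}
```

The returned object should then slot into the call above as `messages: [buildAudioMessage('Analyze this answer:', audioBuffer, audioFile.type)]`.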

FYI, Gemini 1.5 Flash is usually a better fit than 1.5 Pro for analyzing short audio answers because it is faster and cheaper, unless you need extremely deep pedagogical feedback!


Pauline P. Narvas (@pawlean) · 2026-01-23

Hey there, @omkargarde! Just checking in to see if you're still looking for help with adding audio input to the Gemini API. Did you find a solution, or do you need more guidance? Excited to help!