[Vercel Community](/)

[Feedback](/c/feedback/8)

# Vercel AI SDK support for Google Gemini 2.0 Flash multimodal API

1 view · 0 likes · 2 posts


Gertie01 (@gertie01) · 2026-04-10

Does the Vercel AI SDK version of `google/gemini-2.0-flash` expose the multimodal API?


Swarnava Sengupta (@swarnava) · 2026-04-11

The Vercel AI SDK **does** expose multimodal capabilities for Gemini models. You can send images, audio, and files as input to Gemini models:

```typescript
import { generateText } from "ai"

const result = await generateText({
  model: "google/gemini-2.0-flash",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image", image: imageBuffer }, // or URL
      ],
    },
  ],
})
```
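Audio and document input follow the same message shape, using a `file` part that carries raw bytes plus a media type. A minimal sketch of building such a message; the `fileQuestion` helper (and the `ContentPart` type) are illustrative, not part of the SDK:

```typescript
// Sketch: build a multimodal user message for generateText.
// "file" parts carry raw bytes plus a media type so the model
// knows what format it is receiving (PDF, MP3, etc.).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "file"; data: Uint8Array; mediaType: string }

function fileQuestion(
  question: string,
  data: Uint8Array,
  mediaType: string,
): { role: "user"; content: ContentPart[] } {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "file", data, mediaType },
    ],
  }
}

// e.g. generateText({
//   model: "google/gemini-2.0-flash",
//   messages: [fileQuestion("Summarize this.", pdfBytes, "application/pdf")],
// })
```

You'd pass the result in the `messages` array exactly like the image example above.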

## Multimodal Output (Image Generation)

For models that support image generation (like Gemini 3.1 Flash Image Preview / “Nano Banana 2”), you can use `generateImage`:

```typescript
import { experimental_generateImage as generateImage } from "ai"

const { image } = await generateImage({
  model: "google/gemini-3.1-flash-image-preview",
  prompt: "A futuristic city at sunset",
})
```
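The returned `image` object carries the raw bytes (current AI SDK builds expose them as `base64` and `uint8Array` properties), so writing it to disk is straightforward. A sketch assuming Node.js; the `saveImage` helper and file name are illustrative:

```typescript
import { writeFile } from "node:fs/promises"

// Sketch: persist a generated image, assuming the result object
// exposes a base64 string like the AI SDK's generated-file shape.
// Returns the number of bytes written.
async function saveImage(image: { base64: string }, path: string) {
  const bytes = Buffer.from(image.base64, "base64")
  await writeFile(path, bytes)
  return bytes.length
}

// e.g. await saveImage(image, "sunset.png")
```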

## Interleaved Text + Images

For models that generate interleaved text and images, you’d use the streaming response with multimodal parts:

```typescript
import { streamText } from "ai"

const result = streamText({
  model: "google/gemini-3.1-flash-image-preview",
  prompt: "Create a step-by-step recipe with images",
})

for await (const part of result.fullStream) {
  if (part.type === "text-delta") {
    // Handle text
  } else if (part.type === "file") {
    // Handle generated image
  }
}
```
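If you'd rather collect the interleaved output than handle it inline, you can fold the stream parts into final text plus a list of generated files. A sketch under the assumption that text deltas expose `.text` and file parts expose `.file.uint8Array` (exact property names vary between SDK majors):

```typescript
// Hypothetical part shapes, mirroring the fullStream handling above
type StreamPart =
  | { type: "text-delta"; text: string }
  | { type: "file"; file: { mediaType: string; uint8Array: Uint8Array } }

// Fold interleaved parts into the final text and the generated images
function collectParts(parts: StreamPart[]) {
  let text = ""
  const images: Uint8Array[] = []
  for (const part of parts) {
    if (part.type === "text-delta") text += part.text
    else if (part.type === "file") images.push(part.file.uint8Array)
  }
  return { text, images }
}
```

In practice you'd feed it the parts as they arrive from `result.fullStream` instead of an array.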

## Summary

* **Gemini 2.0 Flash**: Multimodal *input* (images, files, audio) ✅
* **Gemini 3.1 Flash Image Preview**: Multimodal *output* (generates images inline) ✅
* The AI SDK abstracts provider differences, so you use the same patterns across models