Vercel AI SDK support for Google Gemini 2.0 Flash multimodal API

Does the Vercel AI SDK version of google/gemini-2.0-flash expose the multimodal API?

The Vercel AI SDK does expose multimodal capabilities for Gemini models. You can send images, audio, and files as input by adding them as content parts next to the text:

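import { generateText } from "ai"
import { readFile } from "node:fs/promises"

// Placeholder path: load any local image as a Buffer
const imageBuffer = await readFile("./photo.jpg")

const result = await generateText({
  model: "google/gemini-2.0-flash",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image", image: imageBuffer }, // or URL
      ],
    },
  ],
})

// The model's answer about the image
console.log(result.text)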
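
The example above only shows an image part. For other inputs, such as a PDF or an audio clip, the message is assumed to carry a file part instead. The following is a minimal sketch: the buildFileMessage helper and the local types are hypothetical names introduced for illustration, and the mediaType field name follows AI SDK 5 (earlier releases used mimeType), so check it against the version you have installed.

```typescript
// Local stand-ins for the SDK's message types (assumed shapes, not imported from "ai").
type TextPart = { type: "text"; text: string };
type FilePart = { type: "file"; data: Uint8Array | string; mediaType: string };
type UserMessage = { role: "user"; content: Array<TextPart | FilePart> };

// Hypothetical helper: pairs a question with one file (PDF, audio clip, ...).
function buildFileMessage(
  question: string,
  data: Uint8Array | string,
  mediaType: string,
): UserMessage {
  // Reject obviously malformed MIME types before anything is sent to the model.
  if (!mediaType.includes("/")) {
    throw new Error(`Expected a MIME type such as "application/pdf", got "${mediaType}"`);
  }
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "file", data, mediaType },
    ],
  };
}

// Usage sketch (needs the "ai" package and credentials, so it is left commented out;
// pdfBytes is a placeholder for the file contents):
// const result = await generateText({
//   model: "google/gemini-2.0-flash",
//   messages: [buildFileMessage("Summarize this PDF.", pdfBytes, "application/pdf")],
// });
```

Keeping the message construction in a small function makes it easy to validate the media type locally, without calling the model.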

Multimodal Output (Image Generation)

For models that support image generation (like Gemini 3.1 Flash Image Preview / “Nano Banana 2”), you can use generateImage (some AI SDK releases export it as experimental_generateImage):

import { generateImage } from "ai"

const { image } = await generateImage({
  model: "google/gemini-3.1-flash-image-preview",
  prompt: "A futuristic city at sunset",
})
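
Once the call resolves, the returned image usually has to be displayed or saved. The sketch below assumes the image object exposes base64 and mediaType fields, as generated files do in AI SDK 5. The GeneratedImageLike type and both helpers are hypothetical names introduced for illustration.

```typescript
// Assumed shape of the generated image (a subset of what the SDK returns).
type GeneratedImageLike = { base64: string; mediaType: string };

// Builds a data URL that can be used directly as an <img src="...">.
function toDataUrl(image: GeneratedImageLike): string {
  return `data:${image.mediaType};base64,${image.base64}`;
}

// Derives a file name from the media type, e.g. "image/png" -> "city.png".
function suggestFileName(baseName: string, image: GeneratedImageLike): string {
  // Fall back to "bin" when the media type has no subtype.
  const subtype = image.mediaType.split("/")[1] ?? "bin";
  const extension = subtype === "jpeg" ? "jpg" : subtype;
  return `${baseName}.${extension}`;
}
```

With the image from the example above, toDataUrl(image) could feed a browser preview and suggestFileName("city", image) could name the file written to disk.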

Interleaved Text + Images

For models that generate interleaved text and images, stream the response with streamText and handle each part of fullStream according to its type:

import { streamText } from "ai"

const result = streamText({
  model: "google/gemini-3.1-flash-image-preview",
  prompt: "Create a step-by-step recipe with images",
})

for await (const part of result.fullStream) {
  if (part.type === "text-delta") {
    // Handle text
  } else if (part.type === "file") {
    // Handle generated image
  }
}
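
The two branches in the loop above can be turned into an ordered list of text and image segments, which is often easier to render than raw stream parts. This is a sketch under stated assumptions: the LoosePart shape (text on text-delta parts, file.mediaType and file.uint8Array on file parts) mirrors AI SDK 5 and may differ in other versions, and collectSegments and Segment are hypothetical names.

```typescript
// Assumed, deliberately loose shape of a stream part.
type LoosePart = {
  type: string;
  text?: string;
  file?: { mediaType: string; uint8Array: Uint8Array };
};

type Segment =
  | { kind: "text"; text: string }
  | { kind: "image"; mediaType: string; bytes: Uint8Array };

// Merges consecutive text deltas and keeps images in the order they arrive.
async function collectSegments(parts: AsyncIterable<LoosePart>): Promise<Segment[]> {
  const segments: Segment[] = [];
  let buffer = "";

  // Emits the text accumulated so far as a single segment.
  const flushText = () => {
    if (buffer.length > 0) {
      segments.push({ kind: "text", text: buffer });
      buffer = "";
    }
  };

  for await (const part of parts) {
    if (part.type === "text-delta" && part.text !== undefined) {
      buffer += part.text;
    } else if (part.type === "file" && part.file !== undefined) {
      flushText(); // close the text that preceded this image
      segments.push({
        kind: "image",
        mediaType: part.file.mediaType,
        bytes: part.file.uint8Array,
      });
    }
    // Any other part type is ignored in this sketch.
  }
  flushText();
  return segments;
}
```

Passing result.fullStream to collectSegments would yield, for a recipe, alternating text and image entries that a UI can render in sequence.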

  • Gemini 2.0 Flash: Multimodal input (images, files, audio) ✅
  • Gemini 3.1 Flash Image Preview: Multimodal output (generates images inline) ✅
  • The AI SDK abstracts provider differences, so you use the same patterns across models