# Vercel AI SDK support for Google Gemini 2.0 Flash multimodal API

1 view · 0 likes · 2 posts

**Gertie01** (@gertie01) · 2026-04-10

Does the Vercel AI SDK version of `google/gemini-2.0-flash` expose the multimodal API?

**Swarnava Sengupta** (@swarnava) · 2026-04-11

The Vercel AI SDK **does** expose multimodal capabilities for Gemini models. You can send images, audio, and files as input to Gemini models:

```typescript
import { generateText } from "ai"

const result = await generateText({
  model: "google/gemini-2.0-flash",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        // `image` accepts a Buffer/Uint8Array, a base64 string, or a URL
        { type: "image", image: imageBuffer },
      ],
    },
  ],
})
```

## Multimodal Output (Image Generation)

For models that support image generation (like Gemini 3.1 Flash Image Preview / "Nano Banana 2"), you can use `generateImage`, which the SDK exports as `experimental_generateImage`:

```typescript
import { experimental_generateImage as generateImage } from "ai"

const { image } = await generateImage({
  model: "google/gemini-3.1-flash-image-preview",
  prompt: "A futuristic city at sunset",
})
```

## Interleaved Text + Images

For models that generate interleaved text and images, use the streaming response and inspect the multimodal parts:

```typescript
import { streamText } from "ai"

const result = streamText({
  model: "google/gemini-3.1-flash-image-preview",
  prompt: "Create a step-by-step recipe with images",
})

for await (const part of result.fullStream) {
  if (part.type === "text-delta") {
    // Handle streamed text
  } else if (part.type === "file") {
    // Handle a generated image
  }
}
```

## Summary

* **Gemini 2.0 Flash**: multimodal *input* (images, files, audio) ✅
* **Gemini 3.1 Flash Image Preview**: multimodal *output* (generates images inline) ✅
* The AI SDK abstracts provider differences, so the same patterns work across models
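One practical follow-up on the streaming example: the `file` parts carry the generated image data, which you usually want to persist. A minimal sketch for writing it to disk, assuming the part exposes a base64-encoded payload (the SDK's `GeneratedFile` provides `base64`/`uint8Array` accessors in recent versions, but treat the exact property names as assumptions against your installed version):

```typescript
import { writeFileSync } from "node:fs"

// Decode a base64 payload into raw bytes.
function base64ToBytes(b64: string): Uint8Array {
  return Uint8Array.from(Buffer.from(b64, "base64"))
}

// Hypothetical helper: write a generated image's base64 data to a file,
// e.g. saveGeneratedImage(part.file.base64, "step-1.png") inside the
// `part.type === "file"` branch of the stream loop.
function saveGeneratedImage(base64Data: string, path: string): void {
  writeFileSync(path, base64ToBytes(base64Data))
}
```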