Hey Vercel community!
I'm excited to share Seq, an open-source AI video production studio I built as a template in v0. This post covers the architecture, challenges, and lessons learned.
What is Seq?
Seq is the first AI-native NLE (Non-Linear Editor) built for the browser. The workflow is:
- Text → Storyboard: Describe scenes in natural language; Gemini 3 Pro Image generates visual panels
- Storyboard → Video: Animate panels with Veo 3.1, WAN 2.2, or WAN 2.5
- Video → Timeline: Edit clips on a professional multi-track timeline
- Timeline → Export: Render to 720p or 1080p MP4 via FFmpeg WASM
Live Demo: https://seq-studio.vercel.app
GitHub: headline-design/seq (AI-Native Video Production Studio)
v0 Template: https://v0.app/templates/seq-ZAICjqmFe5w
Technical Architecture
The Stack
- Framework: Next.js 16 with App Router
- React: React 19
- Styling: Tailwind CSS 4
- AI Integration: Vercel AI SDK + AI Gateway
- Image Generation: Gemini 3 Pro Image
- Video Generation: Veo 3.1, WAN 2.2, WAN 2.5 (via fal.ai)
- Export: FFmpeg WASM for client-side encoding
- Storage: Vercel Blob
- Video Compositing: Canvas API with OffscreenCanvas
- Audio: Web Audio API with AudioContext for real-time mixing
- State Management: React hooks + refs for frame-accurate playback
Key Challenges
1. Frame-Accurate Playback
Browser video playback isn't designed for frame-accurate scrubbing. I solved this by:
- Using `requestVideoFrameCallback` for precise timing
- Maintaining separate video elements for A/B playback switching
- Building a custom playhead that drives both video elements
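To make the custom playhead concrete, here's a simplified sketch of the core mapping it has to do: translating a global timeline time into "which clip is under the playhead, and what time should its video element seek to." The types and names here are illustrative, not Seq's actual code.

```typescript
// Hypothetical clip type -- names are illustrative, not Seq's actual API.
interface Clip {
  id: string;
  timelineStart: number; // seconds, where the clip sits on the timeline
  duration: number;      // seconds
  sourceOffset: number;  // seconds into the underlying video file
}

// Map a global playhead time to the active clip and the clip-local time
// to seek its <video> element to. Returns null in gaps between clips.
function resolvePlayhead(
  clips: Clip[],
  t: number
): { clipId: string; sourceTime: number } | null {
  for (const clip of clips) {
    if (t >= clip.timelineStart && t < clip.timelineStart + clip.duration) {
      return {
        clipId: clip.id,
        sourceTime: clip.sourceOffset + (t - clip.timelineStart),
      };
    }
  }
  return null;
}
```

In the real editor this runs inside a `requestVideoFrameCallback` loop, and the A/B video-element switch happens when the resolved `clipId` changes.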
2. Real-Time Audio Graph
Mixing multiple audio tracks in real-time required building an audio graph with Web Audio API:
- Each track has its own `GainNode` for volume control
- Tracks connect to a `MediaStreamAudioDestinationNode` for export
- Mute/solo states are handled by manipulating gain values
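The mute/solo logic boils down to a pure function: given each track's fader value plus its mute/solo flags, compute the gain to write to its `GainNode`. A minimal sketch (the track shape is illustrative, not Seq's actual state model):

```typescript
// Hypothetical per-track state -- illustrative, not Seq's actual model.
interface TrackState {
  volume: number; // 0..1 fader value
  muted: boolean;
  solo: boolean;
}

// Compute the gain each track's GainNode should be set to.
// If any track is soloed, all non-solo tracks are silenced;
// mute always wins for its own track.
function effectiveGains(tracks: TrackState[]): number[] {
  const anySolo = tracks.some((t) => t.solo);
  return tracks.map((t) => {
    if (t.muted) return 0;
    if (anySolo && !t.solo) return 0;
    return t.volume;
  });
}
```

Keeping this as a pure function means the same logic drives both live preview (setting `gainNode.gain.value`) and export mixing.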
3. Multi-Model AI Integration
Different shots need different models:
- Veo 3.1 Fast: Quick iterations during ideation
- Veo 3.1 Standard: Production-quality output
- WAN 2.2: Frame-to-frame transitions with first/last-frame support
- WAN 2.5: Higher-resolution native output (1080p)
The Panel Processor automatically suggests the optimal model based on shot type and duration.
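To give a feel for the kind of heuristic involved, here's a hypothetical version of that selection logic based on the model strengths listed above. The real Panel Processor's rules differ; the shot shape and thresholds here are made up for illustration.

```typescript
type Model = "veo-3.1-fast" | "veo-3.1" | "wan-2.2" | "wan-2.5";

// Hypothetical shot descriptor -- not Seq's actual data model.
interface Shot {
  type: "ideation" | "transition" | "final";
  durationSec: number;
  needs1080p?: boolean;
}

// Illustrative heuristic in the spirit of the Panel Processor:
// pick the model whose strengths match the shot.
function suggestModel(shot: Shot): Model {
  if (shot.type === "transition") return "wan-2.2"; // first/last-frame support
  if (shot.needs1080p) return "wan-2.5";            // native 1080p output
  if (shot.type === "ideation" || shot.durationSec <= 4) {
    return "veo-3.1-fast";                          // quick iterations
  }
  return "veo-3.1";                                  // production quality
}
```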
4. Export Pipeline
Rendering a timeline to video in the browser uses FFmpeg WASM:
- Clips are processed and encoded client-side
- Audio tracks are mixed via the Web Audio API
- Progress is tracked through phases: init, audio, video, encoding
- Output options: 720p (fast; can reuse the preview render) or 1080p (full render)
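Tracking progress across those phases is mostly a matter of weighting: each phase reports a 0..1 fraction, and the UI maps it into a single overall bar. A minimal sketch of that mapping (the weights are illustrative, not Seq's actual distribution):

```typescript
type Phase = "init" | "audio" | "video" | "encoding";

// Rough share of total export time per phase -- illustrative weights only.
const PHASE_WEIGHTS: Record<Phase, number> = {
  init: 0.05,
  audio: 0.15,
  video: 0.4,
  encoding: 0.4,
};

const PHASE_ORDER: Phase[] = ["init", "audio", "video", "encoding"];

// Convert a within-phase fraction (0..1) into overall progress (0..1):
// sum the weights of completed phases, then add the current phase's share.
function overallProgress(phase: Phase, fraction: number): number {
  let done = 0;
  for (const p of PHASE_ORDER) {
    if (p === phase) return done + PHASE_WEIGHTS[p] * fraction;
    done += PHASE_WEIGHTS[p];
  }
  return 1;
}
```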
Why v0?
Building Seq with v0 was a deliberate choice. I wanted to:
- Prove the platform: Show that v0 can handle complex, stateful applications
- Rapid iteration: AI-assisted coding let me prototype features in hours, not days
- Community: Make the template available for others to learn from and build upon
v0's ability to understand context across multiple files was crucial for a project this size. The codebase has 100+ files, and the AI could reason about component relationships, state flow, and side effects.
Lessons Learned
- AI-native ≠ AI-assisted: Building AI-first changes everything about UX design
- Browser APIs are powerful: Canvas + Web Audio + FFmpeg WASM = a full video editor
- State management is king: Frame-accurate playback requires obsessive attention to state
- v0 scales: Complex projects are absolutely buildable with AI assistance
What's Next
- Additional audio track features
- Keyframe-based animation curves
- Collaborative editing with real-time sync
- Plugin architecture for custom effects
Try It
Live Demo: https://seq-studio.vercel.app
GitHub: headline-design/seq (AI-Native Video Production Studio)
v0 Template: https://v0.app/templates/seq-ZAICjqmFe5w
Would love to hear your feedback! Drop a comment below or open an issue on GitHub.
