Text → Video
- Duration: 5 / 10 / 15 seconds
- Resolution: 720p / 1080p
- Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
- Optional audio URL (WAV/MP3, 3–30s, <=15MB)
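The format and audio limits in the list above can be checked before a request is sent. Below is a minimal client-side sketch under stated assumptions. `check_t2v_format` and `AudioInfo` are hypothetical names and are not part of MaxVideoAI or Fal. You are assumed to measure the audio duration and file size yourself. "15MB" is read as 15 × 1024 × 1024 bytes, which is also an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional
from urllib.parse import urlparse

# Limits copied from the Text -> Video spec list. Reading "15MB" as
# 15 * 1024 * 1024 bytes is an assumption; the page does not say which unit.
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}
AUDIO_MIN_S, AUDIO_MAX_S = 3.0, 30.0
AUDIO_MAX_BYTES = 15 * 1024 * 1024


@dataclass
class AudioInfo:
    """Hypothetical metadata you collect about the audio file before submitting."""
    url: str
    duration_s: float
    size_bytes: int


def check_t2v_format(resolution: str, aspect_ratio: str,
                     audio: Optional[AudioInfo] = None) -> List[str]:
    """Return human-readable problems; an empty list means the options fit the spec list."""
    problems: List[str] = []
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if audio is not None:
        # Infer the format from the URL path's extension, ignoring any query string.
        ext = PurePosixPath(urlparse(audio.url).path).suffix.lower()
        if ext not in AUDIO_EXTENSIONS:
            problems.append(f"audio must be WAV or MP3, got {ext or 'no extension'}")
        if not AUDIO_MIN_S <= audio.duration_s <= AUDIO_MAX_S:
            problems.append(f"audio length {audio.duration_s}s is outside 3-30s")
        if audio.size_bytes > AUDIO_MAX_BYTES:
            problems.append(f"audio file is {audio.size_bytes} bytes, above the 15MB cap")
    return problems
```

For example, `check_t2v_format("1080p", "16:9", AudioInfo("https://example.com/score.mp3", 12.0, 2_000_000))` returns an empty list. `check_t2v_format("4k", "21:9")` returns two problems.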
Wan 2.6 unifies Text, Image, and Reference video inputs in one engine card, so you can iterate on short cinematic beats without switching models.
Use multi-shot prompts for mini trailers, animate a single still, or lock character consistency using 1–3 reference videos.
Wan 2.6 – Multi-shot AI video (Text/Image 5–15s, Reference 5–10s, 720p/1080p)
Why Wan 2.6 is powerful inside MaxVideoAI
Best use cases
Choose Text, Image, or Reference mode, set duration and resolution, then prompt.
Reference mode keeps subject identity consistent by grounding the model in real video footage.
Text → Video (T2V): the fastest way to test an idea
Image → Video (I2V): ideal for animating a key visual
Reference → Video (R2V): lock identity and keep subjects consistent
In-app flow
Specs reflect the live Fal routing today.
Use a global look line, then timestamped shots for pacing.
Shot 1 [0-3s] ... Shot 2 [3-8s] ... Shot 3 [8-15s] ... lighting, camera, motion.
For R2V, place @Video1/@Video2/@Video3 in the prompt to anchor subjects.
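The global-look-plus-timestamped-shots pattern can also be assembled programmatically, so that shot ranges never overlap or leave gaps. This is a hedged sketch. `Shot` and `build_multishot_prompt` are hypothetical helpers. Requiring the shots to be contiguous and to end exactly at the clip duration is an assumption about clean pacing, not a documented Wan 2.6 rule. The usage at the bottom builds an abbreviated version of the ready-to-render example prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    start_s: int       # shot start, in seconds from the beginning of the clip
    end_s: int         # shot end, in seconds; the next shot starts here
    description: str   # lighting, camera, and motion notes for this shot


def build_multishot_prompt(global_look: str, shots: List[Shot], duration_s: int) -> str:
    """Join a global look line and timestamped shots, checking the timeline first."""
    if not shots:
        raise ValueError("need at least one shot")
    expected_start = 0
    for shot in shots:
        # Each shot must start where the previous one ended and have positive length.
        if shot.start_s != expected_start or shot.end_s <= shot.start_s:
            raise ValueError(f"shot [{shot.start_s}-{shot.end_s}s] breaks the timeline")
        expected_start = shot.end_s
    if expected_start != duration_s:
        raise ValueError(f"shots cover {expected_start}s but the clip is {duration_s}s")
    lines = [f"Global look: {global_look}"]
    for i, shot in enumerate(shots, start=1):
        lines.append(f"Shot {i} [{shot.start_s}–{shot.end_s}s]: {shot.description}")
    return "\n".join(lines)


prompt = build_multishot_prompt(
    "elegant thriller, rainy night, soft neon, 35mm, blue/amber palette.",
    [
        Shot(0, 4, "slow push-in through a wet alley, reflections on the ground."),
        Shot(4, 10, "close-up of hands opening a sealed envelope, warm side light."),
        Shot(10, 15, "a door opens into an overexposed white room, fade to black."),
    ],
    duration_s=15,
)
print(prompt)
```

Failing early on a gap or overlap is a deliberate design choice. It surfaces pacing mistakes before a render is spent on them.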
Wan 2.6 demo clip from MaxVideoAI
Ready-to-render example (multi-shot, 15s, 16:9, 1080p)
Global look: elegant thriller, rainy night, soft neon, 35mm, fine film grain, cinematic depth of field, smooth camera, blue/amber palette.
Shot 1 [0–4s]: slow push-in through a wet alley, reflections on the ground, a silhouette in the distance, hazy atmosphere.
Shot 2 [4–10s]: cut to close-up of hands opening a sealed envelope on a wooden table, warm side light, very shallow focus.
Shot 3 [10–15s]: a door opens into an overexposed white room, burst of light, fine dust in the beam, fade to black.
Common issues (quick fixes)
Audio URLs are optional for Text and Image modes. Reference mode does not support audio uploads.
Reference mode accepts 1–3 MP4 or MOV reference videos. Tag them in the prompt as @Video1, @Video2, and @Video3.
Text and Image modes: 5, 10, or 15 seconds. Reference mode: 5 or 10 seconds.
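The mode rules above can be checked together before rendering. They cover the durations allowed per mode, the ban on audio in Reference mode, and the 1–3 MP4/MOV references addressed as @Video1 to @Video3. The following is a minimal sketch. It assumes hypothetical mode keys `text`, `image`, and `reference` and a hypothetical `check_request` function. It encodes only what this page states and is not MaxVideoAI's or Fal's actual validation.

```python
import re
from pathlib import PurePosixPath
from typing import List, Optional, Sequence
from urllib.parse import urlparse

# Per-mode rules as stated on this page; the mode keys are hypothetical labels.
DURATIONS = {"text": {5, 10, 15}, "image": {5, 10, 15}, "reference": {5, 10}}
REFERENCE_EXTENSIONS = {".mp4", ".mov"}
VIDEO_TAG = re.compile(r"@Video(\d+)")


def check_request(mode: str, duration_s: int, prompt: str,
                  audio_url: Optional[str] = None,
                  reference_videos: Sequence[str] = ()) -> List[str]:
    """Return problems with a request; an empty list means it fits the stated rules."""
    if mode not in DURATIONS:
        return [f"unknown mode: {mode}"]
    problems: List[str] = []
    if duration_s not in DURATIONS[mode]:
        problems.append(f"{mode} mode does not accept {duration_s}s")
    if mode == "reference":
        if audio_url:
            problems.append("reference mode does not support audio uploads")
        if not 1 <= len(reference_videos) <= 3:
            problems.append("reference mode needs 1 to 3 reference videos")
        for ref in reference_videos:
            ext = PurePosixPath(urlparse(ref).path).suffix.lower()
            if ext not in REFERENCE_EXTENSIONS:
                problems.append(f"reference must be MP4 or MOV: {ref}")
        # Every @VideoN tag should point at a supplied reference (1-based).
        for n in sorted({int(tag) for tag in VIDEO_TAG.findall(prompt)}):
            if not 1 <= n <= len(reference_videos):
                problems.append(f"@Video{n} has no matching reference video")
    return problems
```

For example, `check_request("reference", 10, "@Video1 meets @Video2", reference_videos=["a.mp4"])` reports that `@Video2` has no matching reference video.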
Compare price, latency, and output options across the MaxVideoAI catalog.
Generate Wan 2.5 preview clips from prompts or single reference stills, complete with optional audio and 480p–1080p tiers.
Compare Wan 2.6 vs Wan 2.5 →
Create rich AI-generated videos from text or image prompts using Sora 2. Native voice-over, ambient effects, and motion sync via MaxVideoAI.
Compare Wan 2.6 vs Sora 2 →
Create longer, more immersive AI videos from text or images using Sora 2 Pro. Native voice, ambient sound, prompt chaining, and advanced control via MaxVideoAI.
Compare Wan 2.6 vs Sora 2 Pro →
Wan 2.6 is built for short cinematic sequences with multi-shot control.
Start with text or image, then move to reference video for consistency.
Open Wan 2.6 in Generate →