
Wan 2.6 – Multi-shot AI video (Text/Image 5–15s, Reference 5–10s, 720p/1080p)

Cinematic short clips built for structured prompts, clean transitions, and consistent subjects.

720p/1080p · Text/Image 5–15s · Reference 5–10s · Optional audio (T2V/I2V)

Wan 2.6 unifies Text, Image, and Reference video inputs in one engine card, so you can iterate on short cinematic beats without switching models.

Use multi-shot prompts for mini trailers, animate a single still, or lock character consistency using 1–3 reference videos.


Why Wan 2.6 is powerful inside MaxVideoAI

  • Text → Video, Image → Video, and Reference → Video in one engine card
  • Multi-shot prompting with timestamp markers for mini narratives
  • Optional background audio via URL on T2V/I2V (disabled on R2V)
  • Expanded aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4) for cross-platform delivery
  • Transparent per-second pricing in the MaxVideoAI wallet

Best use cases

  • Short cinematic storyboards and mini trailers (multi-shot, 10–15s)
  • Product hero motion from a single key visual (I2V)
  • Character or subject consistency with 1–3 reference videos (R2V)
  • Cross-format social deliverables without re-cropping

How Wan 2.6 works in MaxVideoAI

Choose Text, Image, or Reference mode, set duration and resolution, then prompt.

Reference mode keeps subject identity consistent by grounding the model in real video footage.

Quick start (1 minute)

Text → Video (T2V): the fastest way to test an idea

  1. Choose 10s, 720p, 16:9 (safe defaults)
  2. Describe: subject + location + lighting + camera style
  3. Add 2–3 timestamped beats if you want multi-shot
  4. If the mood is right → rerun in 1080p
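As a rough sketch of steps 1, 2, and 4, the Python below composes a T2V prompt from the four ingredients and derives the 1080p rerun from the 720p draft. `RenderSettings` and `compose_t2v_prompt` are hypothetical names for illustration, not a MaxVideoAI API; the point is that only the resolution changes between the draft and the final render.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderSettings:
    """Hypothetical container for the T2V options picked in the quick start."""
    prompt: str
    duration_s: int = 10        # step 1 default
    resolution: str = "720p"    # cheaper draft resolution
    aspect_ratio: str = "16:9"  # safe default


def compose_t2v_prompt(subject: str, location: str, lighting: str, camera: str) -> str:
    """Join the four ingredients from step 2 into one prompt, skipping empty parts."""
    parts = [subject, location, lighting, camera]
    return ", ".join(p.strip() for p in parts if p.strip())


draft = RenderSettings(
    prompt=compose_t2v_prompt(
        "a detective in a trench coat",
        "rain-soaked alley at night",
        "soft neon, blue/amber palette",
        "slow, smooth push-in",
    )
)
# Step 4: once the mood is right, rerun the same prompt at 1080p.
final = replace(draft, resolution="1080p")
```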

Image → Video (I2V): ideal for animating a key visual

  1. Upload the image (the aspect ratio follows it automatically)
  2. Focus on motion (camera + subject) and micro-details (wind, reflections, dust…)
  3. Use 10–15s for a more premium feel (more breathing room)

Reference → Video (R2V): lock identity / consistency

  1. Upload 1–3 videos (varied angles, same subject)
  2. In the prompt, tag @Video1 / @Video2 / @Video3
  3. Ask for a simple shot (not a full film)
  4. Use 5 or 10s (10s is the maximum) to maximize consistency
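The R2V checklist can be verified before uploading with a small helper. This is a hypothetical sketch, not part of MaxVideoAI: it assumes references are tagged exactly as `@Video1`, `@Video2`, `@Video3`, and it flags reference counts outside 1–3, durations other than 5 or 10 s, prompts with no tag, and tags that point past the number of uploaded clips.

```python
import re

MAX_REFERENCES = 3
R2V_DURATIONS_S = (5, 10)
TAG_PATTERN = re.compile(r"@Video(\d+)")  # assumed tag syntax: @Video1, @Video2, ...


def check_r2v_request(prompt: str, reference_count: int, duration_s: int) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not 1 <= reference_count <= MAX_REFERENCES:
        problems.append(f"expected 1-{MAX_REFERENCES} reference videos, got {reference_count}")
    if duration_s not in R2V_DURATIONS_S:
        problems.append(f"Reference mode supports 5 or 10 s, got {duration_s}")
    tags = {int(n) for n in TAG_PATTERN.findall(prompt)}
    if not tags:
        problems.append("prompt does not tag any reference (@Video1, @Video2, @Video3)")
    # Every tag must refer to a clip that was actually uploaded.
    for n in sorted(tags):
        if n < 1 or n > reference_count:
            problems.append(f"@Video{n} has no matching uploaded reference")
    return problems
```

For example, `check_r2v_request("@Video1 meets @Video3", 2, 15)` reports two problems: the 15 s duration and the dangling `@Video3` tag.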

In-app flow

  1. Pick Wan 2.6.
  2. Choose Text → Video, Image → Video, or Reference → Video.
  3. Set duration, resolution, and aspect ratio (T2V/R2V).
  4. Optional: seed.
  5. Optional: attach background audio (T2V/I2V only).
  6. See the final price before rendering.
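One way to picture the in-app flow is as a request object filled in step by step. The `GenerationRequest` class and its field names are assumptions for illustration, not a documented MaxVideoAI or Fal schema; it only enforces the mode rules stated in the flow: the aspect ratio is chosen for T2V/R2V while I2V follows the image, and background audio is T2V/I2V only.

```python
from dataclasses import dataclass
from typing import Optional

MODES = ("t2v", "i2v", "r2v")


@dataclass
class GenerationRequest:
    """Hypothetical request mirroring the in-app steps (not a documented schema)."""
    mode: str                           # step 2: "t2v", "i2v" or "r2v"
    duration_s: int                     # step 3
    resolution: str                     # step 3: "720p" or "1080p"
    aspect_ratio: Optional[str] = None  # step 3: T2V/R2V only; I2V follows the image
    seed: Optional[int] = None          # step 4: optional
    audio_url: Optional[str] = None     # step 5: T2V/I2V only

    def __post_init__(self) -> None:
        # Enforce the mode-dependent rules from the in-app flow.
        if self.mode not in MODES:
            raise ValueError(f"unknown mode {self.mode!r}")
        if self.mode == "i2v" and self.aspect_ratio is not None:
            raise ValueError("I2V takes its aspect ratio from the source image")
        if self.mode == "r2v" and self.audio_url is not None:
            raise ValueError("background audio is not supported in Reference mode")
```

Validating in `__post_init__` means an invalid combination fails at construction time, before any price is shown or credits are spent.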

Real Specs – Wan 2.6 in MaxVideoAI

Specs reflect the current live routing through Fal.

Text → Video

  • Duration: 5 / 10 / 15 seconds
  • Resolution: 720p / 1080p
  • Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
  • Optional audio URL (WAV/MP3, 3–30s, up to 15 MB)

Image → Video

  • Input: single still image required
  • Duration: 5 / 10 / 15 seconds
  • Resolution: 720p / 1080p
  • Aspect ratio follows the source image
  • Optional audio URL (WAV/MP3)

Reference → Video

  • Input: 1–3 reference videos (MP4/MOV)
  • Prompt: tag @Video1/@Video2/@Video3
  • Duration: 5 / 10 seconds
  • Resolution: 720p / 1080p
  • Audio uploads not supported
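The spec list can also be encoded as data and checked before rendering. In this sketch the durations come from the FAQ on this page (Text/Image: 5, 10, 15 s; Reference: 5, 10 s), and the audio limits listed for T2V (WAV/MP3, 3–30 s, up to 15 MB) are assumed to apply to I2V as well. The function name `spec_problems` is hypothetical, and "15MB" is read conservatively as 15,000,000 bytes.

```python
from pathlib import PurePosixPath
from typing import Optional

# Allowed durations per mode, as stated in the FAQ on this page.
DURATIONS_S = {"t2v": (5, 10, 15), "i2v": (5, 10, 15), "r2v": (5, 10)}
RESOLUTIONS = ("720p", "1080p")

# Audio limits listed for T2V; assumed (not confirmed) to apply to I2V as well.
AUDIO_FORMATS = {".wav", ".mp3"}
AUDIO_MIN_S, AUDIO_MAX_S = 3, 30
AUDIO_MAX_BYTES = 15_000_000  # "15MB" read conservatively as decimal megabytes


def spec_problems(
    mode: str,
    duration_s: int,
    resolution: str,
    audio_name: Optional[str] = None,
    audio_length_s: Optional[float] = None,
    audio_size_bytes: Optional[int] = None,
) -> list[str]:
    """Return every spec violation found; an empty list means the settings fit."""
    allowed = DURATIONS_S.get(mode)
    if allowed is None:
        return [f"unknown mode {mode!r}"]
    problems = []
    if duration_s not in allowed:
        problems.append(f"{mode} supports {allowed} s, got {duration_s}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution {resolution!r}")
    if audio_name is not None:
        if mode == "r2v":
            problems.append("audio uploads are not supported in Reference mode")
        # Only the file extension is checked here, not the actual encoding.
        if PurePosixPath(audio_name).suffix.lower() not in AUDIO_FORMATS:
            problems.append("audio must be WAV or MP3")
        if audio_length_s is not None and not AUDIO_MIN_S <= audio_length_s <= AUDIO_MAX_S:
            problems.append(f"audio must be {AUDIO_MIN_S}-{AUDIO_MAX_S} s long")
        if audio_size_bytes is not None and audio_size_bytes > AUDIO_MAX_BYTES:
            problems.append("audio file exceeds 15 MB")
    return problems
```

Returning every problem at once, rather than stopping at the first one, lets a form show all issues in a single pass.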

Pricing

  • $0.13/s at 720p
  • $0.20/s at 1080p

Quick examples

  • 720p: $0.13/s → 5s = $0.65 · 10s = $1.30 · 15s = $1.95
  • 1080p: $0.20/s → 5s = $1.00 · 10s = $2.00 · 15s = $3.00
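Because pricing is a flat per-second rate, the cost is simply rate × duration. The sketch below assumes no minimum charge or extra rounding beyond what the quick examples show, and works in integer cents so that, for example, 15 s at 720p comes out as exactly $1.95 rather than a floating-point approximation. `price_usd` is a hypothetical helper name.

```python
# Per-second rates from the pricing list, kept in integer cents to avoid float rounding.
RATE_CENTS_PER_S = {"720p": 13, "1080p": 20}


def price_usd(resolution: str, duration_s: int) -> str:
    """Return the render price as a dollar string, assuming cost = rate x seconds."""
    cents = RATE_CENTS_PER_S[resolution] * duration_s
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    # Reproduces the quick-example figures listed above.
    for res in ("720p", "1080p"):
        row = " · ".join(f"{d}s = {price_usd(res, d)}" for d in (5, 10, 15))
        print(f"{res}: {row}")
```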


Prompt ideas for Wan 2.6

Use a global look line, then timestamped shots for pacing.

  1. Global look and subject
  2. Shot 1 with timestamp
  3. Shot 2 with timestamp
  4. Final beat and end frame
  5. Optional audio notes (T2V/I2V)

Pattern: Shot 1 [0–3s] ... Shot 2 [3–8s] ... Shot 3 [8–15s] ..., with lighting, camera, and motion in each shot.

For R2V, place @Video1/@Video2/@Video3 in the prompt to anchor subjects.
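A multi-shot prompt can be assembled from a global look line plus a list of timestamped shots. The helper below is a hypothetical sketch (`Shot` and `build_multishot_prompt` are illustrative names), and it enforces a design choice this page does not state: shots start at 0 s, follow each other without gaps or overlaps, and end exactly at the clip duration, which keeps pacing predictable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    start_s: int
    end_s: int
    description: str


def build_multishot_prompt(global_look: str, shots: list[Shot], duration_s: int) -> str:
    """Assemble a global look line plus timestamped shots, checking the timeline."""
    if not shots:
        raise ValueError("at least one shot is required")
    expected_start = 0
    for shot in shots:
        # Each shot must begin where the previous one ended and have positive length.
        if shot.start_s != expected_start or shot.end_s <= shot.start_s:
            raise ValueError(f"shot [{shot.start_s}-{shot.end_s}s] breaks the timeline")
        expected_start = shot.end_s
    if expected_start != duration_s:
        raise ValueError(f"shots end at {expected_start}s but the clip is {duration_s}s")
    lines = [f"Global look: {global_look}"]
    for i, shot in enumerate(shots, start=1):
        lines.append(f"Shot {i} [{shot.start_s}–{shot.end_s}s]: {shot.description}")
    return "\n\n".join(lines)
```

With the three shots of the ready-to-render example (0–4 s, 4–10 s, 10–15 s) and `duration_s=15`, the timeline check passes.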


    Ready-to-render example (multi-shot, 15s, 16:9, 1080p)

    Global look: elegant thriller, rainy night, soft neon, 35mm, fine film grain, cinematic depth of field, smooth camera, blue/amber palette.

    Shot 1 [0–4s]: slow push-in through a wet alley, reflections on the ground, a silhouette in the distance, hazy atmosphere.

    Shot 2 [4–10s]: cut to close-up of hands opening a sealed envelope on a wooden table, warm side light, very shallow focus.

    Shot 3 [10–15s]: a door opens into an overexposed white room, burst of light, fine dust in the beam, fade to black.

    • Audio option: URL to a soft ambient bed (3–15s), low volume.

    Tips & limitations

    • Multi-shot control for mini narratives
    • Reference videos for stronger identity consistency
    • Flexible aspect ratios for social platforms
    • Reference mode is limited to 5–10s
    • Audio uploads are disabled in Reference mode
    • Keep prompts short and clear to avoid drift

    Common issues (quick fixes)

    • Subject changes / drift → shorter prompts + fewer beats + switch to Reference with 2 tighter-framed videos
    • Camera too jittery → replace "dynamic" with "slow, smooth, controlled" + specify "single take"
    • Beats feel inconsistent → repeat anchors (lens, location, time of day, wardrobe) in each beat
    • Look deviates from the key visual (I2V) → say "same scene / same subject", then ask only for motion

    Wan 2.6 vs Wan 2.5 – Quick overview

    • Wan 2.6 adds Reference → Video for subject consistency
    • Wan 2.6 supports multi-shot prompts and more aspect ratios
    • Wan 2.5 includes a 480p option and native audio
    Compare Wan 2.6 vs Wan 2.5 →

    FAQ – Wan 2.6 in MaxVideoAI

    Does Wan 2.6 support audio?

    Audio URLs are optional for Text and Image modes. Reference mode does not support audio uploads.

    How many reference videos can I upload?

    You can upload 1–3 MP4/MOV reference videos. Tag them in the prompt as @Video1, @Video2, and @Video3.

    What durations are supported?

    Text and Image modes: 5, 10, or 15 seconds. Reference mode: 5 or 10 seconds.

    Explore other models

    Compare price, latency and output options across the MaxVideoAI catalog.

    wan

    Wan 2.5 Text & Image to Video

    Generate Wan 2.5 preview clips from prompts or single reference stills, complete with optional audio and 480p–1080p tiers.

    Compare Wan 2.6 vs Wan 2.5 →

    openai

    OpenAI Sora 2

    Create rich AI-generated videos from text or image prompts using Sora 2. Native voice-over, ambient effects, and motion sync via MaxVideoAI.

    Compare Wan 2.6 vs Sora 2 →

    openai

    OpenAI Sora 2 Pro

    Create longer, more immersive AI videos from text or images using Sora 2 Pro. Native voice, ambient sound, prompt chaining, and advanced control via MaxVideoAI.

    Compare Wan 2.6 vs Sora 2 Pro →

    Wan 2.6 is built for short cinematic sequences with multi-shot control.

    Start with text or image, then move to reference video for consistency.

    Open Wan 2.6 in Generate →