Duration & Output
- Durations: 5 s and 10 s
- Resolutions: 480p, 720p, 1080p (24 fps)
Wan 2.5 lets you storyboard cinematic beats with built-in audio: prompt or image input, optional WAV/MP3, 5 or 10 seconds at 480p/720p/1080p.
Use it for hero beats and reveals where music, ambience or dialogue matter. Upload a track for tight sync or let Wan score the scene natively.
Wan 2.5 – Text or Image to Video with Optional Audio in MaxVideoAI (480p–1080p, 5–10s)
A vertical, cinematic mini action scene where a spy-style hero runs like in a blockbuster trailer, only to reveal at the end…
Why Wan 2.5 is powerful inside MaxVideoAI
Best use cases
Wan 2.5 is a text-and-image-to-video model built for short cinematic clips with native audio.
In MaxVideoAI it’s a flexible, audio-ready engine with tiered pricing by resolution.
In-app flow
These specs describe Wan 2.5 exactly as you can use it today via MaxVideoAI.
Wan 2.5 is the audio-ready short-form engine for 5–10s beats where visuals and sound need to land together.
Clips generated with the exact configuration you have access to in MaxVideoAI.

Wan 2.5 Text & Image to Video · 5s
Cinematic cyberpunk rooftop at night, vertical 9:16. A neon-lit heroine faces a glowing holographic moon; practical LED reflections play realistically across h…
Wan 2.5 Text & Image to Video · 5s
Cinematic Renaissance terrace overlooking a moonlit valley, vertical 9:16. A scholar in ornate embroidered garments stands in a gentle breeze, illuminated by…
Wan 2.5 Text & Image to Video · 5s
Cinematic medieval cliffside at night, vertical 9:16. A lone ranger in a weathered leather cloak stands against a windswept ridge, illuminated by…
Wan 2.5 Text & Image to Video · 5s
10s vertical shot of a fitness smartwatch on a runner’s wrist, timed to an energetic electronic track. Start: close-up on beat one…
Wan 2.5 Text & Image to Video · 10s
Ultra-realistic walking selfie shot filmed with a smartphone held in one hand. The person is speed-walking through a busy urban street in…
Wan 2.5 Text & Image to Video · 10s
Ultra-realistic handheld selfie filmed inside a parked car at night. The person is sitting in the driver’s seat, illuminated softly by streetlights…
Use shot-style prompts with camera and audio notes.
[Duration] second [aspect ratio] cinematic shot of [subject] in [environment]. Camera [movement] while [main action] happens. Lighting [style], [grade] look. Audio: [ambience + music/SFX], optional line: “[…]”.
Keep it concise; add or remove audio cues depending on whether you upload a track.
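The template above can be filled programmatically when you iterate on many shots. The following is a minimal sketch, not part of MaxVideoAI: the function name `build_shot_prompt` and its parameters are hypothetical, and it only assembles a text string that you would then paste into the prompt field.

```python
def build_shot_prompt(duration_s, aspect_ratio, subject, environment,
                      camera_move, action, lighting, grade,
                      audio=None, line=None):
    """Fill the shot-style template. Audio cues and the spoken line are optional.

    Hypothetical helper: it only builds a string and calls no MaxVideoAI API.
    """
    parts = [
        f"{duration_s} second {aspect_ratio} cinematic shot of {subject} in {environment}.",
        f"Camera {camera_move} while {action} happens.",
        f"Lighting {lighting}, {grade} look.",
    ]
    # Collect audio cues only when they are given, e.g. omit them
    # entirely when an uploaded track should carry the sound.
    audio_bits = []
    if audio:
        audio_bits.append(audio)
    if line:
        audio_bits.append(f'line: "{line}"')
    if audio_bits:
        parts.append("Audio: " + ", ".join(audio_bits) + ".")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_shot_prompt(
        10, "9:16", "a fitness smartwatch", "a rainy city at blue hour",
        "pulls back", "a runner sprints across a bridge",
        "blue hour", "cool, high-contrast",
        audio="rain, footsteps, breathing",
    ))
```

Leaving `audio` and `line` unset drops the audio sentence, which matches the advice to remove audio cues when a track is uploaded.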
Animate a single still into an audio-backed beat.
Wan 2.5 can tie visual movement to a specific track.
Use downbeats and transitions as anchors in your prompt.
Demo: One Prompt for Wan 2.5
10 second 9:16 product story synced to uploaded track
10s vertical shot of a fitness smartwatch on a runner’s wrist, timed to an energetic electronic track.
Start: close-up on beat one with raindrops on glass.
Beat change: pull back to the runner sprinting in slow motion on a neon-lit bridge.
Final beat: swing to profile close-up with visible breath, display glowing.
Lighting: blue hour, bright highlights on metal.
Audio: uploaded track as main music + subtle footsteps, rain, breathing; no dialogue.
Use Wan 2.5 when visuals and sound must land together—ideate cheap, finish in HD with your track.
Wan 2.5 outputs run through provider and MaxVideoAI safeguards.
Yes. If you don’t upload a track, Wan generates native audio. If you upload WAV/MP3, your track is trimmed/looped to 5 or 10 seconds and used as the main audio.
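To see what trimming or looping means for a given track length, the arithmetic can be sketched as below. This is an illustration only: the page does not say where Wan cuts the track or how it loops it, so the function `fit_track` and its behavior (always start at 0 s, repeat from the top, cut the last pass short) are assumptions.

```python
def fit_track(track_s, clip_s):
    """Return the (start, end) segments of the source track, in seconds,
    that would fill a clip of clip_s seconds.

    Assumption: playback starts at 0 s and loops from the start; the
    actual trimming and looping rules are not documented here.
    """
    if track_s <= 0 or clip_s <= 0:
        raise ValueError("track and clip lengths must be positive")
    segments = []
    remaining = clip_s
    while remaining > 0:
        take = min(track_s, remaining)  # full pass, or a shortened final pass
        segments.append((0.0, float(take)))
        remaining -= take
    return segments


if __name__ == "__main__":
    print(fit_track(30, 10))  # longer track: one trimmed segment
    print(fit_track(4, 10))   # shorter track: looped, last pass cut short
```

Under these assumptions, a track longer than the clip is simply cut, and a shorter one repeats until the 5 or 10 seconds are filled.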
480p/5s for fastest look-dev; 720p/5–10s for internal reviews and social; 1080p/10s for hero beats and client-ready shots.
Yes. Choose 16:9, 9:16 or 1:1 before rendering; 9:16 is best for mobile-first placements.
Yes. Upload one still (portrait, product, concept art) and focus the prompt on motion, camera and audio.
Pricing is per second, tiered by resolution ($0.05, $0.10 or $0.15 per second). It’s mid-tier: cheaper than premium long-form, more capable than ultra-budget silent engines.
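A quick way to budget a render is to multiply the per-second rate by the clip length. The sketch below assumes the three quoted rates map to 480p, 720p and 1080p in ascending order, which the page implies but does not state outright. The names `RATE_PER_SECOND` and `estimate_cost` are hypothetical, and actual billing may differ.

```python
from decimal import Decimal

# Assumed mapping: the quoted rates listed in the same order as the resolutions.
RATE_PER_SECOND = {
    "480p": Decimal("0.05"),
    "720p": Decimal("0.10"),
    "1080p": Decimal("0.15"),
}
DURATIONS = (5, 10)  # clip lengths listed for Wan 2.5


def estimate_cost(resolution, duration_s):
    """Estimate the price in USD of one clip at the given settings."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if duration_s not in DURATIONS:
        raise ValueError(f"unsupported duration: {duration_s}")
    return RATE_PER_SECOND[resolution] * duration_s


if __name__ == "__main__":
    for res in RATE_PER_SECOND:
        for dur in DURATIONS:
            print(f"{res} / {dur}s: ${estimate_cost(res, dur)}")
```

`Decimal` is used instead of floats so that sums over many renders do not pick up rounding noise.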
Compare price, latency and output options across the MaxVideoAI catalog.
openai
Create rich AI-generated videos from text or image prompts using Sora 2. Native voice-over, ambient effects, and motion sync via MaxVideoAI.
openai
Create longer, more immersive AI videos from text or images using Sora 2 Pro. Native voice, ambient sound, prompt chaining, and advanced control via MaxVideoAI.
google-veo
Generate cinematic 8-second videos with native audio using Veo 3.1 by Google DeepMind on MaxVideoAI. Reference-to-video guidance, multi-image fidelity, pay-as-you-go pricing from $0.52/s.
Wan 2.5 in MaxVideoAI is your audio-ready short-form engine for 5–10 second beats.
Use native or uploaded audio, iterate cheap, and finish in HD when visuals and sound must land together.