Wan 2.5 Text & Image to Video

Audio enabled

Wan 2.5 Text & Image to Video audio-enabled video example: city camera move

This Wan 2.5 Text & Image to Video example is a text-to-video render of a handheld city camera move. It shows audio-enabled output in a 10-second, 9:16 clip.

Wan 2.5 Text & Image to Video · Text to video · 10s · 9:16 · Audio enabled · $0.65

Prompt breakdown

Prompt used to generate this render.

Ultra-realistic walking selfie shot filmed with a smartphone held in one hand. The person is speed-walking through a busy urban street in daylight. Camera movement is dynamic: fast steps, sudden micro-shakes, quick tilts as the person avoids people and obstacles. Natural motion blur, realistic stabilization drift, shifting sunlight and shadows on their face. High-detail skin texture, real reflections in the eyes. The person speaks extremely fast, slightly out of breath, trying to explain something urgently while walking. Lip-sync must perfectly match the following rapid line: “Okay listen, I don’t have much time but everything’s happening way faster than I expected and I swear I’ll explain everything once I get there!” Audio: realistic city ambience (footsteps, passing cars, faint horns), wind hitting the phone mic, breath sounds, occasional clothing rustle. Keep the phone-mic quality: compressed, slightly distorted on loud peaks. Mood: energetic, chaotic, spontaneous. No filters, no beautification. Keep it raw and real.

Workflow: Text to video
Camera: Audio Enabled
Output: 10s · 9:16
Recorded render cost: $0.65
Audio: Enabled
Constraints: Text To Video, Audio Enabled, Camera Move

Prompt improvement notes

Note 1

Keep the subject, camera move, lighting, duration, aspect ratio and audio requirement grouped so the render has one clear production brief.
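One way to keep those fields grouped is to hold them in a single structure and render the prompt from it. The sketch below is illustrative only: the `ProductionBrief` class, its field names and the output wording are assumptions, not part of any Wan 2.5 interface.

```python
from dataclasses import dataclass


@dataclass
class ProductionBrief:
    """Hypothetical container for the fields Note 1 says to keep together."""
    subject: str
    camera_move: str
    lighting: str
    duration_s: int
    aspect_ratio: str
    audio: bool

    def to_prompt(self) -> str:
        # Render every field into one block so nothing is scattered.
        audio_text = "with synchronized audio" if self.audio else "silent"
        return (
            f"{self.subject}. Camera: {self.camera_move}. "
            f"Lighting: {self.lighting}. "
            f"Format: {self.duration_s}s, {self.aspect_ratio}, {audio_text}."
        )


brief = ProductionBrief(
    subject="Walking selfie of a person speed-walking through a busy urban street",
    camera_move="handheld smartphone, micro-shakes and quick tilts",
    lighting="daylight with shifting sun and shadows",
    duration_s=10,
    aspect_ratio="9:16",
    audio=True,
)
print(brief.to_prompt())
```

Because every requirement lives in one object, a missing field fails at construction time instead of silently dropping out of the prompt.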

Note 2

Change one variable at a time when cloning this prompt: model, duration, camera motion or reference input. That makes quality and price differences easier to compare.
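The one-variable-at-a-time approach can be sketched as a small helper that produces test variants from a base configuration. The dictionary keys and the `one_variable_variants` function are hypothetical names chosen for illustration; the alternative values are taken from options mentioned on this page.

```python
def one_variable_variants(base, alternatives):
    """Return (changed_key, config) pairs that each differ from base in one key."""
    variants = []
    for key, values in alternatives.items():
        for value in values:
            if value == base[key]:
                continue  # identical to the base render, nothing to compare
            variant = dict(base)  # shallow copy so the base stays untouched
            variant[key] = value
            variants.append((key, variant))
    return variants


base = {
    "model": "Wan 2.5",
    "duration_s": 10,
    "camera_motion": "handheld walking selfie",
    "reference_input": None,
}
alternatives = {
    "model": ["Kling 2.5 Turbo"],
    "duration_s": [5, 10],
}

variants = one_variable_variants(base, alternatives)
for changed_key, config in variants:
    print(changed_key, "->", config[changed_key])
```

Each variant can then be rendered and compared against the base clip, so any difference in quality or price traces back to a single change.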

Note 3

Add a short negative prompt if you need to block text overlays, logos, distorted hands, face warping or unwanted camera shake.
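A short negative prompt can be assembled from a list of unwanted artifacts. The helper below is a minimal sketch: `build_negative_prompt` and its `max_terms` cap are assumptions for illustration, and how the resulting string is passed to a model depends on the engine. Camera shake is left out of the list here because the prompt above asks for it deliberately.

```python
def build_negative_prompt(terms, max_terms=6):
    """Join unique, normalized terms into a short comma-separated string."""
    seen = []
    for term in terms:
        cleaned = term.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    # Cap the list so the negative prompt stays short.
    return ", ".join(seen[:max_terms])


negative = build_negative_prompt(
    ["text overlays", "logos", "distorted hands", "face warping", "Logos "]
)
print(negative)
```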

Compare this model

Review this example beside nearby engines before choosing a render path.

Wan 2.5 Text & Image to Video vs Kling 2.5 Turbo
Wan 2.5 Text & Image to Video vs Kling 2.6 Pro
Wan 2.5 Text & Image to Video vs Kling 3 4K

Each comparison covers specs, pricing, prompt fit and example behavior side by side.

Why Wan 2.5 Text & Image to Video fits this shot

Wan 2.5 renders 5- or 10-second clips with optional background audio, and offers prompt expansion when you need extra detail.

Audio option · 5s or 10s · 480p–1080p · Key frames

Related examples

Wan 2.5 Text & Image to Video

Wan 2.5 vertical spy-to-Zoom comedy video example

This Wan 2.5 watch page shows a vertical comedy prompt that opens like a spy action scene and ends with a Zoom-call reveal.

Wan 2.5 Text & Image to Video

Wan 2.5 vertical smartwatch runner ad example

This Wan 2.5 example turns a smartwatch prompt into a vertical runner ad with beat-timed motion, rain details and audio-enabled pacing.

LTX 2.3 Pro

LTX 2.3 Pro rooftop lightning fashion shot example

This LTX 2.3 Pro page shows a rooftop fashion prompt with storm lighting, neon city atmosphere and cinematic subject isolation.

OpenAI Sora 2

Sora 2 gorilla dance video example with strobe lighting

This Sora 2 watch page shows a gorilla-mask dance prompt rendered with strobe lighting, changing camera angles, native audio and a 16:9 output.