Wan 2.5 Text & Image to Video

Audio enabled

Wan 2.5 Text & Image to Video audio-enabled video example: street

This Wan 2.5 Text & Image to Video text-to-video example shows a quiet street scene filmed selfie-style. It highlights audio-enabled output: a 10-second clip at 26:15.

Wan 2.5 Text & Image to Video · Text to video · 10s · 26:15 · Audio enabled · $0.65

Prompt breakdown

Prompt used to generate this render.

Ultra-realistic handheld selfie shot, filmed on a modern smartphone. A 30-year-old person stands in natural daylight, holding the phone at arm’s length. Slight camera shake, natural breathing motion, soft shadows on the face, detailed skin texture. The background is a real location: a quiet street with parked cars and warm evening light. The person speaks directly to the camera with a casual, natural tone. Audio: include real recorded ambience (soft wind, distant cars), realistic microphone pickup from a phone’s front mic. Lip sync must match the following line: “I’ve had a long month, but today feels different. I’m ready for a fresh start.” Mood: grounded, authentic, documentary-style realism. No filters, no smoothing, no beauty enhancement.

Workflow

Text to video

Camera

Audio Enabled

Output

10s · 26:15

Recorded render cost

$0.65

Audio

Enabled

Constraints

Text To Video, Audio Enabled

Prompt improvement notes

Note 1

Keep the subject, camera move, lighting, duration, aspect ratio and audio requirement grouped so the render has one clear production brief.
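
One way to keep those fields grouped is to hold them in a single structure and render the prompt from it. The sketch below is only an illustration: the `Brief` class, its field names and the output layout are assumptions made for this example, not part of any Wan 2.5 interface.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical container for one production brief (not a Wan 2.5 API type)."""
    subject: str
    camera: str
    lighting: str
    duration_s: int
    aspect_ratio: str
    audio: str

    def to_prompt(self) -> str:
        # Render every field in a fixed order so nothing gets dropped or scattered.
        parts = [
            self.subject,
            f"Camera: {self.camera}",
            f"Lighting: {self.lighting}",
            f"Duration: {self.duration_s}s",
            f"Aspect ratio: {self.aspect_ratio}",
            f"Audio: {self.audio}",
        ]
        # Normalise trailing periods so each part ends with exactly one.
        return " ".join(p.rstrip(".") + "." for p in parts)


brief = Brief(
    subject="A 30-year-old person films a handheld selfie on a quiet street",
    camera="slight handheld shake at arm's length",
    lighting="warm evening light with soft shadows",
    duration_s=10,
    aspect_ratio="26:15",
    audio="soft wind, distant cars, phone front-mic pickup",
)
print(brief.to_prompt())
```

Because every field is required, a missing duration or audio requirement fails when the brief is built rather than after a paid render.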

Note 2

Change one variable at a time when cloning this prompt: model, duration, camera motion or reference input. That makes quality and price differences easier to compare.
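
The one-variable-at-a-time rule can be sketched as a small helper that produces test variants from a base configuration. The setting names and the values `"wan-2.5"` and `"kling-2.5-turbo"` are placeholder labels chosen for this example, not confirmed API identifiers.

```python
def single_variable_variants(base, alternatives):
    """Return (changed_key, settings) pairs that each differ from base in one key."""
    variants = []
    for key, values in alternatives.items():
        for value in values:
            if value == base[key]:
                continue  # skip values identical to the base setting
            variant = dict(base)  # copy so the base settings stay untouched
            variant[key] = value
            variants.append((key, variant))
    return variants


# Hypothetical base settings for the cloned prompt.
base = {
    "model": "wan-2.5",
    "duration_s": 10,
    "camera_motion": "handheld",
    "reference_image": None,
}

# Candidate values to try, one variable at a time.
alternatives = {
    "model": ["kling-2.5-turbo"],
    "duration_s": [5],
    "camera_motion": ["static tripod"],
}

for changed, settings in single_variable_variants(base, alternatives):
    print(changed, "->", settings[changed])
```

Each variant can then be rendered and compared against the base run, so any change in quality or price traces back to a single setting.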

Note 3

Add a short negative prompt if you need to block text overlays, logos, distorted hands, face warping or unwanted camera shake.
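
A short negative prompt can be assembled from a list of unwanted artifacts. This is a minimal sketch: the `negative_prompt` key and the request layout are assumptions for illustration, so check the field name your render endpoint actually expects.

```python
def build_negative_prompt(terms, max_terms=5):
    """Join unique, non-empty terms into a short comma-separated negative prompt."""
    cleaned = []
    for term in terms:
        term = term.strip().lower()
        if term and term not in cleaned:
            cleaned.append(term)
    # Cap the list so the negative prompt stays short.
    return ", ".join(cleaned[:max_terms])


# Artifacts named in the note above; trim this list to fit the shot.
blocked = ["text overlays", "logos", "distorted hands", "face warping"]

# Hypothetical request payload; the key names are illustrative only.
request = {
    "prompt": "Ultra-realistic handheld selfie shot on a quiet street",
    "negative_prompt": build_negative_prompt(blocked),
}
print(request["negative_prompt"])
```

The example prompt on this page asks for slight camera shake, so camera shake is left off the blocked list here to avoid contradicting the brief.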

Compare this model

Review this example beside nearby engines before choosing a render path.

Wan 2.5 Text & Image to Video vs Kling 2.5 Turbo

Compare specs, pricing, prompt fit and example behavior side by side.

Wan 2.5 Text & Image to Video vs Kling 2.6 Pro

Compare specs, pricing, prompt fit and example behavior side by side.

Wan 2.5 Text & Image to Video vs Kling 3 4K

Compare specs, pricing, prompt fit and example behavior side by side.

Why Wan 2.5 Text & Image to Video fits this shot

Wan 2.5 handles 5- or 10-second clips with optional background audio, plus prompt expansion when you need extra detail.

Audio option

5s or 10s

480p–1080p

Key frames

Related examples

Wan 2.5 Text & Image to Video

Wan 2.5 vertical spy-to-Zoom comedy video example

This Wan 2.5 watch page shows a vertical comedy prompt that opens like a spy action scene and ends with a Zoom-call reveal.

Wan 2.5 Text & Image to Video

Wan 2.5 vertical smartwatch runner ad example

This Wan 2.5 example turns a smartwatch prompt into a vertical runner ad with beat-timed motion, rain details and audio-enabled pacing.

LTX 2.3 Pro

LTX 2.3 Pro rooftop lightning fashion shot example

This LTX 2.3 Pro page shows a rooftop fashion prompt with storm lighting, neon city atmosphere and cinematic subject isolation.

OpenAI Sora 2

Sora 2 gorilla dance video example with strobe lighting

This Sora 2 watch page shows a gorilla-mask dance prompt rendered with strobe lighting, changing camera angles, native audio and a 16:9 output.