Wan 2.6 Text & Image to Video


This Wan 2.6 Text & Image to Video example uses the text-to-video workflow to generate a vertical 9:16, TikTok-style UGC selfie video. The render is 10 seconds long, 1080p, with audio enabled.

Wan 2.6 Text & Image to Video · Text to video · 10s · 9:16 · Audio enabled · $1.95



Prompt breakdown

Prompt used to generate this render.

Vertical 9:16 TikTok-style UGC selfie video, handheld smartphone feel, natural indoor daylight near a window. A friendly creator speaks directly to camera with natural blinking, subtle head nods, and a warm smile. Add small human imperfections: a tiny hesitation, a soft breath, a quick smile mid-sentence, and a micro-pause before the last line. Realistic skin texture, stable identity, no face warping, minimal flicker, clean audio with natural room tone. No subtitles. No on-screen text. No logos. No watermarks. The creator says (exactly, with the same pacing and hesitations): “Okay, so… um… quick thing. If you’re feeling stuck, just do the tiniest first step… like, set a two-minute timer and start. (smiles) That’s it. You’ll be surprised how fast it gets easier.”

Workflow: Text to video

Output: 10s · 9:16 · 1080p

Recorded render cost: $1.95

Audio: Enabled

Constraints: Text to video, audio enabled

Prompt improvement notes

Note 1

Keep the subject, camera move, lighting, duration, aspect ratio and audio requirement grouped so the render has one clear production brief.
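Note 1 can be sketched in Python, assuming you assemble prompts in code before submitting them. The `ShotBrief` class and its field names are hypothetical illustrations, not part of any Wan 2.6 interface. The idea is that every production field lives in one object and is always emitted in the same order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotBrief:
    """Hypothetical container grouping the fields Note 1 says to keep together."""
    subject: str
    camera: str
    lighting: str
    duration_s: int
    aspect_ratio: str
    audio: str

    def to_prompt(self) -> str:
        # Fixed order: format first, then subject, camera, lighting and audio,
        # so every render built from a ShotBrief reads as one production brief.
        return " ".join([
            f"{self.aspect_ratio}, {self.duration_s}-second video.",
            f"Subject: {self.subject}.",
            f"Camera: {self.camera}.",
            f"Lighting: {self.lighting}.",
            f"Audio: {self.audio}.",
        ])


# Values taken from this example's prompt; the field split is illustrative.
brief = ShotBrief(
    subject="a friendly creator speaking directly to camera",
    camera="handheld smartphone feel",
    lighting="natural indoor daylight near a window",
    duration_s=10,
    aspect_ratio="9:16",
    audio="clean audio with natural room tone",
)
print(brief.to_prompt())
```

Because the dataclass is frozen, changing a setting means building a new brief. This keeps the original brief intact for comparison.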

Note 2

Change one variable at a time when cloning this prompt: model, duration, camera motion or reference input. That makes quality and price differences easier to compare.
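Note 2 can be illustrated with a minimal Python sketch that generates variants differing from a base configuration in exactly one setting. The `one_variable_variants` helper and the settings keys are hypothetical. The 720p/1080p values reflect the resolution tiers mentioned on this page, and any other values you test should be checked against the model page.

```python
from typing import Any


def one_variable_variants(
    base: dict[str, Any], field: str, values: list[Any]
) -> list[dict[str, Any]]:
    """Return copies of `base` that differ from it only in `field`.

    Hypothetical helper: keeping every other setting fixed means any change in
    quality or price can be attributed to the single variable under test.
    """
    if field not in base:
        raise KeyError(f"unknown setting: {field}")
    variants = []
    for value in values:
        variant = dict(base)  # shallow copy; all other settings stay identical
        variant[field] = value
        variants.append(variant)
    return variants


# Settings recorded for this render; the key names are illustrative only.
base_settings = {
    "model": "Wan 2.6 Text & Image to Video",
    "workflow": "text-to-video",
    "duration_s": 10,
    "aspect_ratio": "9:16",
    "resolution": "1080p",
    "audio": True,
}

for v in one_variable_variants(base_settings, "resolution", ["720p", "1080p"]):
    # Show only the settings that differ from the base render.
    changed = {k: val for k, val in v.items() if val != base_settings[k]}
    print(changed)
```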

Note 3

Add a short negative prompt if you need to block text overlays, logos, distorted hands, face warping or unwanted camera shake.
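Note 3 can be sketched as a small Python helper that keeps a negative prompt short by dropping duplicates. The `build_negative_prompt` name is hypothetical. Whether a render path accepts a separate negative-prompt field depends on the provider. If it does not, the same items can go into the main prompt as "No …" sentences, as this example's prompt does with "No subtitles. No on-screen text."

```python
def build_negative_prompt(blocked: list[str]) -> str:
    """Join blocked elements into one short, comma-separated negative prompt.

    Hypothetical helper: duplicates (ignoring case and surrounding spaces) and
    blank entries are dropped, and the first spelling of each item is kept.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for item in blocked:
        key = item.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(item.strip())
    return ", ".join(kept)


negative = build_negative_prompt([
    "text overlays", "logos", "distorted hands", "face warping",
    "unwanted camera shake", "Logos",
])
print(negative)
# text overlays, logos, distorted hands, face warping, unwanted camera shake
```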

Compare this model

Review this example beside nearby engines before choosing a render path. Each comparison covers specs, pricing, prompt fit and example behavior side by side.

Wan 2.6 Text & Image to Video vs Kling 2.5 Turbo

Wan 2.6 Text & Image to Video vs Kling 2.6 Pro

Wan 2.6 Text & Image to Video vs Kling 3 4K

Why Wan 2.6 Text & Image to Video fits this shot

Wan 2.6 combines text-to-video, image-to-video and reference-to-video generation in one model, with multi-shot prompting and 720p or 1080p output tiers. This render uses the text-to-video path at 1080p.

Supported inputs: text prompts, image input, reference video, key frames

Related examples


Wan 2.5 Text & Image to Video

Wan 2.5 vertical spy-to-Zoom comedy video example

This Wan 2.5 watch page shows a vertical comedy prompt that opens like a spy action scene and ends with a Zoom-call reveal.

Wan 2.6 Text & Image to Video

Wan 2.6 rainy neon thriller sequence example

This Wan 2.6 page shows a rainy neon thriller prompt with multi-shot direction, smooth camera work and audio-enabled pacing.

LTX 2.3 Pro

LTX 2.3 Pro rooftop lightning fashion shot example

This LTX 2.3 Pro page shows a rooftop fashion prompt with storm lighting, neon city atmosphere and cinematic subject isolation.

OpenAI Sora 2

Sora 2 gorilla dance video example with strobe lighting

This Sora 2 watch page shows a gorilla-mask dance prompt rendered with strobe lighting, changing camera angles, native audio and a 16:9 output.