Wan 2.6 Text & Image to Video audio-enabled video example: Vertical 9:16…

Wan 2.6 Text & Image to Video · Text to video · 10s · 9:16 · Audio

This Wan 2.6 Text & Image to Video text-to-video example shows a vertical 9:16 TikTok-style UGC selfie. It highlights audio-enabled output in a 10-second, 9:16, 1080p render.

Prompt

Vertical 9:16 TikTok-style UGC selfie video, handheld smartphone feel, natural indoor daylight near a window. A friendly creator speaks directly to camera with natural blinking, subtle head nods, and a warm smile. Add small human imperfections: a tiny hesitation, a soft breath, a quick smile mid-sentence, and a micro-pause before the last line. Realistic skin texture, stable identity, no face warping, minimal flicker, clean audio with natural room tone. No subtitles. No on-screen text. No logos. No watermarks. The creator says (exactly, with the same pacing and hesitations): “Okay, so… um… quick thing. If you’re feeling stuck, just do the tiniest first step… like, set a two-minute timer and start. (smiles) That’s it. You’ll be surprised how fast it gets easier.”

Render details

Workflow

Text-to-video workflow

10-second render in 9:16

Audio-enabled output

Realistic styling

Scene focus: Vertical 9:16 TikTok-style UGC selfie

Engine

Wan 2.6 Text & Image to Video

Wan 2.6 combines text-to-video, image-to-video, and reference-to-video in one card, with multi-shot prompting and 720p and 1080p tiers.

Text prompts
Image input
Reference video

Specs

Engine: Wan 2.6 Text & Image to Video
Mode: Text to video
Duration: 10s
Aspect ratio: 9:16
Resolution: 1080p
FPS: 24
Audio: Enabled
Render cost: $1.95
Created: 2026-02-02
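
The specs above can be read as a small settings record. The sketch below is illustrative only: `RenderSettings` and its field names are hypothetical and do not reflect any official Wan 2.6 or workspace schema. It derives the frame count from duration and FPS, and estimates pixel dimensions on the assumption that "1080p" names the shorter side of the frame (so 9:16 would be 1080 × 1920).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderSettings:
    """Hypothetical container for the listed specs (not an official schema)."""
    engine: str
    mode: str
    duration_s: int
    aspect_ratio: str  # "width:height", e.g. "9:16"
    resolution: str    # e.g. "1080p"
    fps: int
    audio: bool

    def frame_count(self) -> int:
        # Total frames = seconds * frames per second.
        return self.duration_s * self.fps

    def frame_size(self) -> tuple[int, int]:
        # Assumption: the resolution label gives the shorter side in pixels.
        short = int(self.resolution.rstrip("p"))
        w, h = (int(part) for part in self.aspect_ratio.split(":"))
        if w <= h:
            return short, short * h // w  # portrait or square
        return short * w // h, short      # landscape


settings = RenderSettings(
    engine="Wan 2.6 Text & Image to Video",
    mode="Text to video",
    duration_s=10,
    aspect_ratio="9:16",
    resolution="1080p",
    fps=24,
    audio=True,
)

# A "remix" keeps the other settings and changes duration and aspect ratio.
remix = replace(settings, duration_s=5, aspect_ratio="16:9")

print(settings.frame_count(), settings.frame_size())
print(remix.frame_count(), remix.frame_size())
```

Under these assumptions, the listed render works out to 240 frames at 1080 × 1920, and the remixed variant to 120 frames at 1920 × 1080.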

Recreate

Start from the same prompt and settings, then remix duration, aspect ratio, references, or audio.