LTX 2.3 Fast image-to-video example: gorilla and alpaca metro dialogue
This LTX 2.3 Fast image-to-video example animates an uploaded start frame of a gorilla and an alpaca, both in suits, into a deadpan dialogue in a crowded metro carriage. It highlights audio-enabled output in a 14-second, 16:9, 1080p render.
Prompt
Use the uploaded image as the strict start-frame anchor. Preserve the exact same crowded metro carriage, the same gorilla in a dark tailored suit, the same alpaca in a formal suit and glasses, and the same surrounding commuters. Keep the framing realistic and cinematic.

The train is moving steadily through the tunnel with subtle carriage sway, soft metallic rattling, low rail noise, distant tunnel rumble, fluorescent hum, and realistic motion blur outside the windows. No exaggerated action. The entire scene is driven by performance, timing, breathing, silence, and eye contact.

The gorilla and the alpaca stand face to face in the middle of the crowded metro, both completely serious, tired, and slightly awkward, like two strangers who are not sure whether a social interaction has just happened.

Performance direction:
- very subtle body movement only
- natural breathing visible in the chest and shoulders
- tiny eye movements
- slight hesitation before each line
- uncomfortable but controlled silence
- deadpan British-style social awkwardness
- surrounding commuters remain mostly quiet and serious, with minimal reaction

Dialogue timing and acting:

0:00–0:03
The train sways gently. The gorilla briefly glances toward the alpaca, then away, then back again.
Gorilla, low voice, awkward, almost apologetic: “Sorry… were you talking to me?”

0:03–0:05
A short silence. The alpaca blinks once, keeps a straight face, tiny inhale.
Alpaca, calm and dry: “No.”

0:05–0:07
Another pause. The gorilla looks slightly confused, shifts his grip, breathes out through the nose.
Gorilla: “Right… and you?”

0:07–0:09
The alpaca gives the smallest possible side glance, still perfectly serious.
Alpaca: “No, not really.”

0:09–0:11
A longer silence. The train rattles. One nearby commuter subtly looks up, then looks away again.
Gorilla, almost to himself: “No one talks anymore anyway.”

0:11–0:14
Silence. The alpaca stares forward, then gives a tiny thoughtful nod.
Alpaca, quietly: “That’s true, actually.”

Audio direction:
- realistic moving metro ambience throughout
- soft rail clatter and low tunnel rumble
- fluorescent carriage hum
- subtle clothing movement and breathing during pauses
- dialogue clean, dry, understated, intimate, no theatrical projection
- leave natural silence between lines
- no music
- no subtitles
- no text on screen
- no logos
- no extra fantasy elements

Visual direction: prestige cinematic realism, restrained performance comedy, subtle depth of field, grounded lighting, natural commuter stillness, premium film look, humor comes entirely from timing, silence, and serious acting.
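The prompt's timed beats are meant to fill the 14-second clip with no gaps or overlaps. A minimal Python sketch of that check follows; the beat list is copied from the prompt, while the `check_beats` helper and its name are hypothetical and not part of any MaxVideoAI or LTX tooling.

```python
# Beats copied from the prompt's "Dialogue timing and acting" section,
# as (start_seconds, end_seconds, speaker of the line in that beat).
BEATS = [
    (0, 3, "Gorilla"),
    (3, 5, "Alpaca"),
    (5, 7, "Gorilla"),
    (7, 9, "Alpaca"),
    (9, 11, "Gorilla"),
    (11, 14, "Alpaca"),
]


def check_beats(beats, clip_seconds):
    """Return a list of problems; an empty list means the beats tile the clip exactly."""
    problems = []
    expected_start = 0
    for start, end, speaker in beats:
        if start != expected_start:
            problems.append(f"gap or overlap before {start}s ({speaker})")
        if end <= start:
            problems.append(f"empty beat at {start}s ({speaker})")
        expected_start = end
    if expected_start != clip_seconds:
        problems.append(f"beats end at {expected_start}s, clip is {clip_seconds}s")
    return problems


print(check_beats(BEATS, 14))  # []
```

Running the same check after changing the clip duration is a quick way to see which beats need to be stretched or trimmed before the prompt is reused.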
Render details
Workflow
Image-to-video workflow
14-second render in 16:9
Audio-enabled output
Single reference image
Cinematic styling
Controls
Reference image: Provided
Engine: LTX 2.3 Fast
Generate fast AI video with LTX 2.3 Fast on MaxVideoAI. Text and image workflows support 6–20s clips, 1080p/1440p/4K, native audio, and Fal’s 25/50 fps options.
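The limits quoted above can be turned into a quick pre-flight check before a render is submitted. The following is a sketch only: the constants restate the figures in the engine description, and `validate_settings` is a hypothetical helper, not an actual MaxVideoAI or Fal API.

```python
# Limits as stated in the engine description; how the service itself
# enforces them is an assumption not covered by this page.
MIN_SECONDS, MAX_SECONDS = 6, 20
RESOLUTIONS = {"1080p", "1440p", "4K"}
FPS_OPTIONS = {25, 50}


def validate_settings(duration_s, resolution, fps):
    """Return a list of human-readable errors; empty means the settings fit the stated limits."""
    errors = []
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        errors.append(f"duration {duration_s}s is outside {MIN_SECONDS}-{MAX_SECONDS}s")
    if resolution not in RESOLUTIONS:
        errors.append(f"resolution {resolution!r} is not one of {sorted(RESOLUTIONS)}")
    if fps not in FPS_OPTIONS:
        errors.append(f"fps {fps} is not one of {sorted(FPS_OPTIONS)}")
    return errors


# The settings used for this example render pass the check.
print(validate_settings(14, "1080p", 25))  # []
```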
Specs
Engine: LTX 2.3 Fast
Mode: Image to video
Duration: 14s
Aspect ratio: 16:9
Resolution: 1080p
FPS: 25
Audio: Enabled
Render cost: $0.73
Created: 2026-03-19
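A few derived figures follow directly from these specs. The short Python calculation below assumes a constant frame rate for the whole clip; the per-second and per-frame costs are averages for this one render and do not imply that pricing scales linearly with duration.

```python
# Values taken from the Specs list for this render.
duration_s = 14
fps = 25
render_cost_usd = 0.73

# Frame count, assuming a constant 25 fps across the full clip.
total_frames = duration_s * fps

# Average costs for this render only (not a published price formula).
cost_per_second = render_cost_usd / duration_s
cost_per_frame = render_cost_usd / total_frames

print(total_frames)               # 350
print(round(cost_per_second, 4))  # 0.0521
print(round(cost_per_frame, 4))   # 0.0021
```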
Related examples

Same example family
LTX 2.3 Pro image-to-video example: city camera move
This LTX 2.3 Pro image-to-video example shows a city camera move. It highlights audio-enabled output and camera motion control in a 10-second, 16:9, 1080p render.

Same example family
LTX 2.3 Fast product ad example: A charismatic female racer in a
This LTX 2.3 Fast text-to-video example shows “A charismatic female racer in a…”. It highlights audio-enabled output in a 10-second, 16:9, 1080p render.

Shared capability
Google Veo 3.1 Fast camera movement example: living room commercial
This Google Veo 3.1 Fast text-to-video example shows a living room commercial. It highlights audio-enabled output and camera motion control in a 6-second, 16:9 render.

Shared capability
Kling 2.6 Pro audio-enabled video example: 10-second 16:9 cinematic shot in
This Kling 2.6 Pro text-to-video example shows “10-second 16:9 cinematic shot in…”. It highlights audio-enabled output in a 10-second, 16:9 render.
Recreate
Load this render in the workspace
Start from the same prompt and settings, then remix duration, aspect ratio, references, or audio.