GOOGLE GEMINI OMNI VIDEO PREVIEW

Gemini Omni Flash

Stateful video editing, up to 10s, 720p output, reference images, source-video edits and native sound direction in one Google Omni workflow.

Use Gemini Omni Flash when the job goes beyond a single prompt-to-video render: start from text, a source image, up to 10 reference images, a short source video, or a previous interaction id when you want a conversational refine pass.

Multimodal Google video workflow

Stateful refine

Store the interaction id and continue the same Omni output in a follow-up edit.

Reference stack

Guide the scene with one image or up to 10 reference images.

Video edit

Upload a short source clip and describe the change, camera direction and sound direction.

Native sound direction

Give ambience, music, speech or SFX instructions inside the prompt.

Preview limits

Current Google preview constraints are 720p, 16:9 or 9:16, and up to 10 seconds.

Vertex route

MaxVideoAI keeps the implementation on the Google Vertex Interactions path.

Gemini Omni Flash pricing at a glance

Preview 720p totals. Review the exact live quote before each generation.

Motion draft: $0.52 (4s · 720p)
Standard preview: $1.04
Delivery render: $1.30 (10s · 720p)
Max duration: up to 10s at 720p

Gemini Omni Flash is a Google preview route. MaxVideoAI displays the customer price before generation and may update pricing as provider SKUs stabilize.
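
The totals listed above are consistent with a flat per-second rate. The sketch below assumes the $0.13/s figure from the specs section applies linearly at 720p, and that the Standard preview total corresponds to an 8-second clip (the page does not state that duration). The function name and constants are illustrative, not part of any MaxVideoAI API, and the live quote remains the authoritative price.

```python
from decimal import Decimal

# Assumption: flat per-second rate, taken from the specs section ($0.13/s).
RATE_PER_SECOND = Decimal("0.13")
MAX_DURATION_S = 10  # preview limit stated on this page


def estimate_price(duration_s: int) -> Decimal:
    """Return the estimated 720p total in USD for a clip of duration_s seconds."""
    if not 0 < duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be between 1 and {MAX_DURATION_S} seconds")
    # Round to cents so the result prints like a price.
    return (RATE_PER_SECOND * duration_s).quantize(Decimal("0.01"))


for seconds in (4, 8, 10):
    print(f"{seconds}s -> ${estimate_price(seconds)}")
```

Using `Decimal` rather than `float` keeps the cent values exact, which matters when estimates are compared against a displayed quote.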

Gemini Omni Flash examples

Approved MaxVideoAI renders now show Gemini Omni Flash handling character performance, camera direction and native audio in 16:9.

Gemini Omni Flash AI video example: Rain-lit train platform in a fictional near-future city at night. A young composer carrying a small k...

10s · 16:9 · cinematic

Gemini Omni Flash AI video example: Golden-hour rooftop above a modern city. Two friends discover a small handheld recorder can turn spok...

10s · 16:9 · cinematic

Real community renders

See what's possible with Gemini Omni Flash.

Recreate any shot

Jump into the app with one click and reuse the setup.

Native audio

Dialogue, ambience and SFX generated in sync.

Multi-shot continuity

Keep characters, style and scene consistency across sequences.

Production-aware

Built-in guardrails and safety filters for responsible review.

Omni Flash or Veo 3.1?

Choose Omni Flash for conversational refine, source-video edits and larger reference stacks. Choose Veo 3.1 when you need the mature Veo route for first/last-frame, extend or higher-resolution delivery.

Need a refine workflow?

Keep Store interaction enabled when the output may need follow-up edits. The saved interaction id becomes the bridge for the next Omni pass.
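
One way to picture the saved interaction id acting as a bridge is a small client-side session object. This is a sketch under stated assumptions: the class, the `previous_interaction_id` field name and the payload shape are hypothetical, and the page does not document the actual request format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OmniSession:
    """Tracks the most recent interaction id for follow-up refine passes."""

    store_interaction: bool = True
    last_interaction_id: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def record(self, interaction_id: str) -> None:
        """Remember the id of a finished generation, if storing is enabled."""
        if self.store_interaction:
            self.last_interaction_id = interaction_id
            self.history.append(interaction_id)

    def refine_request(self, instruction: str) -> dict:
        """Build a follow-up payload that points at the previous interaction."""
        if self.last_interaction_id is None:
            raise RuntimeError("no stored interaction id; generate with storing enabled first")
        return {
            "previous_interaction_id": self.last_interaction_id,  # hypothetical field name
            "prompt": instruction,
        }


session = OmniSession()
session.record("interaction-123")  # placeholder id, not a real value
print(session.refine_request("Make the backlight warmer; keep the camera move."))
```

If storing is turned off, `record` keeps nothing and `refine_request` raises, which mirrors the advice above: without a saved id there is nothing for the next pass to continue from.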

Writing prompts?

Keep the main prompt short, then add separate sound, camera and edit directions so the UI can preserve them across modes.

How Gemini Omni Flash uses references

Text-to-video

Start with one clear subject, one action, sound direction and the 16:9 or 9:16 output shape.

Image-to-video

Use one source image when the opening composition or product shape matters.

Reference-to-video

Use multiple references for identity, wardrobe, product form, palette or scene style.

Video edit

Upload one short clip and state what must stay before describing what should change.

Conversational refine

Reuse the previous interaction id for follow-up changes instead of rebuilding the shot.
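
The five workflows above differ mainly in which inputs are supplied. A minimal sketch of that mapping follows. The function, its parameter names and the precedence order (refine first, then video edit, then references, then a single image) are assumptions made for illustration; the page does not say how the app resolves combined inputs.

```python
from typing import Optional, Sequence

MAX_REFERENCE_IMAGES = 10  # limit stated on this page


def pick_mode(
    source_image: Optional[str] = None,
    reference_images: Sequence[str] = (),
    source_video: Optional[str] = None,
    previous_interaction_id: Optional[str] = None,
) -> str:
    """Map the supplied inputs to one of the five workflows described above."""
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 10 reference images are supported")
    # Assumed precedence: the most specific input wins.
    if previous_interaction_id:
        return "conversational-refine"
    if source_video:
        return "video-edit"
    if reference_images:
        return "reference-to-video"
    if source_image:
        return "image-to-video"
    return "text-to-video"


print(pick_mode())
print(pick_mode(reference_images=["look.png", "wardrobe.png"]))
print(pick_mode(source_video="clip.mp4"))
```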

Gemini Omni Flash demo prompt

10s 16:9 prompt with native sound

Subject: Two friends on a golden-hour rooftop • Action: A recorder turns a memory into moving light
Camera: Smooth lateral dolly ending on a two-face reaction • Style: Premium cinematic realism, warm backlight, soft city atmosphere
Audio: Rooftop wind, recorder click, ocean echo, one whispered line

Prompt: Golden-hour rooftop above a modern city. Two friends discover a small handheld recorder can turn spoken memories into warm moving light. One friend presses record; translucent images of a childhood beach form briefly in the air between them, then dissolve in the wind. Their faces shift from curiosity to wonder. Premium cinematic realism, warm backlight, no text or logos.
Sound direction: Soft rooftop wind, recorder click, distant city ambience, gentle ocean echo as the memory appears, one whispered line: "That was my favorite day."
Camera direction: Smooth lateral dolly ending on their surprised faces, 50mm lens feel, shallow depth of field, golden-hour backlight.
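
The three-part layout above (main prompt, sound direction, camera direction) can be assembled from separate fields so each direction stays editable on its own. The helper below is a sketch: the function name is hypothetical, and the "Edit direction" label is an assumption modeled on the two labels shown in the demo prompt.

```python
def compose_prompt(prompt: str, sound: str = "", camera: str = "", edit: str = "") -> str:
    """Join the main prompt with optional labeled direction lines."""
    sections = [
        ("Prompt", prompt),
        ("Sound direction", sound),
        ("Camera direction", camera),
        ("Edit direction", edit),  # assumed label; not shown in the demo prompt
    ]
    # Skip empty sections so the output only contains directions that were given.
    lines = [f"{label}: {text.strip()}" for label, text in sections if text.strip()]
    return "\n".join(lines)


print(
    compose_prompt(
        "Golden-hour rooftop above a modern city. Two friends discover a small handheld recorder.",
        sound="Soft rooftop wind, recorder click, one whispered line.",
        camera="Smooth lateral dolly ending on their surprised faces.",
    )
)
```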

10s · 16:9 · Audio on

Tips and boundaries

Best practices, common fixes, and important limitations to help you get the strongest results with Gemini Omni Flash.

What works best

Keep Store interaction enabled when a result may need a follow-up refine pass.
Use reference-to-video when product identity, wardrobe or style must stay consistent.
For source-video edits, describe what stays before describing what changes.
Use sound direction as a short director note, not a long soundtrack script.
Choose Veo 3.1 instead when you need first/last-frame, extend or higher-resolution delivery.

Common problems → fast fixes

Feels random / inconsistent → simplify to: subject + action + camera + lighting. Re-run 2–3 takes.
Motion looks weird → reduce movement: one camera move, slower action, fewer props.
Subject drifts off-brand → start from a reference image and lock palette + lighting.
Text looks wrong → avoid readable signage, tiny UI, micro labels. Keep text off-screen.
Dialogue drifts → keep lines short and punchy; avoid long monologues.

Hard limits to keep in mind

Output is short-form (up to 10s). For longer edits, stitch multiple clips.
Resolution tops out at 720p for this tier.
No fixed seeds — iteration = re-run + refine.
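
For edits longer than the 10-second cap, the stitching advice above amounts to splitting the target length into clips that each fit the limit. A minimal planning sketch follows. The function name is hypothetical, and the page does not state a minimum clip length, so a short trailing clip may need to be merged or padded by hand.

```python
from typing import List

MAX_CLIP_SECONDS = 10  # preview limit stated on this page


def plan_clips(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> List[int]:
    """Split total_seconds into clip lengths, each no longer than max_clip."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    full, remainder = divmod(total_seconds, max_clip)
    clips = [max_clip] * full
    if remainder:
        clips.append(remainder)
    return clips


print(plan_clips(24))  # [10, 10, 4]
```

Each planned clip is still a separate generation, so continuity between clips depends on references or a refine pass rather than on the plan itself.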

Compare Gemini Omni Flash vs other AI video models

These side-by-side comparisons break down price, resolution, audio, speed, and motion style so you can pick the right engine fast.

Each page includes real outputs and practical best-use cases.

Gemini Omni Flash vs Google Veo 3.1

Generate cinematic Veo 3.1 videos with text prompts, start-image animation, multi-reference guidance, optional last-frame control, and extend workflows in one unified MaxVideoAI model page.

Gemini Omni Flash vs Google Veo 3.1 Fast

Use Veo 3.1 Fast for affordable text prompts, start-image animation, multi-reference guidance, optional last-frame control, and extend workflows with optional native audio inside one unified MaxVideoAI model page.

Gemini Omni Flash vs OpenAI Sora 2

Create rich AI-generated videos from text or image prompts using Sora 2. Native voice-over, ambient effects, and motion sync via MaxVideoAI.

Gemini Omni Flash specs

The limits that shape your renders.

Price / second: $0.13/s
Text-to-Video: Supported
Image-to-Video: Supported
Video-to-Video: Supported (short source-video edit and conversational refine)
First/Last frame: Not supported in current Omni route
Start / reference image: Supported (up to 10 reference images)
Reference video: Supported (short source video for edit; previous interaction id for refine)
Max resolution: 720p
Max duration: 10s
Aspect ratios: 16:9 / 9:16
FPS options: 24 fps
Output format: MP4
Audio output: Native audio generation
Lip sync: Prompt-directed only
Camera / motion controls: Prompt-based sound, camera and edit directions
Watermark: No visible MaxVideoAI watermark; provider provenance markers may apply
Release date: Google preview, Jun 30 2026

Supported routes

Gemini Omni Flash is exposed as a multimodal video route rather than a Veo-style long-running prediction route.

Text-to-video from a prompt.
Image-to-video from one source image.
Reference-to-video with up to 10 reference images.
Video edit from a short source video.
Conversational refine using a previous interaction id.
16:9 and 9:16 output.
720p output up to 10 seconds.
Prompt-directed sound generation.

Boundaries

No negative prompt or seed controls.
No first/last-frame workflow.
No extend workflow.
No 1080p or 4K output in the current preview route.
No public provider implementation guide on this marketing page.
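
The supported routes and boundaries above can be read as a checklist to run before submitting a job. The sketch below encodes them as a local pre-flight check. The request keys (`resolution`, `aspect_ratio`, `duration_s`, `reference_images`, `seed` and so on) are hypothetical names chosen for illustration, since the page does not document the provider request schema.

```python
from typing import List

ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_RESOLUTIONS = {"720p"}
MAX_DURATION_S = 10
MAX_REFERENCE_IMAGES = 10
# Hypothetical key names for controls the page lists as unavailable.
UNSUPPORTED_KEYS = {"seed", "negative_prompt", "first_frame", "last_frame", "extend"}


def check_request(request: dict) -> List[str]:
    """Return a list of problems; an empty list means the request fits the preview limits."""
    problems = []
    if request.get("resolution", "720p") not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 720p")
    if request.get("aspect_ratio", "16:9") not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect ratio must be 16:9 or 9:16")
    duration = request.get("duration_s", MAX_DURATION_S)
    if not 0 < duration <= MAX_DURATION_S:
        problems.append("duration must be more than 0 and at most 10 seconds")
    if len(request.get("reference_images", [])) > MAX_REFERENCE_IMAGES:
        problems.append("at most 10 reference images")
    for key in sorted(UNSUPPORTED_KEYS.intersection(request)):
        problems.append(f"'{key}' is not supported in the preview route")
    return problems


print(check_request({"resolution": "1080p", "aspect_ratio": "16:9", "duration_s": 8, "seed": 42}))
```

Returning every problem at once, rather than raising on the first, lets a caller show the full list of fixes before a paid generation is attempted.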

Safety & people / likeness

Built-in safeguards and best practices for responsible creation with Gemini Omni Flash.

Use original characters and owned references.
Avoid real people, celebrities and protected characters.
Do not use someone's likeness without consent.
Avoid copyrighted franchises, logos and protected IP.

FAQ

Is Gemini Omni Flash available through Vertex AI on MaxVideoAI?

Yes. MaxVideoAI implements Gemini Omni Flash as a Google Vertex / Agent Platform Interactions route when the preview route is enabled for the account.

What is Gemini Omni Flash best for?

Use it for 720p short videos where text, image references, source-video edits and follow-up conversational refine matter more than 4K delivery or first/last-frame control.

How is it different from Veo 3.1?

Omni Flash is better positioned for stateful interaction and broader reference/edit workflows. Veo 3.1 remains the better choice when you need first/last-frame, extend or higher-resolution delivery.

Can Gemini Omni Flash generate audio?

Yes. Sound is directed through prompt guidance for ambience, music, speech or SFX, subject to the current preview route.

Does Gemini Omni Flash support 4K or 1080p?

No. The current MaxVideoAI Omni preview route is documented and exposed as 720p output.