GOOGLE GEMINI OMNI VIDEO PREVIEW

Gemini Omni Flash

Stateful video editing, up to 10s, 720p output, reference images, source-video edits and native sound direction in one Google Omni workflow.

Use Gemini Omni Flash when the job goes beyond a single prompt-to-video render. Start from text, a source image, up to 10 reference images, a short source video, or a previous interaction id when you want a conversational refine pass.

Stateful refine

Store the interaction id and continue the same Omni output in a follow-up edit.

Reference stack

Guide the scene with one image or up to 10 reference images.

Video edit

Upload a short source clip and describe the change, camera direction and sound direction.

Native sound direction

Give ambience, music, speech or SFX instructions inside the prompt.

Preview limits

Current Google preview constraints are 720p, 16:9 or 9:16, and up to 10 seconds.

Vertex route

MaxVideoAI keeps the implementation on the Google Vertex Interactions path.

Gemini Omni Flash pricing at a glance

Preview 720p totals - review the exact live quote before each generation.

Motion draft: $0.52 (4s · 720p)

Standard preview (most popular): $1.04 (8s · 720p)

Delivery render: $1.30 (10s · 720p)

Max duration: 10s at 720p

Gemini Omni Flash is a Google preview route. MaxVideoAI displays the customer price before generation and may update pricing as provider SKUs stabilize.
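
The three totals above are consistent with a flat rate of $0.13 per second at 720p, the rate listed in the specs on this page. The sketch below works through that arithmetic. It assumes linear per-second pricing, whole-second durations and a 1-second lower bound, none of which this page confirms. The name `estimate_price` is illustrative only, and the live quote in the app remains the authoritative figure.

```python
from decimal import Decimal

# Assumed flat rate, taken from the "Price / second" spec on this page.
PRICE_PER_SECOND = Decimal("0.13")
# Preview limit stated on this page; the 1-second lower bound below is an assumption.
MAX_DURATION_S = 10


def estimate_price(duration_s: int) -> Decimal:
    """Return an estimated 720p total in USD for a clip of duration_s seconds."""
    if not 1 <= duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be between 1 and {MAX_DURATION_S} seconds")
    # Decimal avoids binary floating-point drift when working with cents.
    return (PRICE_PER_SECOND * duration_s).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for seconds in (4, 8, 10):
        print(f"{seconds}s -> ${estimate_price(seconds)}")
```

Run as a script, it prints $0.52, $1.04 and $1.30 for 4, 8 and 10 seconds, matching the totals listed above.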

Gemini Omni Flash examples

Approved MaxVideoAI renders now show Gemini Omni Flash handling character performance, camera direction and native audio in 16:9.

Real community renders

See what's possible with Gemini Omni Flash.

Recreate any shot

Jump into the app with one click and reuse the setup.

Native audio

Dialogue, ambience and SFX generated in sync.

Multi-shot continuity

Keep characters, style and scene consistency across sequences.

Production-aware

Built-in guardrails and safety filters for responsible review.

Omni Flash or Veo 3.1?

Choose Omni Flash for conversational refine, source-video edits and larger reference stacks. Choose Veo 3.1 when you need the mature Veo route for first/last-frame, extend or higher-resolution delivery.

Need a refine workflow?

Keep Store interaction enabled when the output may need follow-up edits. The saved interaction id becomes the bridge for the next Omni pass.

Writing prompts?

Keep the main prompt short, then add separate sound, camera and edit directions so the UI can preserve them across modes.
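
One way to keep those directions separate is to store each in its own field and join them only when a single prompt string is needed. The sketch below shows the idea. The `OmniPrompt` class and the "Edit direction" label are hypothetical and are not a MaxVideoAI or Google API. The other labels mirror the wording of the demo prompt on this page.

```python
from dataclasses import dataclass


@dataclass
class OmniPrompt:
    """Hypothetical container that keeps each direction in its own field."""
    prompt: str
    sound: str = ""
    camera: str = ""
    edit: str = ""

    def render(self) -> str:
        """Join the non-empty fields into one labeled, multi-line prompt."""
        parts = [
            ("Prompt", self.prompt),
            ("Sound direction", self.sound),
            ("Camera direction", self.camera),
            ("Edit direction", self.edit),
        ]
        return "\n".join(
            f"{label}: {text.strip()}" for label, text in parts if text.strip()
        )


if __name__ == "__main__":
    shot = OmniPrompt(
        prompt="Golden-hour rooftop above a modern city.",
        sound="Soft rooftop wind, recorder click.",
        camera="Smooth lateral dolly, 50mm lens feel.",
    )
    print(shot.render())
```

Because empty fields are skipped, the same object works for a text-only prompt or for an edit pass that adds only an edit direction.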

Prompt Lab — Gemini Omni Flash

How Gemini Omni Flash uses references

Text-to-video

Start with one clear subject, one action, sound direction and the 16:9 or 9:16 output shape.

Image-to-video

Use one source image when the opening composition or product shape matters.

Reference-to-video

Use multiple references for identity, wardrobe, product form, palette or scene style.

Video edit

Upload one short clip and state what must stay before describing what should change.

Conversational refine

Reuse the previous interaction id for follow-up changes instead of rebuilding the shot.
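
The five modes above follow from which inputs are present. The sketch below shows one possible mapping. The function name, the mode strings and the precedence order (refine first, then video edit, references, image, text) are assumptions for illustration and do not describe the MaxVideoAI or Google implementation.

```python
from typing import Optional

# Reference-image cap stated on this page.
MAX_REFERENCE_IMAGES = 10


def pick_mode(
    source_image: bool = False,
    reference_count: int = 0,
    source_video: bool = False,
    previous_interaction_id: Optional[str] = None,
) -> str:
    """Pick a workflow label from the inputs supplied (assumed precedence)."""
    if not 0 <= reference_count <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"use between 0 and {MAX_REFERENCE_IMAGES} reference images")
    if previous_interaction_id:
        return "conversational-refine"
    if source_video:
        return "video-edit"
    if reference_count > 0:
        return "reference-to-video"
    if source_image:
        return "image-to-video"
    return "text-to-video"


if __name__ == "__main__":
    print(pick_mode())                      # no inputs beyond the prompt
    print(pick_mode(reference_count=4))     # several reference images
    print(pick_mode(previous_interaction_id="saved-id"))
```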

      Gemini Omni Flash demo prompt

      10s 16:9 prompt with native sound

      Subject: Two friends on a golden-hour rooftop  •  Action: A recorder turns a memory into moving light
      Camera: Smooth lateral dolly ending on a two-face reaction  •  Style: Premium cinematic realism, warm backlight, soft city atmosphere
      Audio: Rooftop wind, recorder click, ocean echo, one whispered line

      Prompt: Golden-hour rooftop above a modern city. Two friends discover a small handheld recorder can turn spoken memories into warm moving light. One friend presses record; translucent images of a childhood beach form briefly in the air between them, then dissolve in the wind. Their faces shift from curiosity to wonder. Premium cinematic realism, warm backlight, no text or logos.
      Sound direction: Soft rooftop wind, recorder click, distant city ambience, gentle ocean echo as the memory appears, one whispered line: "That was my favorite day."
      Camera direction: Smooth lateral dolly ending on their surprised faces, 50mm lens feel, shallow depth of field, golden-hour backlight.
      10s · 16:9 · Audio on
      Gemini Omni Flash rooftop render with native audio

      Tips and boundaries

      Best practices, common fixes, and important limitations to help you get the strongest results with Gemini Omni Flash.

      What works best

      • Keep Store interaction enabled when a result may need a follow-up refine pass.
      • Use reference-to-video when product identity, wardrobe or style must stay consistent.
      • For source-video edits, describe what stays before describing what changes.
      • Use sound direction as a short director note, not a long soundtrack script.
      • Choose Veo 3.1 instead when you need first/last-frame, extend or higher-resolution delivery.

      Common problems → fast fixes

      • Feels random / inconsistent → simplify to: subject + action + camera + lighting. Re-run 2–3 takes.
      • Motion looks weird → reduce movement: one camera move, slower action, fewer props.
      • Subject drifts off-brand → start from a reference image and lock palette + lighting.
      • Text looks wrong → avoid readable signage, tiny UI, micro labels. Keep text off-screen.
      • Dialogue drifts → keep lines short and punchy; avoid long monologues.

      Hard limits to keep in mind

      • Output is short-form (up to 10s). For longer edits, stitch multiple clips.
      • Resolution tops out at 720p for this tier.
      • No fixed seeds: iterate by re-running and refining.
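
Planning a longer piece under the 10-second cap comes down to splitting the target length into clips of at most 10 seconds each. The sketch below does that split. `plan_segments` is a hypothetical helper that assumes whole-second durations. It only plans clip lengths, so generation and stitching still happen in whatever tools you already use.

```python
from typing import List

# Maximum clip length stated on this page for the preview route.
MAX_CLIP_S = 10


def plan_segments(total_s: int, max_clip_s: int = MAX_CLIP_S) -> List[int]:
    """Split total_s seconds into clip lengths no longer than max_clip_s."""
    if total_s <= 0 or max_clip_s <= 0:
        raise ValueError("durations must be positive whole seconds")
    full_clips, remainder = divmod(total_s, max_clip_s)
    segments = [max_clip_s] * full_clips
    if remainder:
        segments.append(remainder)
    return segments


if __name__ == "__main__":
    print(plan_segments(25))  # [10, 10, 5]
```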

      Compare Gemini Omni Flash vs other AI video models

      These side-by-side comparisons break down price, resolution, audio, speed, and motion style so you can pick the right engine fast.

      Each page includes real outputs and practical best-use cases.

      Gemini Omni Flash vs Google Veo 3.1

      Generate cinematic Veo 3.1 videos with text prompts, start-image animation, multi-reference guidance, optional last-frame control and extend workflows on one unified MaxVideoAI model page.

      Gemini Omni Flash vs Google Veo 3.1 Fast

      Use Veo 3.1 Fast for affordable text-to-video, start-image animation, multi-reference guidance, optional last-frame control and extend workflows, with optional native audio, on one unified MaxVideoAI model page.

      Gemini Omni Flash specs

      The limits that shape your renders.

      Price / second: $0.13/s
      Text-to-Video: Supported
      Image-to-Video: Supported
      Video-to-Video: Supported (short source-video edit and conversational refine)
      First/Last frame: Not supported in current Omni route
      Start / reference image: Supported (up to 10 reference images)
      Reference video: Supported (short source video for edit; previous interaction id for refine)
      Max resolution: 720p
      Max duration: 10s
      Aspect ratios: 16:9 / 9:16
      FPS options: 24 fps
      Output format: MP4
      Audio output: Supported
      Native audio generation: Supported
      Lip sync: Prompt-directed only
      Camera / motion controls: Prompt-based sound, camera and edit directions
      Watermark: No visible MaxVideoAI watermark; provider provenance markers may apply
      Release date: Google preview, Jun 30 2026

      Supported routes

      Gemini Omni Flash is exposed as a multimodal video route rather than a Veo-style long-running prediction route.

      • Text-to-video from a prompt.
      • Image-to-video from one source image.
      • Reference-to-video with up to 10 reference images.
      • Video edit from a short source video.
      • Conversational refine using a previous interaction id.
      • 16:9 and 9:16 output.
      • 720p output up to 10 seconds.
      • Prompt-directed sound generation.
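
The numeric limits in this list can be checked before a request is submitted. The sketch below is a hypothetical pre-flight check, not MaxVideoAI or Google code. The constants come from this page, while the function name, its parameters and the error wording are assumptions.

```python
from typing import List

# Limits as stated on this page for the current preview route.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
MAX_DURATION_S = 10
MAX_REFERENCE_IMAGES = 10
OUTPUT_RESOLUTION = "720p"


def check_request(
    duration_s: float,
    aspect_ratio: str,
    reference_images: int = 0,
    resolution: str = OUTPUT_RESOLUTION,
) -> List[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not 0 < duration_s <= MAX_DURATION_S:
        problems.append(f"duration must be above 0s and at most {MAX_DURATION_S}s")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect ratio must be 16:9 or 9:16")
    if not 0 <= reference_images <= MAX_REFERENCE_IMAGES:
        problems.append(f"use at most {MAX_REFERENCE_IMAGES} reference images")
    if resolution != OUTPUT_RESOLUTION:
        problems.append(f"only {OUTPUT_RESOLUTION} output is available")
    return problems


if __name__ == "__main__":
    print(check_request(8, "16:9", reference_images=3))  # []
    print(check_request(12, "1:1", reference_images=11, resolution="1080p"))
```

Returning every problem at once, rather than raising on the first, lets a UI show all fixes in a single pass.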

      Boundaries

      • No negative prompt or seed controls.
      • No first/last-frame workflow.
      • No extend workflow.
      • No 1080p or 4K output in the current preview route.
      • No public provider implementation guide on this marketing page.

      Safety & people / likeness

      Built-in safeguards and best practices for responsible creation with Gemini Omni Flash.

      • Use original characters and owned references.
      • Avoid real people, celebrities and protected characters.
      • Do not use someone's likeness without consent.
      • Avoid copyrighted franchises, logos and protected IP.

      FAQ

      Is Gemini Omni Flash available through Vertex AI on MaxVideoAI?

      Yes. MaxVideoAI implements Gemini Omni Flash as a Google Vertex / Agent Platform Interactions route when the preview route is enabled for the account.

      What is Gemini Omni Flash best for?

      Use it for 720p short videos where text, image references, source-video edits and follow-up conversational refine matter more than 4K delivery or first/last-frame control.

      How is it different from Veo 3.1?

      Omni Flash is better positioned for stateful interaction and broader reference/edit workflows. Veo 3.1 remains the better option to evaluate for first/last-frame, extend and higher-resolution delivery paths.

      Can Gemini Omni Flash generate audio?

      Yes. Sound is directed through prompt guidance for ambience, music, speech or SFX, subject to the current preview route.

      Does Gemini Omni Flash support 4K or 1080p?

      No. The current MaxVideoAI Omni preview route is documented and exposed as 720p output.