Veo 3.1 – Text→Video & Image→Video in MaxVideoAI (720p/1080p, 4–8 s)

Veo 3.1 – Cinematic video with native audio and framing control (4–8 s, 720p/1080p)

720p/1080p · 4–8 s · Text or image input

Generate short, cinematic videos with Google DeepMind's Veo 3.1 directly inside your MaxVideoAI workspace. Text-to-video, image-to-video, framing presets, and native audio with transparent per-second pricing.

Describe the scene, choose 4, 6 or 8 seconds, pick 16:9, 9:16 or 1:1, decide whether you want native audio, and let Veo 3.1 deliver polished footage for ads, explainers, campaigns, and client work.

Audio on · 8s

Cinematic 8-second TV commercial in 16:9 with sound. From a tiny FPV-style camera flying indoors, we explore a bright, modern apartment. At…

View render →

Why Veo 3.1 is powerful inside MaxVideoAI:

  • Text → Video, Image → Video, and multi-image reference in one place
  • Cinematic Controls for framing, tone and motion before you hit render
  • Native audio (dialogue, ambience, SFX) with an audio on/off toggle
  • Seed and Extend options to keep framing consistent and grow longer sequences
  • Pay-as-you-go pricing – you only pay for the seconds you generate
  • Available in Europe, UK and worldwide via licensed DeepMind endpoints
  • Designed to sit alongside Sora 2, Sora 2 Pro, Pika 2.2, Kling, Wan, MiniMax Hailuo and Nano Banana

Best use cases

  • Brand hero shots and premium product reveals
  • Campaign assets with consistent framing and tone
  • Social ads in 9:16, 16:9 or 1:1
  • Short explainers, educational content and cinematic B-roll
  • Previz and concept tests where camera language matters

What Veo 3.1 Actually Is in MaxVideoAI

On paper, Veo 3.1 is DeepMind’s latest short-form video model with richer audio and tighter prompt adherence.

In MaxVideoAI, Veo 3.1 is exposed as a controlled, production-ready engine, wrapped in a simple flow:

  1. Pick Veo 3.1 as the engine.
  2. Choose Text → Video or Image → Video.
  3. Set duration (4/6/8 s), aspect ratio, and resolution (720p or 1080p).
  4. Choose framing/tone presets, then paste a structured prompt.
  5. See the final price per clip before you generate.
  6. Compare against other engines in the same GUI.

Real Specs – Veo 3.1 in MaxVideoAI (720p/1080p, 4–8s)

These specs describe Veo 3.1 exactly as you can use it today via MaxVideoAI – not theoretical lab demos.

Duration & Output

  • Durations: 4 s, 6 s, 8 s
  • Output resolution: 720p (1280x720) or 1080p (1920x1080)
  • Frame rate: 24 fps cinematic cadence

Aspect Ratios

  • 16:9 – horizontal / web video
  • 9:16 – vertical / Reels / Shorts / TikTok
  • 1:1 – square / feed placements

Inputs & File Types

  • Text prompts written like a shot list (1–3 sentences for single-beat clips; Shot 1/2/3 for 8s sequences)
  • Reference images: PNG, JPG, WebP; up to 4 stills to lock identity, wardrobe and lighting
  • Image → Video: animate a single still (from Nano Banana or your own assets) into a 4–8s shot
  • No direct video input in this configuration; use Extend and sequencing for longer arcs

Audio

  • Native audio on by default (VO, ambience, SFX)
  • Toggle audio off for cheaper silent renders when you will design sound later
  • Treat Veo audio as a first-draft soundtrack, then polish timing and loudness in post

Pricing

  • Per-second model inside MaxVideoAI
  • Example config: perSecondCentsAudioOn = 40; perSecondCentsAudioOff = 20
  • Audio on (~$0.40/s): 4s ≈ $1.60; 6s ≈ $2.40; 8s ≈ $3.20
  • Audio off (~$0.20/s): 4s ≈ $0.80; 6s ≈ $1.20; 8s ≈ $1.60
  • No subscription: top up your MaxVideoAI wallet and preview price before render
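
The per-second arithmetic above can be reproduced in a few lines. This is a minimal Python sketch that assumes the example config values (40 and 20 cents per second) are the only inputs; the function names are hypothetical and not part of any MaxVideoAI API, and the in-app preview remains the authoritative price.

```python
# Illustrative only: rates mirror the example config above, not a live price list.
PER_SECOND_CENTS_AUDIO_ON = 40
PER_SECOND_CENTS_AUDIO_OFF = 20
ALLOWED_DURATIONS = (4, 6, 8)  # seconds, as listed under Duration & Output


def clip_price_cents(duration_s: int, audio: bool = True) -> int:
    """Return the price of one clip in cents under the example rates."""
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    rate = PER_SECOND_CENTS_AUDIO_ON if audio else PER_SECOND_CENTS_AUDIO_OFF
    return duration_s * rate


def format_usd(cents: int) -> str:
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    for seconds in ALLOWED_DURATIONS:
        with_audio = format_usd(clip_price_cents(seconds, audio=True))
        silent = format_usd(clip_price_cents(seconds, audio=False))
        print(f"{seconds}s: audio on {with_audio}, audio off {silent}")
```

Working in integer cents avoids floating-point rounding when you total many clips for a campaign budget.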

Veo 3.1 in MaxVideoAI gives you framing presets, native audio control, seeds and Extend—so it behaves like a directable camera, not a black box.

Example Gallery: Real Veo 3.1 Outputs

See live Veo 3.1 renders powered by the same settings you have in MaxVideoAI.

View all Veo 3.1 examples →

Google Veo 3.1 · 8s

Shot 1 (0–3 s): macro close-up of one earbud rotating slowly on a wooden desk, shallow depth of field, warm desk lamp…

Recreate this shot →

Google Veo 3 Fast · 8s

Cinematic 8-second TV commercial in 16:9 with sound. From a tiny FPV-style camera flying indoors, we explore a bright, modern apartment. At…

Recreate this shot →

Text-to-Video with Veo 3.1

Write prompts like a short director’s note, built around cinematography, subject, action, context, style, audio and format.

  1. Cinematography and framing – medium close-up, wide tracking shot, top-down macro, etc.
  2. Subject – who or what we see.
  3. Action – what happens in 4–8 seconds.
  4. Context / environment – office, city street at night, kitchen studio, classroom.
  5. Style and ambiance – cinematic, realistic, documentary; lighting and grade.
  6. Audio cues – ambience, music style, one short VO line; specify no subtitles if you don’t want on-screen text.
  7. Format and length – e.g., 8 seconds, 16:9 (or 9:16 / 1:1).

Medium shot of [subject] in [environment], [clear action] over 8 seconds. Camera [movement], 16:9 at 1080p, cinematic look with [lighting and color]. Audio: [ambience] + [music/VO cue], no subtitles.

Drop that into MaxVideoAI, choose Veo 3.1, set duration/orientation, and you’re ready to render.
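
If you produce many variants, the bracketed template above can be filled from structured fields before you paste it in. The following is a minimal Python sketch; `ShotBrief` and `build_prompt` are hypothetical names chosen for illustration and are not part of MaxVideoAI, and the allowed values simply mirror the specs listed on this page.

```python
from dataclasses import dataclass

ALLOWED_DURATIONS = (4, 6, 8)
ALLOWED_RATIOS = ("16:9", "9:16", "1:1")
ALLOWED_RESOLUTIONS = ("720p", "1080p")


@dataclass
class ShotBrief:
    """Hypothetical container for the bracketed slots in the template."""
    subject: str
    environment: str
    action: str
    camera_movement: str
    lighting_and_color: str
    ambience: str
    music_or_vo: str
    framing: str = "Medium shot"
    duration_s: int = 8
    aspect_ratio: str = "16:9"
    resolution: str = "1080p"


def build_prompt(brief: ShotBrief) -> str:
    """Fill the single-beat template and reject settings the page does not list."""
    if brief.duration_s not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if brief.aspect_ratio not in ALLOWED_RATIOS:
        raise ValueError(f"aspect ratio must be one of {ALLOWED_RATIOS}")
    if brief.resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    return (
        f"{brief.framing} of {brief.subject} in {brief.environment}, "
        f"{brief.action} over {brief.duration_s} seconds. "
        f"Camera {brief.camera_movement}, {brief.aspect_ratio} at {brief.resolution}, "
        f"cinematic look with {brief.lighting_and_color}. "
        f"Audio: {brief.ambience} + {brief.music_or_vo}, no subtitles."
    )


if __name__ == "__main__":
    brief = ShotBrief(
        subject="a barista",
        environment="a small corner cafe",
        action="pours latte art and slides the cup forward",
        camera_movement="slow push-in",
        lighting_and_color="warm morning light and soft contrast",
        ambience="quiet cafe chatter",
        music_or_vo="gentle acoustic guitar",
    )
    print(build_prompt(brief))
```

Keeping the slots as separate fields makes it easy to change one variable at a time (for example, only the camera move) when comparing renders.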

Image-to-Video Workflow with Veo 3.1 (+ Nano Banana)

Pair Veo 3.1 with Nano Banana to lock style and iterate on motion.

  1. Generate 1–4 reference stills in Nano Banana (or import your brand stills).
  2. Send them into Veo 3.1 as reference images in Text→Video, or start from a single still in Image→Video.
  3. Focus your prompt on motion, timing and audio: how the camera moves, how the subject moves, and how the beat should end at 4/6/8s.
  4. Regenerate with the same references to keep identity consistent.

Good fits for this workflow:

  • On-brand product hero clips
  • Logo and title animations with consistent backgrounds
  • Short explainer visuals based on diagrams or UI stills

Multi-Shot & Sequenced Clips – Directed 8s Beats in Veo 3.1

Veo 3.1 can compress a mini-sequence into a single 6 or 8 second clip when you write a structured prompt.

Use seeds and Extend to keep framing consistent across beats.

  • Aim for 2–3 shots per 8-second clip.
  • One main action and one clear camera move per beat.
  • Keep subject, wardrobe and environment consistent; use references to lock them.
  • Treat it as one scene with multiple angles rather than many locations.
  • Extend strong shots to build 12–24s establishing sequences.
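
When planning a 12–24s sequence, it can help to decide up front how the total splits into renderable blocks. The sketch below is a simplification: it assumes a longer sequence is assembled from separate 4, 6 or 8 second clips joined in an edit, and it does not model how Extend itself adds footage, which this page does not specify. `plan_blocks` is a hypothetical helper name.

```python
BLOCK_LENGTHS = (4, 6, 8)  # clip durations listed for Veo 3.1 in MaxVideoAI


def plan_blocks(target_s: int) -> list[int]:
    """Split an even target length into 4/6/8 s blocks, preferring 8 s clips."""
    if target_s < 4 or target_s % 2 != 0:
        raise ValueError("target must be an even number of seconds, at least 4")
    blocks = []
    remaining = target_s
    while remaining > 8:
        # Take 8 s unless that would leave a 2 s remainder, which is not renderable.
        take = 8 if remaining - 8 >= 4 else 6
        blocks.append(take)
        remaining -= take
    blocks.append(remaining)  # at this point remaining is 4, 6 or 8
    return blocks


if __name__ == "__main__":
    for target in (12, 18, 24):
        print(target, plan_blocks(target))
```

Preferring 8 second blocks keeps the number of cuts low; you can then apply the 2–3 shots per clip guideline inside each block.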

Demo: One Sequenced Prompt (with Native Audio)

Audio on · 8s

View render →

8 second cinematic product story for wireless earbuds (16:9, 1080p)

Shot 1 (0–3 s): macro close-up of one earbud rotating slowly on a wooden desk, shallow depth of field, warm desk lamp glow.

Shot 2 (3–6 s): medium shot of a young professional putting the earbuds in before stepping onto a busy city street, subtle bokeh lights.

Shot 3 (6–8 s): close-up of the charging case clicking shut next to a laptop, soft logo reflection in the lid.

Camera: smooth dolly moves between shots, handheld feel but not shaky.

Lighting: evening, warm indoors transitioning to cool street light, gentle film grain.

Audio: city ambience low in the mix, soft electronic music bed, short VO line: “Block the noise, keep the focus.” No subtitles.

Negative: no brand names, no on-screen text, no extreme wide angles.

Why this prompt works:

  • Clear 3-beat structure with coherent environment and defined cinematography.
  • Realistic VO that fits in 8 seconds; explicit no-subtitles call to avoid unwanted text overlays.
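
Before rendering a sequenced prompt, you can sanity-check that the shot timings are contiguous and fill the clip. The following Python sketch assumes the `Shot N (start–end s):` labelling used in the demo prompt above; `parse_beats` and `check_beats` are hypothetical helpers, not something Veo 3.1 or MaxVideoAI requires.

```python
import re

# Matches labels such as "Shot 2 (3–6 s):" with an en dash or a plain hyphen.
SHOT_RE = re.compile(r"Shot\s+(\d+)\s*\((\d+)\s*[–-]\s*(\d+)\s*s\)")


def parse_beats(prompt: str) -> list[tuple[int, int]]:
    """Extract (start, end) second pairs from shot labels, in order of appearance."""
    return [(int(start), int(end)) for _, start, end in SHOT_RE.findall(prompt)]


def check_beats(beats: list[tuple[int, int]], clip_length_s: int = 8) -> list[str]:
    """Return a list of timing problems; an empty list means the beats line up."""
    problems = []
    if not 2 <= len(beats) <= 3:
        problems.append(f"expected 2-3 shots, found {len(beats)}")
    expected_start = 0
    for index, (start, end) in enumerate(beats, start=1):
        if start != expected_start:
            problems.append(f"shot {index} starts at {start}s, expected {expected_start}s")
        if end <= start:
            problems.append(f"shot {index} has no positive duration")
        expected_start = end
    if beats and beats[-1][1] != clip_length_s:
        problems.append(f"last shot ends at {beats[-1][1]}s, clip is {clip_length_s}s")
    return problems


if __name__ == "__main__":
    demo = (
        "Shot 1 (0–3 s): macro close-up of one earbud. "
        "Shot 2 (3–6 s): medium shot of a young professional. "
        "Shot 3 (6–8 s): close-up of the charging case."
    )
    beats = parse_beats(demo)
    print(beats, check_beats(beats) or "timings OK")
```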

Tips & Limitations in Plain English

  • Strong camera control via framing presets and motion cues
  • Great for ads, explainers and campaigns that need consistency across variants
  • Native audio for first-pass sound design and VO
  • Handles short, focused sequences better than chaotic prompts
  • Clips are 4–8 seconds; use Extend and editing to go longer
  • 1080p max – no 4K outputs in this configuration
  • Tiny text and detailed UI are hit-or-miss; overlay critical copy in post
  • If it drifts, tighten subject and camera instructions and reduce actions

Lean into these constraints and Veo 3.1 becomes a repeatable, directable tool instead of a slot machine.

Safety, People & Likeness

  • Do not generate real public figures, politicians or celebrities.
  • No minors in risky or suggestive contexts; no explicit sexual content or hateful/violent scenes.
  • Avoid using real people’s likeness without consent.
  • Some prompts and images may be blocked or adjusted by provider policies; this is expected.
  • Generic characters and scenes are fine.
  • Provider policies and SynthID-style provenance apply; MaxVideoAI adds its own safety filters.

These guardrails keep Veo 3.1 usable and compliant for professional work.

Veo 3.1 vs Veo 3.1 Fast – Quick Overview

  • Veo 3.1 is the premium, full-fidelity tier with richer audio and motion.
  • Veo 3.1 Fast is cheaper and faster for drafting and social variants.
  • Draft in Veo 3.1 Fast, then regenerate winners in Veo 3.1 at 1080p with native audio.
Compare Veo 3.1 vs Veo 3.1 Fast →

FAQ – Veo 3.1 in MaxVideoAI

Is Veo 3.1 available in Europe and the UK?

Yes. MaxVideoAI routes Veo through licensed DeepMind endpoints, with no separate contract required.

Can it generate vertical video?

Yes: 16:9, 9:16 and 1:1 are supported. Choose 9:16 for Reels, TikTok and Shorts, and keep the action centered.

Is image-to-video supported?

Yes. Start from a single still (Image → Video) or use 1–4 reference images in Text → Video.

Can I go beyond 8 seconds?

Base durations are 4, 6 and 8 s. Use Extend and chain clips together as 4–8 s blocks.

How do I stay on-brand?

Use Nano Banana references or your brand library, consistent descriptions, and an explicit palette and lighting.

Explore other models

Compare pricing, latency, and output options across other engines available in MaxVideoAI.

Google Veo 3.1 Fast

Use Veo 3.1 Fast for affordable, fast AI video generation. Up to 8-second clips with optional native audio—ideal for social formats and iterative testing.

Compare Veo 3.1 vs Veo 3.1 Fast →

Google Veo 3.1 First/Last Frame

Upload starting and ending frames, write a brief, and let Veo 3.1 animate seamless transitions with optional native audio. Swap to Fast mode for cheaper iterations.

Compare Veo 3.1 vs Veo 3.1 Fast →

OpenAI Sora 2

Create rich AI-generated videos from text or image prompts using Sora 2. Native voice-over, ambient effects, and motion sync via MaxVideoAI.

Compare Veo 3.1 vs Veo 3.1 Fast →

Veo 3.1 in MaxVideoAI gives you direct, pay-as-you-go access to DeepMind’s most controllable short-form video engine.

Framing and audio controls make it feel like a virtual camera, not just another black-box generator.

Open Generate