ByteDance model

Seedance 2.0

Seedance 2.0 gives you three production workflows in one model family: text-to-video, image-to-video with an optional end frame, and reference-to-video with image, video, and audio references. Fal positions it around native audio, real-world physics, director-level camera control, and multi-shot continuity when you need a more cinematic result.

Best for: action scenes with collisions, debris, and more believable physical interaction; multi-shot ads with consistent product and scene continuity; and audio-led scenes with synced dialogue, ambience, and sound effects.

Text-to-Video · Image-to-Video · 720p · up to 15s · Auto / 21:9 / 16:9 / 4:3 / 1:1 / 3:4 / 9:16 · Audio

Pay-as-you-go on MaxVideoAI · Price shown before you generate

Seedance 2.0 AI video example: Prompt EN - safer cinematic rooftop run to emotional reunion Brief: A cinematic action-drama sequence that...
Audio on · 12s
  • Price: $0.40/s
  • Duration: 12s
  • Format: 16:9
View render →

Use cases

  • Action scenes with collisions, debris, and more believable physical interaction
  • Multi-shot ads with consistent product and scene continuity
  • Audio-led scenes with synced dialogue, ambience, and sound effects
  • Storyboard-to-video using text and multimodal references
  • Director-style camera moves, transitions, and framing control
  • Pre-visualization before final edit (concept to shot list to rough cut)

What makes Seedance 2.0 different

  • Director-level camera grammar (Fal highlights dolly zooms, rack focuses, tracking shots, POV switches, and smooth handheld movement)
  • Real-world physics under pressure (Action scenes, collisions, debris, and fabric motion are part of the model's public positioning on Fal)
  • Native audio in the same generation (Dialogue, ambience, music, and sound effects stay synchronized without separate audio layering)
  • Unified multimodal control (Text, image, audio, and video inputs can be combined inside the Seedance 2 workflow family)
  • Multi-shot outputs up to 15s (Fal highlights natural cuts and transitions inside a single generation rather than a single unbroken take)

Specs

The limits that shape your renders.
Price / second: 480p $0.18/s · 720p $0.40/s
Text-to-Video: Supported
Image-to-Video: Supported
First/Last frame: Supported (start + end image in i2v)
Reference image / style reference: Supported (multiple reference stills)
Reference video: Supported
Max resolution: 720p
Max duration: 15s
Aspect ratios: Auto / 21:9 / 16:9 / 4:3 / 1:1 / 3:4 / 9:16
FPS options: 24
Output format: MP4
Audio output: Supported
Native audio generation: Supported
Camera / motion controls: Advanced
Watermark: No (MaxVideoAI)
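The per-second rates above make cost easy to estimate before you generate. Here is a minimal Python sketch based on the rates ($0.18/s at 480p, $0.40/s at 720p) and the 4-15 s duration range listed on this page; the function name and structure are illustrative, not part of any MaxVideoAI or Fal SDK.

```python
# Hypothetical cost estimator based on the rates shown on this page.
RATE_PER_SECOND = {"480p": 0.18, "720p": 0.40}

def estimate_cost(duration_s: float, resolution: str = "720p") -> float:
    """Return the estimated price in USD for one Seedance 2.0 render."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 4 <= duration_s <= 15:
        raise ValueError("Seedance 2.0 durations run 4-15 seconds")
    return round(duration_s * RATE_PER_SECOND[resolution], 2)

# The 12 s / 720p example above: 12 x 0.40 = 4.80
print(estimate_cost(12, "720p"))  # 4.8
```

Always confirm against the live quote shown in Generate; the rates here reflect Fal's published pricing at the time this page was written.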
Multimodal input stack

Fal's public page frames Seedance 2.0 as a unified multimodal audio-video model.

  • Text instructions + multimodal references
  • Up to 9 image references
  • Up to 3 video references
  • Up to 3 audio references
  • Up to 12 total files across the ref2v run
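These reference limits are easy to trip over in automated pipelines. A minimal pre-flight check, assuming the limits above plus the rule (noted later on this page) that audio references only work when at least one image or video reference is attached; the function name is illustrative, not part of any SDK:

```python
def validate_ref2v_inputs(images: int, videos: int, audios: int) -> None:
    """Raise ValueError if reference counts exceed the ref2v limits
    Fal lists for Seedance 2.0 (9 images, 3 videos, 3 audio, 12 total)."""
    if images > 9:
        raise ValueError("at most 9 image references")
    if videos > 3:
        raise ValueError("at most 3 video references")
    if audios > 3:
        raise ValueError("at most 3 audio references")
    if images + videos + audios > 12:
        raise ValueError("at most 12 total files per ref2v run")
    if audios > 0 and images == 0 and videos == 0:
        raise ValueError(
            "audio references require at least one image or video reference"
        )
```

Running this before upload saves a failed generation when a batch job drifts past the limits.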
Output style and structure

Fal highlights cinematic control, native audio, realistic physics, multi-shot cuts, and up to 15 seconds per generation.

  • 480p or 720p
  • Auto or 4-15s per generation
  • Aspect ratios: auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16
  • Image-to-video supports an optional end frame
  • Natural transitions across multiple shots
  • Native audio-video joint generation
  • Audio toggle available with the same pricing whether audio is on or off

Seedance 2.0 examples

Use this gallery to review prompt patterns, scene structure, and the kinds of beats Seedance 2.0 handles well inside MaxVideoAI.

View all Seedance examples →

Prompt Lab — Seedance 2.0

Official Seedance 2.0 page

Seedance 2.0 works best when the brief is ordered and the workflow is explicit. Start with Subject → Action → Camera → Style, then decide whether the shot should stay prompt-led, use a start/end image pair, or lean on multimodal references.

Tip: duration + aspect ratio are set in the UI — your prompt controls subject, action, camera, lighting, style, and sound.

Text-to-video prompt

Use this when the shot starts from language, not from uploaded assets.

Text = prompt-only generation from a clean brief.

Template (copy/paste)

Subject:
[Who/what appears + 2-3 defining traits]

Action:
[One visible action or one timed beat]

Camera:
[Shot size + angle + one move + optional transition verb]

Style:
[Lighting + palette + texture / lens feel]

Audio:
[Ambience + 1-2 SFX cues + optional short dialogue]

Example:
A courier in a soaked yellow jacket sprints through a narrow alley at night. He jumps a puddle and looks back once. Camera: wide tracking shot into a short handheld close-up. Style: wet asphalt reflections, cold blue street light, subtle film grain. Audio: footsteps, distant siren, one short breathy line.

Example

Handheld smartphone UGC clip of a woman unboxing a new skincare bottle at a kitchen table. She peels the seal, smiles, and turns the bottle toward camera. Soft window daylight, natural colors, subtle room tone + packaging crinkle.

Demo prompt — Seedance 2.0

Audio on · 12s

Brief: A dark cinematic transformation with grounded American realism, blending horror and absurd humor. A school bus mutates into a terrifying mechanical caterpillar in a deserted US city. Subject: An old American school bus (faded yellow-green, rust patches, tall and bulky) is parked in an empty American city…

View render →

Tips and boundaries

What works best

  • Strong cinematic camera direction when beats are explicit.
  • Improved continuity with multimodal references.
  • Native audio sync in short multi-shot outputs.
  • High utility for ads, action beats, and storyboard prototyping.

Common problems → fast fixes

  • Cuts feel abrupt → add transition verbs and timestamps (match cut at 5s, whip pan into Shot 2).
  • Continuity drifts → add anchors (wardrobe, prop, location) and reuse references.
  • Audio mismatch → shorten dialogue, pin SFX to visible action, keep one ambience bed.
  • Physics looks off → simplify simultaneous actions per beat and reduce fast interactions.

Hard limits to keep in mind

  • Complex action stacks still benefit from simpler, clearly separated beats.
  • Reference-heavy prompts work best when each source has one clear job.
  • Reference audio only works when at least one image or video reference is attached.
  • Check the active Fal route family before scaling production traffic.
  • Use the final quote in Generate before client-facing rollout or automated workflows.

Seedance 2.0 vs Seedance 2.0 Fast

View Seedance 2.0 Fast details →

Use Seedance 2.0 when you need:

  • More polished motion and native audio for finals
  • Stronger multi-shot continuity for launch work
  • The main Seedance tier for flagship ads and hero scenes

Use Seedance 2.0 Fast when you want:

  • Rapid draft passes before client-facing finals
  • Quick shot planning, pacing checks, and A/B motion tests
  • A lighter Seedance tier for early creative exploration

Compare Seedance 2.0 vs other AI video models

Not sure if Seedance 2.0 is the best fit for your shot? These side-by-side comparisons break down the tradeoffs — price per second, resolution, audio, speed, and motion style — so you can pick the right engine fast.

Each page includes real outputs and practical best-use cases.

Seedance 2.0 vs Kling 3 Pro

Open Kling 3 Pro when you want a stronger emphasis on scene sequencing, shot control, and storyboard-style multi-prompt direction.

Compare Seedance 2.0 vs Kling 3 Pro →

Safety & people / likeness

  • Don’t generate real people or public figures (celebrities, politicians, etc.).
  • No minors, sexual content, hateful content, or graphic violence.
  • Don’t use someone’s likeness without consent.
  • Some prompts and reference images may be blocked — generic characters and scenes are fine.

FAQ

What is Seedance 2.0?

Seedance 2.0 is ByteDance's flagship AI video model focused on cinematic motion, multi-shot continuity, native audio, and multimodal control.

Does Seedance 2.0 support text-to-video, image-to-video, and reference-to-video?

Yes. On MaxVideoAI, Seedance 2.0 covers text-to-video, image-to-video with an optional end frame, and reference-to-video with image, video, and audio inputs.

How many image, video, and audio references can I add?

Seedance 2.0 supports up to 9 image references, 3 video references, and 3 audio references, with 12 total files across the full ref2v run.

Does Seedance 2.0 image-to-video support an end frame?

Yes. The image-to-video workflow supports a start image plus an optional end image when you want to guide the last frame.

How long can generated videos be?

Fal lists auto duration or explicit runs from 4 to 15 seconds for the current Seedance 2.0 routes.

Does Seedance 2.0 support native audio?

Yes. Native audio is available, generate_audio defaults to on, and Fal states pricing is the same whether audio is generated or not.

Should I start with Seedance 2.0 standard or Fast?

Start with standard Seedance 2.0 when you want the strongest final quality, and switch to Seedance 2.0 Fast for quicker draft passes, route comparisons, and shot planning.