
Vidu Q2 Reference

Vidu Q2 Reference Video generates cinematic clips from a text prompt guided by multiple reference images. Each image refines the model's understanding of subject, environment, and visual tone, which helps keep appearance and motion consistent across frames.

Vidu guide (verified)

Shengshu Tech's video model. Reference-to-video pioneer — designed around blending multiple input images into a consistent moving subject.

Strengths
  • Best multi-image reference-to-video in the field — 1–9 references blend cleanly.
  • Subject consistency across shots when references are well-chosen.
  • Strong start-and-end-frame mode (vidu-q3-pro-first-last-frames).
Weaknesses
  • Pure text-to-video quality is mid-pack.
  • Camera motion is less controllable than Runway / Kling.
Best for
  • Character / costume consistency across multiple shots
  • Product variations — same item shot from different angles
  • Start+end frame animations where you control both poles
Prompting tips
  • Provide reference images that match the desired LIGHTING and ANGLE — Vidu blends literally.
  • Describe each @image in the prompt ("@image1 from above", "@image2 close-up") for precise blending.
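
The second tip can be sketched as a small helper that prefixes a scene description with one tagged note per reference image. This is only an illustration: `build_prompt` is a hypothetical name, and it assumes the `@imageN` tags are 1-based and follow the order in which the images are supplied, which the page does not state explicitly.

```python
def build_prompt(scene: str, reference_notes: list[str]) -> str:
    """Prefix a scene description with one tagged note per reference image.

    Hypothetical helper: assumes @imageN tags are 1-based and match the
    order of the supplied reference images.
    """
    if not reference_notes:
        raise ValueError("provide at least one reference note")
    # Build "@image1 <note>", "@image2 <note>", ... in upload order.
    tagged = [f"@image{i} {note}" for i, note in enumerate(reference_notes, start=1)]
    return f"{', '.join(tagged)}. {scene}"


if __name__ == "__main__":
    print(build_prompt("A courier walks through a rainy market.",
                       ["from above", "close-up"]))
```

Keeping each note short and tied to angle or lighting matches the first tip, since the references are blended literally.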
Parameters
  • images_list
    array
    Upload images or provide image URLs. Used for image-to-video generation.
  • resolution
    string
    The resolution of the generated video.
    options: 360p, 540p, 720p, 1080p
    default: 720p
  • aspect_ratio
    string
    Aspect ratio of the output video.
    options: 16:9, 9:16, 4:3, 3:4, 1:1
    default: 16:9
  • duration
    int
    The duration of the generated video in seconds.
    default: 5
    range: 2–8
  • movement_amplitude
    string
    The movement amplitude of objects in the frame.
    options: auto, small, medium, large
    default: auto
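
As a rough sketch of how these parameters fit together, the following builds and validates a request body before it is sent anywhere. The parameter names, options, and defaults come from the list above. Everything else is an assumption made for illustration: the `build_request` helper, the `prompt` key, treating the duration range as inclusive, and enforcing the 1–9 reference limit mentioned under Strengths. No endpoint or client library is implied.

```python
import json

# Allowed values as listed in the parameter reference.
RESOLUTIONS = {"360p", "540p", "720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1"}
AMPLITUDES = {"auto", "small", "medium", "large"}


def build_request(prompt, images_list, resolution="720p", aspect_ratio="16:9",
                  duration=5, movement_amplitude="auto"):
    """Return a validated request body as a dict (hypothetical helper)."""
    if not prompt:
        raise ValueError("a text prompt is required")
    # Assumption: 1 to 9 reference images, per the guide's Strengths note.
    if not 1 <= len(images_list) <= 9:
        raise ValueError("images_list must contain 1 to 9 images")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    # Assumption: the 2-8 second range is inclusive at both ends.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 2 <= duration <= 8:
        raise ValueError("duration must be an integer from 2 to 8 seconds")
    if movement_amplitude not in AMPLITUDES:
        raise ValueError(f"unsupported movement_amplitude: {movement_amplitude}")
    return {
        "prompt": prompt,  # assumed key name; the page only says a prompt is needed
        "images_list": list(images_list),
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "movement_amplitude": movement_amplitude,
    }


if __name__ == "__main__":
    body = build_request(
        "@image1 from above, @image2 close-up. A courier walks through a rainy market.",
        ["https://example.com/ref1.png", "https://example.com/ref2.png"],
        resolution="1080p",
        duration=8,
    )
    print(json.dumps(body, indent=2))
```

Validating locally keeps obviously invalid combinations, such as a 10-second duration, from being submitted at all.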
You'll need
  • A text prompt
  • Source images