Vidu Q3 Text To Video

Vidu Q3 Text To Video is available via fal.ai (fal-ai/vidu/q3/text-to-video).

Vidu guide (Verified)

Vidu is Shengshu Tech's video model, a reference-to-video pioneer designed around blending multiple input images into a consistent moving subject.

Strengths
  • Best multi-image reference-to-video in the field — 1–9 references blend cleanly.
  • Subject consistency across shots when references are well-chosen.
  • Strong start-and-end-frame mode (vidu-q3-pro-first-last-frames).
Weaknesses
  • Pure text-to-video quality is mid-pack.
  • Camera motion is less controllable than Runway / Kling.
Best for
  • Character / costume consistency across multiple shots
  • Product variations — same item shot from different angles
  • Start+end frame animations where you control both poles
Prompting tips
  • Provide reference images that match the desired LIGHTING and ANGLE — Vidu blends literally.
  • Describe each @image in the prompt ("@image1 from above", "@image2 close-up") for precise blending.
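
As a rough sketch of the second tip, the helper below composes a prompt that names each reference by its @imageN tag. The function name describe_references and the parenthesised clause layout are illustrative assumptions, not a format confirmed by Vidu or fal.ai. The tip applies to the reference-to-video variants rather than to this text-only endpoint.

```python
def describe_references(base_prompt: str, descriptions: list[str]) -> str:
    """Append one "@imageN <description>" clause per reference image.

    Hypothetical helper: the clause layout is an assumption, and the
    1 to 9 limit mirrors the reference count mentioned in the guide.
    """
    if not 1 <= len(descriptions) <= 9:
        raise ValueError("expected between 1 and 9 reference descriptions")
    # Number the references from 1 so the tags line up with upload order.
    clauses = [
        f"@image{index} {text.strip()}"
        for index, text in enumerate(descriptions, start=1)
    ]
    return f"{base_prompt.strip()} ({', '.join(clauses)})"


if __name__ == "__main__":
    print(describe_references("A ceramic mug on a table", ["from above", "close-up"]))
```

Keeping the descriptions in the same order as the uploaded references avoids mismatched tags.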
Parameters
  • duration
    int
    Duration of the video in seconds
    default: 5
    range: 1–16
  • resolution
    string
    Output video resolution
    options: 360p, 540p, 720p, 1080p
    default: 720p
  • aspect_ratio
    string
    The aspect ratio of the output video
    options: 16:9, 9:16, 4:3, 3:4, 1:1
    default: 16:9
  • audio
    boolean
    Whether to use direct audio-video generation. When true, outputs video with sound.
    default: true
  • seed
    int
    Random seed for reproducibility. If None, a random seed is chosen.
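
The parameters above can be assembled into a request with a small validation step. This is a minimal sketch, not the official client code: build_arguments is a hypothetical helper, the "prompt" key is assumed from the text prompt requirement, and the allowed values are copied from the parameter list. The commented-out call at the end assumes the fal_client Python package and its subscribe function, so check the current fal.ai documentation before relying on it.

```python
from typing import Any, Optional

# Allowed values copied from the parameter list on this page.
RESOLUTIONS = ("360p", "540p", "720p", "1080p")
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4", "1:1")
MIN_DURATION, MAX_DURATION = 1, 16  # seconds, inclusive


def build_arguments(
    prompt: str,
    duration: int = 5,
    resolution: str = "720p",
    aspect_ratio: str = "16:9",
    audio: bool = True,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Validate inputs against the documented ranges and return a request dict.

    Hypothetical helper; the "prompt" key name is an assumption.
    """
    if not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer number of seconds")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(f"duration must be between {MIN_DURATION} and {MAX_DURATION}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")

    arguments: dict[str, Any] = {
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "audio": audio,
    }
    # Omit the seed entirely when unset so the service picks a random one.
    if seed is not None:
        arguments["seed"] = seed
    return arguments


if __name__ == "__main__":
    args = build_arguments("A red kite drifting over a windy beach", duration=8)
    print(args)
    # Assumed usage with the fal_client package (not executed here):
    # import fal_client
    # result = fal_client.subscribe("fal-ai/vidu/q3/text-to-video", arguments=args)
```

Validating locally catches out-of-range values such as a 20-second duration before a request is sent.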
You'll need
  • A text prompt