Veo3.1 Reference To Video

Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips. It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.

~124 credits per run

Veo guide (verified)

Google DeepMind's video model. The only major brand with **native audio generation** — voices, sound effects, music — synced to the video in one pass. Veo 3 Fast and Veo 3.1 are the workhorses.

Strengths

Native audio: dialogue, foley, ambient sound rendered with the video. No separate lipsync pass needed.
Strong physical realism — water, cloth, smoke behave plausibly.
Veo 3.1 4K Video supports upscaling to 4K resolution.
Fast tier is genuinely fast (sub-minute typical) while maintaining cinematic quality.
Reliable adherence to camera direction (dolly-in, whip-pan, crane shots).

Weaknesses

Stylized animation (anime, painterly) is weaker than realism — Veo is photographic by default.
Audio quality varies; complex multi-voice scenes can muddle.
Strict safety filters reject more prompts than competitors.
Duration capped at 8s on Fast tier.

Best for

Cinematic realism with audio (interviews, narration, dialogue scenes)
Product demos / commercials where lip-synced voiceover matters
Atmospheric scenes that benefit from ambient sound
Quick drafts when speed matters more than artistic flair

Avoid for

Heavily stylized anime / painterly looks (try Kling, Seedance)
Edgy or explicit content (will be filtered)

Prompting tips

Audio direction: explicitly mention what should be heard ("man says 'hello'", "distant traffic hum").
For dialogue, use plain English in quotes — Veo's audio engine speaks naturally.
Camera moves: name them ("slow dolly forward", "crash zoom") — Veo respects them precisely.
Keep prompts under ~150 words — Veo handles dense scene descriptions but rewards focus.
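
The tips above can be sketched as a small prompt builder. This is only an illustration: `build_prompt` and its arguments are hypothetical helpers, not part of any Veo API, and the "Camera:" and "Audio:" labels are one possible phrasing rather than a required syntax. It names the camera move, puts dialogue in quotes, lists the sounds to be heard, and rejects prompts over the ~150-word guideline.

```python
def build_prompt(scene, camera=None, dialogue=None, sounds=(), max_words=150):
    """Compose a prompt from a scene, a named camera move, quoted dialogue, and sounds.

    All names here are illustrative; the 150-word limit mirrors the tip above.
    """
    # Normalize the scene description so it ends with exactly one period.
    parts = [scene.strip().rstrip(".") + "."]
    if camera:
        # Name the camera move explicitly, e.g. "slow dolly forward".
        parts.append(f"Camera: {camera.strip()}.")
    if dialogue:
        # Dialogue goes in quotes, in plain English.
        speaker, line = dialogue
        parts.append(f'{speaker} says "{line}".')
    if sounds:
        # State what should be heard, e.g. "distant traffic hum".
        parts.append("Audio: " + ", ".join(sounds) + ".")
    prompt = " ".join(parts)
    word_count = len(prompt.split())
    if word_count > max_words:
        raise ValueError(f"prompt has {word_count} words; keep it under ~{max_words}")
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "A man waves from a rainy street corner",
        camera="slow dolly forward",
        dialogue=("The man", "hello"),
        sounds=["distant traffic hum"],
    ))
```

Keeping each element in its own argument makes it easy to drop the audio line when generate_audio is disabled.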

Parameter tips

generate_audio: ON by default. Disable to save cost when you'll add audio in post.
Aspect ratio: 16:9 is its native canvas; 9:16 quality dips slightly.

Parameters

duration (string)
The duration of the generated video.
Options: 4s, 6s, 8s
Default: 8s

generate_audio (boolean)
Whether to generate audio for the video.
Default: true

aspect_ratio (string)
The aspect ratio of the generated video.
Options: 16:9, 9:16
Default: 16:9

image_urls (array)
URLs of the reference images to use for consistent subject appearance.

resolution (string)
The resolution of the generated video.
Options: 720p, 1080p, 4k
Default: 720p

auto_fix (boolean)
Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them.
Default: false

safety_tolerance (string)
The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict.
Options: 1, 2, 3, 4, 5, 6
Default: 4
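
As a rough sketch, the parameters above can be checked client-side before a request is sent. The allowed values and defaults come from the list above, and the three-image cap comes from the model description. Everything else is an assumption: `build_payload` is a hypothetical helper, the `prompt` field name is not documented on this page, and how the payload is actually submitted depends on the platform's API.

```python
# Allowed values and defaults as listed in the parameter reference above.
ALLOWED = {
    "duration": {"4s", "6s", "8s"},
    "aspect_ratio": {"16:9", "9:16"},
    "resolution": {"720p", "1080p", "4k"},
    "safety_tolerance": {"1", "2", "3", "4", "5", "6"},  # documented as a string
}
DEFAULTS = {
    "duration": "8s",
    "generate_audio": True,
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "auto_fix": False,
    "safety_tolerance": "4",
}
MAX_REFERENCE_IMAGES = 3  # "up to three reference images"


def build_payload(prompt, image_urls, **overrides):
    """Return a validated parameter dict; 'prompt' as a field name is an assumption."""
    if not prompt or not prompt.strip():
        raise ValueError("a text prompt is required")
    if not 1 <= len(image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"provide 1 to {MAX_REFERENCE_IMAGES} reference image URLs")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    # Enumerated string parameters must use one of the documented options.
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    # The two flags are documented as booleans.
    for name in ("generate_audio", "auto_fix"):
        if not isinstance(params[name], bool):
            raise TypeError(f"{name} must be a boolean")
    return {"prompt": prompt, "image_urls": list(image_urls), **params}


if __name__ == "__main__":
    # Example: a shorter silent clip, with audio to be added in post.
    print(build_payload(
        "A fox runs through snow",
        ["https://example.com/fox.png"],
        duration="6s",
        generate_audio=False,
    ))
```

Validating locally catches typos such as `"1080"` or a fourth reference image before a run is attempted.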

You'll need

A text prompt
Source images

More from veo3.1

Veo3.1 Fast Text To Video

veo3.1-fast-text-to-video

Veo3.1 Lite Text To Video

veo3.1-lite-text-to-video

Veo3.1 Text To Video

veo3.1-text-to-video

Veo3.1 Fast Image To Video

veo3.1-fast-image-to-video