Wan2.7 Reference To Video

wan2.7-reference-to-video

Alibaba Wan 2.7 Reference-to-Video. Generates new shots from reference characters and props.

~74 credits per run

Wan guide (verified)

Alibaba's open-source video model line (Wan 2.1 → 2.7). Strong prompt adherence; the open-source pedigree means heavy community use and well-documented prompting patterns.

Strengths

Best-in-class prompt adherence — does what you ask, not what it thinks you want.
Wide variant family covers most needs (T2V, I2V, reference, motion control, lipsync).
Wan 2.5 and 2.6 catch up to closed-source quality at lower cost.
Wan 2.2 Spicy variants for adult creative work.

Weaknesses

Older versions (2.1, 2.2) look dated next to the current flagship.
Stylization quality lags behind Kling and Hailuo.

Best for

Precise prompt-driven scene construction
Hybrid pipelines where Wan does the heavy lifting and another model polishes

Prompting tips

Treat Wan like a brief — itemize what's in frame, the action, the camera.
Wan does NOT need flowery language; plain descriptive prose works better.
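
The "brief" approach above can be sketched as a small helper that joins itemized notes into plain prose. This is only an illustration: the function name `build_brief` and the "In frame / Action / Camera" labels are assumptions made for this example, not a format that Wan requires.

```python
def build_brief(in_frame, action, camera):
    """Join itemized scene notes into one plain descriptive prompt.

    Hypothetical helper: the three labels are an illustrative convention,
    not something the model is documented to expect.
    """
    parts = [
        "In frame: " + ", ".join(in_frame),
        "Action: " + action,
        "Camera: " + camera,
    ]
    # One short sentence per item; nothing decorative is added.
    return ". ".join(p.strip().rstrip(".") for p in parts) + "."


prompt = build_brief(
    in_frame=["a red vintage bicycle", "a wet cobblestone street"],
    action="the bicycle rolls slowly from left to right",
    camera="static low-angle shot",
)
print(prompt)
```

Keeping the three items separate makes it easy to change only the camera or only the action between runs while the rest of the brief stays fixed.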

Parameters

num_frames
int
Number of frames to generate. Must be between 81 and 241 (inclusive).
default: 81
range: 17 … 241
num_interpolated_frames
int
Number of frames to interpolate between the original frames. A value of 0 means no interpolation.
default: 0
range: 0 … 5
num_inference_steps
int
Number of inference steps for sampling. Higher values give better quality but take longer.
default: 30
range: 2 … 50
first_frame_url
string
URL to the first frame of the video. If provided, the model will use this frame as a reference.
resolution
string
Resolution of the generated video.
options: auto, 240p, 360p, 480p, 580p, 720p
default: auto
frames_per_second
int
Frames per second of the generated video. Must be between 5 and 30. Ignored if match_input_frames_per_second is true.
default: 16
last_frame_url
string
URL to the last frame of the video. If provided, the model will use this frame as a reference.
match_input_frames_per_second
boolean
If true, the frames per second of the generated video will match the input video. If false, the frames per second will be determined by the frames_per_second parameter.
default: true
video_url
string
URL to the source video file. This video will be used as a reference for the reframe task.
guidance_scale
number
Guidance scale for classifier-free guidance. Higher values encourage the model to generate video closely related to the text prompt.
default: 5
range: 1 … 10
shift
number
Shift parameter for video generation.
default: 5
range: 1 … 15
video_write_mode
string
The write mode of the generated video.
options: fast, balanced, small
default: balanced
temporal_downsample_factor
int
Temporal downsample factor for the video. This is an integer value that determines how many frames to skip in the video. A value of 0 means no downsampling. For each downsample factor, one upsample factor will automatically be applied.
default: 0
range: 0 … 5
transparency_mode
string
The transparency mode to apply to the first and last frames. This controls how the transparent areas of the first and last frames are filled.
options: content_aware, white, black
default: content_aware
auto_downsample_min_fps
number
The minimum frames per second to downsample the video to. This helps determine the auto downsample factor by looking for the lowest detail-preserving downsample factor. The default value is appropriate for most videos; if you are using a video with very fast motion,…
default: 15
range: 1 … 60
zoom_factor
number
Zoom factor for the video. When this value is greater than 0, the video is zoomed in by this factor (relative to the canvas size), cutting off the edges of the video. A value of 0 means no zoom.
default: 0
range: 0 … 0.9
negative_prompt
string
Negative prompt for video generation.
default: letterboxing, borders, black bars, bright colors, overexposed, static, blurred details, subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers, still picture, cluttered background, three legs, many people in the background, walking backwards
sampler
string
Sampler to use for video generation.
options: unipc, dpm++, euler
default: unipc
interpolator_model
string
The model to use for frame interpolation. Options are 'rife' or 'film'.
options: rife, film
default: film
acceleration
string
Acceleration to use for inference. Options are 'none' or 'regular'. Accelerated inference will very slightly affect output, but will be significantly faster.
default: regular
match_input_num_frames
boolean
If true, the number of frames in the generated video will match the number of frames in the input video. If false, the number of frames will be determined by the num_frames parameter.
default: true
enable_prompt_expansion
boolean
Whether to enable prompt expansion.
default: false
return_frames_zip
boolean
If true, also return a ZIP file containing all generated frames.
default: false
seed
int
Random seed for reproducibility. If None, a random seed is chosen.
trim_borders
boolean
Whether to trim borders from the video.
default: true
aspect_ratio
string
Aspect ratio of the generated video.
options: auto, 16:9, 1:1, 9:16
default: auto
video_quality
string
The quality of the generated video.
options: low, medium, high, maximum
default: high
enable_auto_downsample
boolean
If true, the model will automatically temporally downsample the video to an appropriate frame length for the model, then will interpolate it back to the original frame length.
default: false
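
As a rough aid, the ranges and option values listed above can be checked locally before submitting a run. The sketch below is not part of any official client: `validate_params` is a hypothetical helper, the option values are as read from the parameter list above, and for `num_frames` it uses the listed range (17 … 241) even though the description states a minimum of 81, so treat that bound as uncertain.

```python
# Ranges and options transcribed from the parameter list above.
NUMERIC_RANGES = {
    "num_frames": (17, 241),  # listed range; the description says 81 to 241
    "num_interpolated_frames": (0, 5),
    "num_inference_steps": (2, 50),
    "frames_per_second": (5, 30),
    "guidance_scale": (1, 10),
    "shift": (1, 15),
    "temporal_downsample_factor": (0, 5),
    "auto_downsample_min_fps": (1, 60),
    "zoom_factor": (0, 0.9),
}
OPTIONS = {
    "resolution": {"auto", "240p", "360p", "480p", "580p", "720p"},
    "video_write_mode": {"fast", "balanced", "small"},
    "transparency_mode": {"content_aware", "white", "black"},
    "sampler": {"unipc", "dpm++", "euler"},
    "interpolator_model": {"rife", "film"},
    "acceleration": {"none", "regular"},
    "aspect_ratio": {"auto", "16:9", "1:1", "9:16"},
    "video_quality": {"low", "medium", "high", "maximum"},
}


def validate_params(params):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper: it only checks the values documented on this page
    and ignores parameters it does not know about.
    """
    problems = []
    for name, value in params.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside {low} … {high}")
        elif name in OPTIONS and value not in OPTIONS[name]:
            problems.append(f"{name}={value!r} is not one of {sorted(OPTIONS[name])}")
    return problems


print(validate_params({"num_frames": 300, "sampler": "unipc"}))
```

Returning a list instead of raising on the first problem lets a caller report every out-of-range value in one pass.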

You'll need

A text prompt
Start frame
End frame (optional)
Reference video
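
The inputs listed above might be assembled into a request body along the following lines. This is a sketch under assumptions: the page does not show how the text prompt is passed, so the `prompt` key is hypothetical, as is the `build_request` helper; the example.com URLs are placeholders. The code only builds a dictionary and does not call any service.

```python
def build_request(prompt, first_frame_url, video_url, last_frame_url=None, **overrides):
    """Assemble the inputs for one run into a plain dictionary.

    Hypothetical helper. `first_frame_url`, `last_frame_url`, and `video_url`
    are parameter names from this page; `prompt` is an assumed key name.
    """
    required = {
        "prompt": prompt,
        "first_frame_url": first_frame_url,
        "video_url": video_url,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing required input(s): " + ", ".join(missing))

    payload = dict(required)
    # The end frame is optional, so it is only included when provided.
    if last_frame_url is not None:
        payload["last_frame_url"] = last_frame_url
    # Any other documented parameter (for example seed) can be passed through.
    payload.update(overrides)
    return payload


request = build_request(
    prompt="A red vintage bicycle rolls slowly from left to right. Static low-angle shot.",
    first_frame_url="https://example.com/start.png",
    video_url="https://example.com/reference.mp4",
    seed=42,
)
print(sorted(request))
```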

More from wan2.7

Wan2.7 Text To Image

wan2.7-text-to-image

Wan2.7 Text To Image Pro

wan2.7-text-to-image-pro

Wan2.7 Image Edit

wan2.7-image-edit

Wan2.7 Image Edit Pro

wan2.7-image-edit-pro

Wan2.7 Text To Video

wan2.7-text-to-video

Wan2.7 Image To Video

wan2.7-image-to-video