Wan2.7 Reference To Video
Alibaba Wan 2.7 Reference-to-Video. Uses reference characters and props to generate new shots.
Wan guide
Alibaba's open-source video model line (Wan 2.1 → 2.7). Strong prompt adherence; the open-source pedigree means heavy community use and well-documented prompting patterns.
Strengths
- Best-in-class prompt adherence: it does what you ask, not what it thinks you want.
- Wide variant family covers most needs (T2V, I2V, reference, motion control, lipsync).
- Wan 2.5 and 2.6 catch up to closed-source quality at lower cost.
- Wan 2.2 Spicy variants for adult creative work.
Weaknesses
- Older versions (2.1, 2.2) look dated next to the current flagship.
- Stylization quality lags behind Kling and Hailuo.
Best for
- Precise prompt-driven scene construction
- Hybrid pipelines where Wan does the heavy lifting and another model polishes
Prompting tips
- Treat Wan like a brief: itemize what's in frame, the action, and the camera.
- Wan does NOT need flowery language; plain descriptive prose works better.
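The two tips above can be sketched as a small helper that assembles a brief-style prompt. The function name `build_brief_prompt` and its three fields are hypothetical, chosen only to mirror "what's in frame, the action, the camera"; nothing here is part of any Wan API.

```python
def build_brief_prompt(in_frame, action, camera):
    """Assemble a plain, itemized prompt (hypothetical helper, not a Wan API)."""
    # Join the items that are in frame into one comma-separated list.
    subjects = ", ".join(s.strip() for s in in_frame if s.strip())
    # One short labeled sentence per item keeps the prose plain.
    parts = [
        f"In frame: {subjects}.",
        f"Action: {action.strip().rstrip('.')}.",
        f"Camera: {camera.strip().rstrip('.')}.",
    ]
    return " ".join(parts)


prompt = build_brief_prompt(
    in_frame=["a woman in a red raincoat", "a yellow taxi", "wet city street at night"],
    action="she hails the taxi and steps off the curb",
    camera="static medium shot at eye level",
)
print(prompt)
```

Keeping the three fields separate makes it easy to vary one element, such as the camera, while holding the rest of the brief fixed.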
Parameters
- num_frames (int): Number of frames to generate. Must be between 81 and 241 (inclusive). Default: 81. Range: 17 … 241.
- num_interpolated_frames (int): Number of frames to interpolate between the original frames. A value of 0 means no interpolation. Default: 0. Range: 0 … 5.
- num_inference_steps (int): Number of inference steps for sampling. Higher values give better quality but take longer. Default: 30. Range: 2 … 50.
- first_frame_url (string): URL to the first frame of the video. If provided, the model will use this frame as a reference.
- resolution (string): Resolution of the generated video. Options: auto, 240p, 360p, 480p, 580p, 720p. Default: auto.
- frames_per_second (int): Frames per second of the generated video. Must be between 5 and 30. Ignored if match_input_frames_per_second is true. Default: 16.
- last_frame_url (string): URL to the last frame of the video. If provided, the model will use this frame as a reference.
- match_input_frames_per_second (boolean): If true, the frames per second of the generated video will match the input video. If false, the frames per second will be determined by the frames_per_second parameter. Default: true.
- video_url (string): URL to the source video file. This video will be used as a reference for the reframe task.
- guidance_scale (number): Guidance scale for classifier-free guidance. Higher values encourage the model to generate images closely related to the text prompt. Default: 5. Range: 1 … 10.
- shift (number): Shift parameter for video generation. Default: 5. Range: 1 … 15.
- video_write_mode (string): The write mode of the generated video. Options: fast, balanced, small. Default: balanced.
- temporal_downsample_factor (int): Temporal downsample factor for the video. This is an integer value that determines how many frames to skip in the video. A value of 0 means no downsampling. For each downsample factor, one upsample factor will automatically be applied. Default: 0. Range: 0 … 5.
- transparency_mode (string): The transparency mode to apply to the first and last frames. This controls how the transparent areas of the first and last frames are filled. Options: content_aware, white, black. Default: content_aware.
- auto_downsample_min_fps (number): The minimum frames per second to downsample the video to. This is used to help determine the auto downsample factor, to try to find the lowest detail-preserving downsample factor. The default value is appropriate for most videos; if you are using a video with very fast motion,… Default: 15. Range: 1 … 60.
- zoom_factor (number): Zoom factor for the video. When this value is greater than 0, the video will be zoomed in by this factor (relative to the canvas size), cutting off the edges of the video. A value of 0 means no zoom. Default: 0. Range: 0 … 0.9.
- negative_prompt (string): Negative prompt for video generation. Default: letterboxing, borders, black bars, bright colors, overexposed, static, blurred details, subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers, still picture, cluttered background, three legs, many people in the background, walking backwards
- sampler (string): Sampler to use for video generation. Options: unipc, dpm++, euler. Default: unipc.
- interpolator_model (string): The model to use for frame interpolation. Options: rife, film. Default: film.
- acceleration (string): Acceleration to use for inference. Options are 'none' or 'regular'. Accelerated inference will very slightly affect output, but will be significantly faster. Default: regular.
- match_input_num_frames (boolean): If true, the number of frames in the generated video will match the number of frames in the input video. If false, the number of frames will be determined by the num_frames parameter. Default: true.
- enable_prompt_expansion (boolean): Whether to enable prompt expansion. Default: false.
- return_frames_zip (boolean): If true, also return a ZIP file containing all generated frames. Default: false.
- seed (int): Random seed for reproducibility. If None, a random seed is chosen.
- trim_borders (boolean): Whether to trim borders from the video. Default: true.
- aspect_ratio (string): Aspect ratio of the generated video. Options: auto, 16:9, 1:1, 9:16. Default: auto.
- video_quality (string): The quality of the generated video. Options: low, medium, high, maximum. Default: high.
- enable_auto_downsample (boolean): If true, the model will automatically temporally downsample the video to an appropriate frame length for the model, then will interpolate it back to the original frame length. Default: false.
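As a rough sketch of how the parameters listed above might be checked before use, the snippet below validates a few values against the listed ranges and options and collects them into a dictionary. The helper name `build_payload` and the idea of serializing the result as JSON are assumptions for illustration; the source does not describe the request format or endpoint. For num_frames the sketch uses the listed range 17 … 241, although the description mentions 81 to 241.

```python
import json

# Ranges and options copied from the parameter list above (inclusive bounds).
RANGES = {
    "num_frames": (17, 241),
    "num_interpolated_frames": (0, 5),
    "num_inference_steps": (2, 50),
    "guidance_scale": (1, 10),
    "shift": (1, 15),
    "zoom_factor": (0, 0.9),
}
OPTIONS = {
    "resolution": {"auto", "240p", "360p", "480p", "580p", "720p"},
    "aspect_ratio": {"auto", "16:9", "1:1", "9:16"},
    "sampler": {"unipc", "dpm++", "euler"},
    "video_write_mode": {"fast", "balanced", "small"},
}


def build_payload(**params):
    """Check values against the listed ranges/options and return a dict.

    Hypothetical helper: names not in RANGES or OPTIONS pass through unchecked.
    """
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} … {high}")
        elif name in OPTIONS and value not in OPTIONS[name]:
            raise ValueError(f"{name}={value!r} is not one of {sorted(OPTIONS[name])}")
    return dict(params)


payload = build_payload(
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5,
    resolution="720p",
    aspect_ratio="16:9",
    sampler="unipc",
    seed=42,
)
print(json.dumps(payload, indent=2))
```

A value outside a listed range, such as `guidance_scale=11`, raises `ValueError` before anything is sent, which is cheaper than discovering the problem after a failed generation request.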