Wan 2.7 First and Last Frame Control: How to Generate Predictable AI Video
How to use Wan 2.7 first and last frame control for predictable video generation — frame pairing strategies, what the model infers, and how to avoid the most common failure modes.

Standard image-to-video gives you control over the opening frame and then the model improvises. The result — what most workflows call drift — is a clip that gradually diverges from any endpoint you had in mind. For product demos, narrative transitions, or anything with a required landing point, drift is expensive to fix in post.
Wan 2.7's first and last frame control solves this directly. You define both the opening and closing frame. The model generates the motion in between.
This guide is for getting repeatable results out of that capability — not just a demo that worked once.

What First and Last Frame Control Actually Does
The Problem It Solves vs Standard I2V
Standard image-to-video anchors the opening frame and lets the model decide everything that follows. Subject position, camera movement, lighting — all improvised. When the model's improvisation diverges from your intent, you re-roll and hope.
Wan 2.7's first/last frame approach uses both images as control conditions. Semantic features from both frames are injected into the generation process, which keeps style, content, and structure consistent while the model generates the motion path between them. The result is a clip with a defined start and a defined destination — not just a defined start with a random landing.
How Both Frames Are Used During Generation
The model does not interpolate pixel values between the two frames. Instead, semantic features from both frames condition generation through cross-attention, which constrains the clip at both ends rather than one and reduces the jitter typical of single-anchor generation.
Your first frame defines the initial state. Your last frame constrains the destination. The motion path between them is inferred from both the frame content and your text prompt.
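Wan 2.7's internals are not public, so the exact mechanism cannot be reproduced here, but the general shape of dual-anchor conditioning is easy to sketch. In this minimal PyTorch illustration, the module names, dimensions, and token layout are all assumptions, not the model's real architecture:

```python
import torch
import torch.nn as nn

class DualFrameConditioning(nn.Module):
    """Schematic sketch of dual-anchor conditioning, not Wan 2.7's actual code.
    Video tokens cross-attend to semantic tokens from BOTH anchor frames,
    so generation is constrained at the start and at the destination."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens, first_tokens, last_tokens):
        # One shared context set built from both endpoints: every video token
        # can read features from the first frame and the last frame at once.
        context = torch.cat([first_tokens, last_tokens], dim=1)
        attended, _ = self.attn(self.norm(video_tokens), context, context)
        return video_tokens + attended  # residual update, transformer-style

# Toy shapes: 256 video tokens, 64 semantic tokens per anchor frame.
block = DualFrameConditioning()
video = torch.randn(1, 256, 512)
first, last = torch.randn(1, 64, 512), torch.randn(1, 64, 512)
out = block(video, first, last)  # same shape as video: (1, 256, 512)
```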
What the Model Infers About the Path
Your text prompt shapes how the transition happens — not just that it happens. If your prompt says "the camera slowly pulls back to reveal the full skyline," that motion description shapes the inferred path. Without a prompt, the model will still attempt a plausible transition, but you will have far less control over camera direction, pacing, and subject movement.
The prompt is not optional. It is the mechanism through which you direct the motion.
Input Preparation
What Makes a Good Frame Pair
The strongest pairs share three things: consistent light source direction, matching depth of field, and a subject that is spatially plausible in both positions.
A product shot in diffuse studio light paired with an end frame showing the same product at a slightly different angle works well. An establishing wide shot transitioning to a medium close-up works if the camera move is part of your prompt intention.
Think of the pair as defining a verb: open → closed, before → after, empty → filled, standing → seated. The cleaner the semantic relationship between the two frames, the more coherent the inferred path.

Image Spec Requirements
Use PNG or high-quality JPEG for both frames. Avoid compressed thumbnails — compression artifacts introduce noise that the model interprets as intentional visual information, which degrades the output.
Resolution: match both frames' aspect ratio as closely as possible to your intended output. For the Wan series, 720p (1280 × 720, or the portrait equivalent) is the recommended resolution for quality output. Smaller resolutions are fine for test iterations but not for finals.
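If you prepare many pairs, a small pre-flight check catches spec problems before you spend a generation on them. A minimal sketch using Pillow; the 720p floor comes from the recommendation above, while the aspect-ratio tolerance is an arbitrary choice:

```python
from PIL import Image

def check_frame_pair(first_path: str, last_path: str, tol: float = 0.01) -> list[str]:
    """Flag format, resolution, and aspect-ratio problems before upload."""
    issues = []
    first, last = Image.open(first_path), Image.open(last_path)
    for name, img in (("first", first), ("last", last)):
        if img.format not in ("PNG", "JPEG"):
            issues.append(f"{name} frame is {img.format}; use PNG or high-quality JPEG")
        if min(img.size) < 720:  # below the recommended 720p working resolution
            issues.append(f"{name} frame is {img.size}; fine for tests, not finals")
    # The two frames should share an aspect ratio; tol is a judgment call.
    if abs(first.width / first.height - last.width / last.height) > tol:
        issues.append(f"aspect ratios differ: {first.size} vs {last.size}")
    return issues

print(check_frame_pair("first.png", "last.png"))  # [] means the pair passes
```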
What Makes a Bad Frame Pair
Inconsistent lighting direction. If your first frame has a key light at 45° left and your last frame is lit from overhead, the model will attempt to transition between two different shadow environments. The result is usually a mid-clip lighting jump that reads as a render error, not an intentional change.
Spatial mismatch without intent. A wide establishing shot paired with an extreme close-up forces the model to invent a camera move. Sometimes that is exactly what you want. Usually it is not. Keep focal distance roughly consistent unless you are explicitly prompting for a zoom or pull.
Conflicting depth cues. Heavy bokeh in the first frame and everything in sharp focus in the last — the model will interpret this as an intentional depth-of-field change and try to animate it. If that is not your intention, match the depth treatment between frames.
Subject position that defies physics. A character standing on the left side of frame paired with a last frame where they are on the right side with nothing in the prompt to explain the movement — the model will generate an awkward cross that looks unintentional. Make the spatial logic of the transition explicit in your prompt.
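Lighting direction is hard to verify automatically, but a gross exposure mismatch between frames is easy to catch. A crude heuristic sketch using Pillow: the threshold is arbitrary, and passing it only means overall brightness matches, not that the key lights agree.

```python
from PIL import Image, ImageStat

def luminance_gap(first_path: str, last_path: str) -> float:
    """Mean-luminance difference (0-255 scale) between the two frames."""
    def mean_luma(path: str) -> float:
        return ImageStat.Stat(Image.open(path).convert("L")).mean[0]
    return abs(mean_luma(first_path) - mean_luma(last_path))

# ~40 out of 255 is an arbitrary cutoff; tune it against your own rejects.
if luminance_gap("first.png", "last.png") > 40:
    print("Overall brightness differs noticeably; recheck lighting before generating.")
```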
How to Use First/Last Frame in Wan 2.7
Go to wan27.org and open the Wan 2.7 video generation tool. Select the first/last frame mode.
Step 1: Prepare your frames
Choose or generate your opening and closing frames. They do not need to be photographed — AI-generated images work well as frame inputs, which lets you design both endpoints before committing to a generation.
Step 2: Upload both frames
Upload your first frame and last frame. The model will read both as control conditions.
Step 3: Write your motion prompt
Describe the transition — what moves, how it moves, camera behavior, pacing, and any environmental changes. Be specific. "The camera slowly tracks right while the subject walks toward the window" gives the model more to work with than "the person moves."
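For a concrete comparison, here is an under-specified prompt next to one that actually directs the path. The scene is invented for illustration:

```python
weak_prompt = "the person moves"

# Names what moves, how the camera behaves, the pacing, and the landing.
specific_prompt = (
    "The camera holds still. The woman rises from the chair, walks left to "
    "right behind the desk at an even pace, and stops at the window, "
    "matching the framing of the last frame exactly."
)
```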
Step 4: Generate and evaluate
Review the output against both frames and your prompt intention. If the motion path is wrong, adjust the prompt first — most failures at this stage are under-specified prompts, not frame pair problems.
Step 5: Iterate on frame pairs if needed
If the prompt is detailed and the output still drifts, revisit the frame pair. The most common culprit is a spatial or lighting mismatch between frames that forces the model to do work you did not intend.
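wan27.org is a web tool and does not document a public API, so the endpoint, payload fields, and polling route in this sketch are entirely hypothetical; it exists only to make the generate-evaluate-iterate loop concrete:

```python
import base64
import time

import requests

API = "https://wan27.org/api/flf2v"  # hypothetical endpoint, not a real API

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

job = requests.post(API, json={  # hypothetical payload shape
    "first_frame": b64("first.png"),
    "last_frame": b64("last.png"),
    "prompt": "The camera slowly pulls back to reveal the full skyline.",
    "resolution": "1280x720",
}).json()

while True:  # hypothetical polling route and status fields
    status = requests.get(f"{API}/{job['id']}").json()
    if status["state"] in ("succeeded", "failed"):
        break
    time.sleep(5)
print(status)
```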
Common Use Cases
Product Transitions
Define a product in one position or configuration as the first frame, and a different position or reveal as the last frame. Prompt the rotation or reveal motion explicitly. Keeping the studio lighting identical across both frames does most of the consistency work for you.
Narrative Scene Transitions
Define the emotional and spatial start state of a scene and the end state. Use the prompt to describe what happens between them — not just what the frames look like, but what causes the change.
Storyboard-to-Shot Conversion
Use frame pairs drawn directly from a storyboard to generate animatic-quality clips. The first/last frame approach is particularly suited to storyboard work because storyboards already define start and end states shot by shot.
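The mechanical part of that workflow is just pairing panels. One common pattern, sketched here under the assumption of numbered files in a storyboard/ directory, chains consecutive panels so each shot ends where the next begins:

```python
from pathlib import Path

# Consecutive panels become first/last pairs: each shot's end state
# is the next shot's start state, which keeps the sequence continuous.
panels = sorted(Path("storyboard").glob("panel_*.png"))
for first, last in zip(panels, panels[1:]):
    print(f"shot: {first.name} -> {last.name}")
```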
Campaign and Social Content
Define a brand asset or product in the opening frame, define the desired composition or angle in the last frame, and use the prompt to describe the move. The same recipe repeats cleanly across a campaign asset set.
What the Model Cannot Do
First/last frame control is powerful but not omniscient:
- It cannot teleport. If your first and last frames are incompatible positions for the subject — standing in one room vs. sitting in a completely different room with no transition logic — the model will produce something, but it will look like a cut, not a transition.
- It cannot override physics. An object falling in the first frame needs a plausible arc to reach its position in the last frame; place your endpoints so that arc exists.
- It is not a camera control system. You direct camera behavior through your prompt, not by encoding camera metadata in the frames themselves.
Use it to constrain the destination. Use your prompt to direct the journey.
Try first and last frame video generation at wan27.org.