2026/04/06

Wan 2.7 First and Last Frame Control: How to Generate Predictable AI Video

How to use Wan 2.7 first and last frame control for predictable video generation — frame pairing strategies, what the model infers, and how to avoid the most common failure modes.

Standard image-to-video gives you control over the opening frame and then the model improvises. The result — what most workflows call drift — is a clip that gradually diverges from any endpoint you had in mind. For product demos, narrative transitions, or anything with a required landing point, drift is expensive to fix in post.

Wan 2.7's first and last frame control solves this directly. You define both the opening and closing frame. The model generates the motion in between.

This guide is for getting repeatable results out of that capability — not just a demo that worked once.

Wan 2.7 first and last frame control: two cinematic frames connected by motion — opening shot at a mountain trailhead, closing shot at the summit at sunset

What First and Last Frame Control Actually Does

The Problem It Solves vs Standard I2V

Standard image-to-video anchors the opening frame and lets the model decide everything that follows. Subject position, camera movement, lighting — all improvised. When the model's improvisation diverges from your intent, you re-roll and hope.

Wan 2.7's first/last frame approach uses both images as control conditions. Semantic features from both frames are injected into the generation process, which keeps style, content, and structure consistent while the model generates the motion path between them. The result is a clip with a defined start and a defined destination — not just a defined start with a random landing.

How Both Frames Are Used During Generation

The model does not interpolate pixel values between frames. It uses semantic features and cross-attention mechanisms to keep the video stable — this approach reduces jitter compared to single-anchor generation because the model is constrained on both ends, not just one.

Your first frame defines the initial state. Your last frame constrains the destination. The motion path between them is inferred from both the frame content and your text prompt.

What the Model Infers About the Path

Your text prompt shapes how the transition happens — not just that it happens. If your prompt says "the camera slowly pulls back to reveal the full skyline," that motion description shapes the inferred path. Without a prompt, the model will still attempt a plausible transition, but you will have far less control over camera direction, pacing, and subject movement.

The prompt is not optional. It is the mechanism through which you direct the motion.

Input Preparation

What Makes a Good Frame Pair

The strongest pairs share three things: consistent light source direction, matching depth of field, and a subject that is spatially plausible in both positions.

A product shot in diffuse studio light paired with an end frame showing the same product at a slightly different angle works well. An establishing wide shot transitioning to a medium close-up works if the camera move is part of your prompt intention.

Think of the pair as defining a verb: open → closed, before → after, empty → filled, standing → seated. The cleaner the semantic relationship between the two frames, the more coherent the inferred path.

Wan 2.7 first/last frame storyboard: four-panel production storyboard showing smooth character motion from desk to window, with lighting references and motion arrows

Image Spec Requirements

Use PNG or high-quality JPEG for both frames. Avoid compressed thumbnails — compression artifacts introduce noise that the model interprets as intentional visual information, which degrades the output.

Resolution: match both frames' aspect ratio as closely as possible to your intended output. For the Wan series, 720p (1280 × 720, or the portrait equivalent) is the recommended resolution for quality output. Smaller resolutions are fine for test iterations but not for finals.
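The spec requirements above can be turned into a quick pre-flight check before uploading. This is a minimal sketch: the 1280 × 720 recommendation and the aspect-ratio advice come from this guide, while the function name, tolerance, and return format are my own choices, not part of any Wan 2.7 tooling.

```python
def check_frame_pair(first_size, last_size, target_ar=16 / 9, min_pixels=1280 * 720):
    """Return a list of warnings for a (width, height) frame pair.

    Checks each frame against a target aspect ratio and a minimum pixel
    count (720p), then checks that the two frames agree with each other.
    """
    warnings = []
    for label, (w, h) in (("first", first_size), ("last", last_size)):
        ar = w / h
        if abs(ar - target_ar) > 0.01:
            warnings.append(f"{label} frame aspect ratio {ar:.3f} != target {target_ar:.3f}")
        if w * h < min_pixels:
            warnings.append(f"{label} frame below 720p ({w}x{h}); fine for tests, not finals")
    # The pair should also match each other, not just the target.
    if abs(first_size[0] / first_size[1] - last_size[0] / last_size[1]) > 0.01:
        warnings.append("first and last frames have mismatched aspect ratios")
    return warnings
```

A clean pair such as two 1280 × 720 frames returns an empty list; a 640 × 360 test frame gets flagged as below 720p but passes the aspect-ratio checks.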

What Makes a Bad Frame Pair

Inconsistent lighting direction. If your first frame has a key light at 45° left and your last frame is lit from overhead, the model will attempt to transition between two different shadow environments. The result is usually a mid-clip lighting jump that looks like a render error — not an intentional change.

Spatial mismatch without intent. A wide establishing shot paired with an extreme close-up forces the model to invent a camera move. Sometimes that is exactly what you want. Usually it is not. Keep focal distance roughly consistent unless you are explicitly prompting for a zoom or pull.

Conflicting depth cues. Heavy bokeh in the first frame and everything in sharp focus in the last — the model will interpret this as an intentional depth-of-field change and try to animate it. If that is not your intention, match the depth treatment between frames.

Subject position that defies physics. A character standing on the left side of frame paired with a last frame where they are on the right side with nothing in the prompt to explain the movement — the model will generate an awkward cross that looks unintentional. Make the spatial logic of the transition explicit in your prompt.
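The four failure modes above can be encoded as a checklist. This is a sketch, not an analyzer: the metadata fields (`key_light_deg`, `focal_m`, `bokeh`, `subject_x`) are hypothetical annotations you would judge by eye for each frame, and the thresholds are illustrative.

```python
def frame_pair_warnings(first, last, prompt=""):
    """Flag the common frame-pair failure modes from hand-annotated metadata.

    Each dict describes one frame: key light angle in degrees, focal
    distance in meters, whether heavy bokeh is present, and the subject's
    horizontal position as a 0..1 fraction of frame width.
    """
    warnings = []
    # 1. Inconsistent lighting direction -> mid-clip lighting jump.
    if abs(first["key_light_deg"] - last["key_light_deg"]) > 30:
        warnings.append("lighting direction mismatch: expect a mid-clip lighting jump")
    # 2. Spatial mismatch without intent -> invented camera move.
    ratio = max(first["focal_m"], last["focal_m"]) / min(first["focal_m"], last["focal_m"])
    if ratio > 2 and not any(w in prompt.lower() for w in ("zoom", "pull", "push")):
        warnings.append("large focal distance change with no zoom/pull in the prompt")
    # 3. Conflicting depth cues -> unintended depth-of-field animation.
    if first["bokeh"] != last["bokeh"]:
        warnings.append("depth-of-field mismatch: the model will animate a focus change")
    # 4. Subject position that defies physics -> awkward unexplained cross.
    if abs(first["subject_x"] - last["subject_x"]) > 0.5 and not prompt:
        warnings.append("large subject move with nothing in the prompt to explain it")
    return warnings
```

The point is the discipline, not the code: each warning corresponds to a mismatch you can fix by regenerating one of the two frames or by making the transition explicit in the prompt.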

How to Use First/Last Frame in Wan 2.7

Go to wan27.org and open the Wan 2.7 video generation tool. Select the first/last frame mode.

Step 1: Prepare your frames

Choose or generate your opening and closing frames. They do not need to be photographed — AI-generated images work well as frame inputs, which lets you design both endpoints before committing to a generation.

Step 2: Upload both frames

Upload your first frame and last frame. The model will read both as control conditions.

Step 3: Write your motion prompt

Describe the transition — what moves, how it moves, camera behavior, pacing, and any environmental changes. Be specific. "The camera slowly tracks right while the subject walks toward the window" gives the model more to work with than "the person moves."
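One way to stay specific is to assemble the prompt from the components this step calls out: what moves, how it moves, camera behavior, pacing, and environmental change. The helper below is my own convention for doing that, not a Wan 2.7 API; the field names are illustrative.

```python
def motion_prompt(subject, action, camera="static camera",
                  pacing="steady pacing", environment=""):
    """Compose a motion prompt from explicit components.

    Forcing each component to be filled in (or consciously defaulted)
    guards against under-specified prompts like "the person moves".
    """
    parts = [f"{subject} {action}", camera, pacing]
    if environment:
        parts.append(environment)
    return ", ".join(parts) + "."

# Example from the step above, with each component spelled out:
print(motion_prompt(
    "the subject", "walks toward the window",
    camera="the camera slowly tracks right",
    pacing="slow, even pacing",
    environment="late-afternoon light warms as the clip progresses",
))
```

Even if you never run code like this, the checklist it encodes is the useful part: a prompt missing any of these components is leaving a decision to the model.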

Step 4: Generate and evaluate

Review the output against both frames and your prompt intention. If the motion path is wrong, adjust the prompt first — most failures at this stage are under-specified prompts, not frame pair problems.

Step 5: Iterate on frame pairs if needed

If the prompt is detailed and the output still drifts, revisit the frame pair. The most common culprit is a spatial or lighting mismatch between frames that forces the model to do work you did not intend.

Common Use Cases

Product Transitions

Define a product in one position or configuration as the first frame, and a different position or reveal as the last frame. Prompt the rotation or reveal motion explicitly. The consistent studio lighting constraint handles most of the consistency work.

Narrative Scene Transitions

Define the emotional and spatial start state of a scene and the end state. Use the prompt to describe what happens between them — not just what the frames look like, but what causes the change.

Storyboard-to-Shot Conversion

Use frame pairs drawn directly from a storyboard to generate animatic-quality clips. The first/last frame approach is particularly suited to storyboard work because storyboards already define start and end states shot by shot.

Campaign and Social Content

Define a brand asset or product in the opening frame, define the desired composition or angle in the last frame, and use the prompt to describe the move. The result is repeatable output across a campaign asset set.

What the Model Cannot Do

First/last frame control is powerful but not omniscient:

  • It cannot teleport. If your first and last frames are incompatible positions for the subject — standing in one room vs. sitting in a completely different room with no transition logic — the model will produce something, but it will look like a cut, not a transition.
  • It cannot override physics. An object falling in the first frame cannot hang mid-air in the last frame without a plausible arc connecting the two states.
  • It is not a camera control system. You direct camera behavior through your prompt, not by encoding camera metadata in the frames themselves.

Use it to constrain the destination. Use your prompt to direct the journey.


Try first and last frame video generation at wan27.org.
