Wan 2.2: Text, Image, Speech, and Template Animation in One Family.

Wan 2.2 covers five practical workflows in this project: text to video, image to video, speech to video, animate move, and animate replace. It is useful when teams need short, task-specific generation paths instead of one general-purpose mode.

Wan 2.2 Animate demo

Preview an Animate-style motion result before you generate.

5 Modes: Workflow Options
720p: Max Output
Speech+Animate: Special Modes
Turbo: Fast Workflows

What Is Wan 2.2

Wan 2.2 — A Task-Specific Video Model Family.

Wan 2.2 is the most mode-diverse Wan family in this project. It supports prompt-led clips, image-led clips, speech-driven portrait video, and two template animation workflows that either reuse the motion of a source video or replace a visual element within it.

That makes Wan 2.2 useful when the job is narrow and the inputs are clear. Instead of one broad workflow, it offers several smaller ones with specific media requirements and output profiles.

Wan 2.2 text to video cinematic neon street generation

Text and Image Video Modes

Use prompt-only or image-led generation depending on how the idea starts.

Speech to Video

Combine a portrait image and speech audio to generate talking-style video.

Animate Move and Replace

Use source video plus image inputs for template-like motion or replacement workflows.

480p, 580p, and 720p

Choose among three output resolutions across the Wan 2.2 family.

How It Works

Wan 2.2 in Three Practical Steps.

Pick the workflow, add the required media, then render the task-specific output.

01

Choose the Right Wan 2.2 Mode

Use text to video for prompt-only clips, image to video for still-to-motion work, speech to video for portrait + audio generation, and animate move or replace for template-style source-media workflows.

Pick the mode from the input requirement, not from the model name.
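The rule above can be sketched as a small lookup that goes from available inputs to a mode. This is only an illustration: the function name, the mode labels, and the `replace` flag are hypothetical and do not come from any documented Wan 2.2 API.

```python
def pick_mode(prompt=False, image=False, audio=False,
              source_video=False, replace=False):
    """Return a hypothetical mode label for the inputs that are available.

    The labels are illustrative names for the five workflows described
    on this page, not identifiers from a real API.
    """
    # Source video plus an image points to one of the two animate workflows.
    if source_video and image:
        return "animate_replace" if replace else "animate_move"
    # A portrait image plus speech audio points to speech to video.
    if image and audio:
        return "speech_to_video"
    # A still image on its own is the starting frame for image to video.
    if image:
        return "image_to_video"
    # A prompt with no media is the text to video case.
    if prompt:
        return "text_to_video"
    raise ValueError("no Wan 2.2 mode matches these inputs")


print(pick_mode(image=True, audio=True))
print(pick_mode(source_video=True, image=True, replace=True))
```

Checking the most specific input combination first keeps the choice driven by the media on hand rather than by the model name.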

02

Prepare the Required Inputs

Some Wan 2.2 modes need only a prompt, while others require an image, audio, or source video. The workflow is clearer when you decide inputs first and prompt second.

Speech and animate modes work best when source media is already clean and intentional.
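One way to decide inputs first is to check a job against a per-mode requirements table before submitting anything. The sketch below assumes the input sets described on this page; the mode labels, input keys, and the `missing_inputs` helper are hypothetical, and whether a given mode also accepts an optional prompt is not covered here.

```python
# Required inputs per mode, as described on this page.
# All keys and labels are illustrative, not part of a documented API.
REQUIRED_INPUTS = {
    "text_to_video": {"prompt"},
    "image_to_video": {"image", "prompt"},
    "speech_to_video": {"image", "audio"},
    "animate_move": {"source_video", "image"},
    "animate_replace": {"source_video", "image"},
}


def missing_inputs(mode, provided):
    """Return a sorted list of required inputs that were not provided."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(REQUIRED_INPUTS[mode] - set(provided))


# A speech job that has a portrait but no audio track yet.
print(missing_inputs("speech_to_video", ["image"]))
```

Running the check before writing the prompt surfaces a missing audio file or source video early, which is the ordering the step recommends.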

03

Generate in the Right Resolution and Duration

Wan 2.2 uses practical output profiles across 480p, 580p, and 720p. Some modes stay short and fixed, while others offer longer talking-style generation.

Treat Wan 2.2 as a toolset of smaller workflows, not one generic model.
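A resolution choice can be validated against the three family-wide profiles before a job is queued. This is a minimal sketch: the `parse_resolution` helper is hypothetical, and because this page does not say which modes support which profiles, it only checks against the family-wide set.

```python
# The three output heights named on this page.
SUPPORTED_HEIGHTS = (480, 580, 720)


def parse_resolution(label):
    """Turn a label such as '720p' into an integer height, or raise ValueError."""
    text = label.strip().lower().rstrip("p")
    if not text.isdigit() or int(text) not in SUPPORTED_HEIGHTS:
        raise ValueError(f"unsupported resolution: {label!r}")
    return int(text)


print(parse_resolution("580p"))
```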

Wan 2.2 Features

Why Teams Use Wan 2.2.

A broader set of task-specific workflows than the newer but simpler Wan families.

Text to Video Turbo

Wan 2.2 text to video is useful when you need a short prompt-led clip without additional media setup.

Prompt-first, short-form generation.

Wan 2.2 image to video product motion workflow

Image to Video Turbo

Wan 2.2 image to video uses a source image as the starting frame and turns it into a short animated clip.

Start from a still and move forward.

Speech to Video

Wan 2.2 speech to video combines a portrait image and an audio track to generate talking-style motion with a clearer input structure than generic prompt-only video.

Image plus audio, built for talking clips.

Wan 2.2 animate move and replace source media workflow

Animate Move

Animate move uses a source video plus an image input for template-like motion workflows where existing movement matters.

Reuse source motion more directly.

Animate Replace

Animate replace uses source video plus image input for replacement-oriented motion tasks when you need a different visual element in the same movement pattern.

Swap the visual while keeping the motion pattern.

480p, 580p, and 720p Output

Wan 2.2 gives teams three practical resolution choices, which is useful when the job is more about fast workflow matching than maximum cinematic output.

Three resolutions, lighter tradeoffs.

Task-Specific Instead of One-Size-Fits-All

The value of Wan 2.2 is the breadth of specific workflows. Each mode solves a narrower job more directly than a general-purpose prompt-only path.

More modes, clearer jobs.

Good Fit for Operational Video Work

Wan 2.2 is useful for teams that handle many small content tasks, especially when speech, short animation, or template-like transformation flows show up often.

Practical workflows for day-to-day content ops.

Use Cases

Wan 2.2 for Task-Specific Video Jobs.

Use different Wan 2.2 modes for short clips, talking portraits, and source-media animation tasks.

Fast Drafting

Generate Short Prompt-Led Motion Tests

Use Wan 2.2 text or image video modes to test simple shot ideas and motion beats quickly.

Talking Content

Create Portrait-Led Clips from Audio and Image

Speech to video is useful for lightweight talking-head outputs, explainers, and creator-style content.

Template Animation

Build Quick Replacement and Motion Variants

Animate move and animate replace help when teams need template-like motion behavior around existing source video patterns.

Concept Loops

Turn Character Stills into Short Motion Samples

Image to video can help test short loops or promo fragments from concept art or keyframes.

Content Ops

Match the Workflow to the Input Type

Wan 2.2 is useful when the operator needs different short-form tools depending on whether the job starts from a prompt, an image, speech, or source media.

Explainers

Create Short Teaching Clips with Simple Inputs

Use speech and image-based modes for practical educational content where the media requirements are already clear.

What Teams Say

Why Operators Keep Wan 2.2 Around.

"Wan 2.2 is not about one hero workflow. It is useful because it has a mode for the small jobs that show up every day."
Ava Chen, Creative Operations Lead

"Speech to video is what keeps Wan 2.2 relevant for us. The input structure is clear enough that junior operators can use it without much confusion."
Marco Ruiz, Studio Manager

"The animate modes are niche, but when you need them, they are exactly the right level of specificity."
Leah Stone, Template Motion Specialist

"Text to video and image to video are still good for fast tests, but the extra Wan 2.2 modes are why it stays in our toolbox."
Daniel Ng, AI Video Producer

"480p, 580p, and 720p are enough for the kinds of short operational clips we use Wan 2.2 for."
Yuna Watanabe, Education Content Lead

"Wan 2.2 is for people who think in workflows and inputs, not just in model version numbers."
Hiro Kato, Video Automation Operator

Start Creating with Wan 2.2

Use text, image, speech, and template animation workflows from one broad Wan family.

No credit card required. Free generations included. Multiple input-driven modes available.

Wan 2.2 FAQ

Wan 2.2 — Frequently Asked Questions.

What is Wan 2.2?

Wan 2.2 is a multi-mode video model family in this project that supports text to video, image to video, speech to video, animate move, and animate replace workflows.

How does Wan 2.2 text to video work?

Wan 2.2 text to video generates a short clip from a prompt alone and is best when the idea starts in words rather than in existing media.

How does Wan 2.2 image to video work?

Wan 2.2 image to video starts from a source image and animates it into a short clip based on the prompt.

What is Wan 2.2 speech to video?

Wan 2.2 speech to video uses a portrait image and an audio file to generate a talking-style video.

What is Wan 2.2 animate move?

Animate move is a Wan 2.2 workflow that uses source video and image inputs for template-style motion tasks where source movement should stay central.

What is Wan 2.2 animate replace?

Animate replace is a Wan 2.2 workflow that uses source video and image inputs for replacement-oriented animation tasks.

What resolutions does Wan 2.2 support?

Wan 2.2 modes in this project support 480p, 580p, and 720p output profiles depending on the workflow.

Who should use Wan 2.2?

Wan 2.2 is a strong fit for teams that need several task-specific short-form workflows, especially around speech-driven clips and source-media animation.