Key model metadata, summarized in a two-column overview table.
A fuller summary of positioning, capabilities, and source-specific details for Wan 2.2.
Wan 2.2 is a multimodal video generation model developed by Alibaba's Tongyi Laboratory and released in July 2025 under the Apache 2.0 license. Alibaba describes it as the first open-source video diffusion model to use a Mixture-of-Experts (MoE) architecture. In this design, a high-noise expert handles the early denoising steps that set overall layout and composition, and a low-noise expert refines fine details in the later steps. The model supports both text-to-video and image-to-video generation, with native bilingual prompting in English and Chinese. It is available in a 5B parameter variant suited for consumer hardware and a 14B parameter variant for higher-quality output.
Wan 2.2 was trained on a substantially larger dataset than its predecessor, Wan 2.1, with 65.6% more image data and 83.2% more video data. Training includes a dedicated aesthetic fine-tuning stage informed by film industry standards, further refined through reinforcement learning to align with human visual preferences. Two specialized modules, Wan-Animate and Wan-Move, let users animate a character from a single image and transfer motion from one video to another subject, respectively. The model is natively supported by ComfyUI and accepts LoRA adapters and source images as inputs alongside text prompts.
Generates video clips from written text prompts, supporting both English and Chinese input natively. The 14B parameter variant targets higher visual fidelity while the 5B variant is optimized for consumer hardware.
Animates a static reference image into a dynamic video clip using the I2V pipeline. Accepts an image URL as input alongside a text prompt to guide motion and style.
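A minimal sketch of assembling an image-to-video request body, assuming a JSON payload with hypothetical `prompt` and `image_url` fields. The real field names, endpoint, and authentication depend on the provider and are not documented here; the helper `build_i2v_payload` is illustrative only.

```python
import json
from urllib.parse import urlparse


def build_i2v_payload(prompt: str, image_url: str) -> dict:
    """Build a hypothetical image-to-video request body.

    The keys "prompt" and "image_url" are assumptions for illustration,
    not a documented provider schema.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    parsed = urlparse(image_url)
    # Only accept absolute http(s) URLs for the reference image.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {image_url!r}")
    return {"prompt": prompt.strip(), "image_url": image_url}


payload = build_i2v_payload(
    "The woman turns toward the window as the camera slowly pushes in",
    "https://example.com/reference.png",
)
print(json.dumps(payload, indent=2))
```

Validating the URL on the client side catches a common mistake (passing a local file path instead of a reachable URL) before a request is spent.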
Accepts LoRA adapter weights to customize the model's visual style or subject matter without full retraining. LoRA inputs are specified directly in the generation request.
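LoRA (low-rank adaptation) avoids full retraining by freezing the base weights and learning a small low-rank update. The sketch below applies the standard update W' = W + (alpha / r) * B A to a toy 2x2 matrix in plain Python. It illustrates the general technique only; it is not Wan 2.2's LoRA loader, and providers may expose a separate strength setting on top of this scale.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying W.

    w: d x k frozen base weight, b: d x r, a: r x k, where r = rank.
    """
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
B = [[1.0], [2.0]]            # trainable factor (2 x 1)
A = [[0.5, 0.5]]              # trainable factor (1 x 2)
print(apply_lora(W, A, B, alpha=1.0, rank=1))  # [[1.5, 0.5], [1.0, 2.0]]
```

Only `A` and `B` (here 4 numbers instead of the full matrix) need to be stored and shipped, which is why LoRA files are small compared with the base model.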
The Wan-Animate module animates a character from a single source image, producing a video with natural motion from a still photo.
The Wan-Move module transfers motion patterns from one video onto a different subject, enabling pose and movement replication across subjects.
Provides control over lighting, color grading, lens composition, and camera movement through text prompts. Aesthetic fine-tuning was informed by film industry standards and refined with reinforcement learning.
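Because these controls are expressed in the prompt text rather than as separate parameters, one convenient pattern is to assemble the prompt from named cinematic cues. The `ShotSpec` helper below is hypothetical, and the wording of each cue is an example rather than an official vocabulary.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """Hypothetical container for cinematic cues written into the prompt."""
    subject: str
    lighting: str = ""
    color_grading: str = ""
    lens: str = ""
    camera_movement: str = ""

    def to_prompt(self) -> str:
        # Keep the subject first, then append only the cues that were set.
        cues = [self.lighting, self.color_grading, self.lens, self.camera_movement]
        return ", ".join([self.subject] + [c for c in cues if c])


shot = ShotSpec(
    subject="A lone rider crossing a desert at dusk",
    lighting="warm backlight, long shadows",
    lens="35mm wide angle",
    camera_movement="slow dolly-in",
)
print(shot.to_prompt())
```

Keeping cues in separate fields makes it easy to vary one aspect (for example, camera movement) while holding the rest of the prompt fixed.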
Accepts a seed value as an input parameter, allowing users to reproduce identical outputs (with the same prompt and settings) or systematically explore variations from a fixed starting point.
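Diffusion sampling starts from random noise, so fixing the seed fixes that starting point. The toy sketch below uses Python's `random.Random` in place of a real latent tensor to show the idea; `initial_noise` is a hypothetical helper, not Wan 2.2's sampler. In practice, bit-identical results also assume the same prompt, parameters, model version, and often the same hardware and software stack.

```python
import random


def initial_noise(seed: int, size: int) -> list[float]:
    """Draw a toy 'initial latent' of Gaussian noise from a seeded generator."""
    rng = random.Random(seed)  # independent generator; global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


a = initial_noise(seed=42, size=4)
b = initial_noise(seed=42, size=4)
c = initial_noise(seed=43, size=4)
print(a == b)  # True: same seed, same starting noise
print(a == c)  # False: a different seed gives different starting noise
```

Sweeping seeds (42, 43, 44, ...) while keeping the prompt fixed is one systematic way to explore variations.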
In the 14B variants, uses a Mixture-of-Experts architecture that splits the denoising process between a high-noise expert, which establishes layout in the early steps, and a low-noise expert, which refines detail in the later steps, within a single diffusion pipeline.
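Unlike token-level routing in language-model MoEs, the two experts here take turns across the denoising schedule, so only one expert runs at each step. The toy sketch below shows that routing pattern with placeholder experts and a hypothetical `boundary` switch point; it is not the model's actual code or its real switching rule.

```python
from typing import Callable

Expert = Callable[[list[float], int], list[float]]


def high_noise_expert(x: list[float], t: int) -> list[float]:
    # Placeholder for the expert that shapes overall layout at high noise levels.
    return [v * 0.9 for v in x]


def low_noise_expert(x: list[float], t: int) -> list[float]:
    # Placeholder for the expert that refines detail at low noise levels.
    return [v * 0.99 for v in x]


def pick_expert(t: int, boundary: int) -> Expert:
    """Route a whole denoising step to one expert based on its timestep.

    Timesteps count down from noisy (large t) to clean (small t);
    `boundary` is a hypothetical switch point.
    """
    return high_noise_expert if t >= boundary else low_noise_expert


def denoise(x: list[float], timesteps: list[int], boundary: int) -> tuple[list[float], list[str]]:
    trace = []
    for t in timesteps:
        expert = pick_expert(t, boundary)
        trace.append(expert.__name__)
        x = expert(x, t)  # only one expert runs per step
    return x, trace


out, trace = denoise([1.0, -1.0], timesteps=[999, 750, 500, 250, 0], boundary=500)
print(trace)
```

Because each step calls a single expert, per-step compute stays close to that of one expert even though the total parameter count covers both.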
Primary API pricing, shown for quick comparison.
Places where this model is available.
The configurable options currently documented for this model.
Up to 3 LoRAs.
Description of what to exclude from the video.
A seed value that controls the randomness of the generation.
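A minimal sketch of client-side validation for the options listed above. The field names (`prompt`, `negative_prompt`, `seed`, `loras`) and the `LoraRef` shape are hypothetical; only the limit of 3 LoRAs comes from the parameter list itself.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

MAX_LORAS = 3  # documented limit: up to 3 LoRAs per request


@dataclass
class LoraRef:
    # Hypothetical fields: where the adapter weights live and how strongly to apply them.
    path: str
    scale: float = 1.0


@dataclass
class GenerationParams:
    # Field names are illustrative assumptions, not a documented request schema.
    prompt: str
    negative_prompt: str = ""    # description of what to exclude from the video
    seed: Optional[int] = None   # None: let the service choose a seed (assumption)
    loras: list[LoraRef] = field(default_factory=list)

    def validate(self) -> None:
        """Check the documented LoRA limit, plus a basic non-empty prompt check."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if len(self.loras) > MAX_LORAS:
            raise ValueError(f"at most {MAX_LORAS} LoRAs allowed, got {len(self.loras)}")


params = GenerationParams(
    prompt="A paper boat drifting down a rain-soaked street at night",
    negative_prompt="blurry, low detail, distorted motion",
    seed=1234,
    loras=[LoraRef("example/style-lora"), LoraRef("example/subject-lora", scale=0.7)],
)
params.validate()
print(asdict(params))
```

Failing fast on a fourth LoRA locally is cheaper than discovering the limit from a rejected request.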
Parameters currently listed by OpenRouter or the local catalog for this model.
Official model cards, release notes, documentation, and other references.
Wan 2.2 discussions are most active in r/StableDiffusion, r/comfyui, and r/grok. Top Reddit threads cluster around benchmarks and model comparisons, as well as safety and censorship questions.
The strongest match in this snapshot has 6,073 upvotes and 177 comments.
I have been using Wan2.2 (I2V) 14B on Hugging Face for a while now to do NSFW image-to-video generations, and it always worked great. But for the last couple of days I keep getting 'Generation blocked by guardrails: The resulting video may contain explicit content.'
Does Wan 2.2 14B no longer support NSFW? It still shows up on Hugging Face if you type 'NSFW image to video' in the search bar, but it no longer allows NSFW image-to-video generation.
Any help or insight into this would be really appreciated. Thanks!
Edit 1: I understand that the Space on Hugging Face no longer allows NSFW generation and the model itself has not changed, so the question now becomes: what other alternatives are out there? I am mostly looking for Spaces on Hugging Face, or platforms similar to Hugging Face, that require no prior setup. Running it locally takes too long for the workflows that I have.
Edit 2 (Fixed): Turns out they added an 'Enable Safety Filter' checkbox in the advanced settings, which is always turned on by default. Just had to flip the switch and voila! Huge thanks to @VisibleExchange7528 for pointing this out!!!
LTX-2.3 posts are the only ones that exist now. Is it over for Wan 2.2?
Seems pretty effective.
Her outfit is inconsistent, but I used a reference image that only included the upper half of her body and head, so that is to be expected.
I should say, these clips are from the film "The Ninth Gate", which is excellent. :)
Wan 2.2 has a context window of 1,000 tokens, which governs the length and complexity of text prompts it can process in a single generation request.
Wan 2.2 is available in two sizes: a 5B parameter version designed for efficient use on consumer hardware and a 14B parameter version intended for higher-quality output. Both are available on Hugging Face under the Apache 2.0 license.
Yes. Wan 2.2 is released under the Apache 2.0 license, which permits free commercial use. The model weights are publicly available on Hugging Face.
Wan 2.2 accepts text prompts, image URLs (for image-to-video generation), LoRA adapter weights, configurable select options, and a seed value for reproducibility.
Wan 2.2 was released in July 2025 by Alibaba's Tongyi Laboratory. Its training data includes an image dataset 65.6% larger and a video dataset 83.2% larger than those used for its predecessor, Wan 2.1.
Yes. Wan 2.2 has native support in ComfyUI. Official tutorials and workflow documentation are available at docs.comfy.org.