Wan 2.5

Wan 2.5 is an AI video generation model developed by Alibaba's Tongyi Lab. It generates videos up to 10 seconds long at resolutions from 480p to 1080p HD, with native 4K available in preview, all rendered at 24 frames per second. Its defining characteristic is that it generates audio and video together in a single step: character dialogue with lip-sync, ambient environmental sounds, and background music come directly from a text or image prompt, with no separate post-production audio work. It supports text-to-video, image-to-video, audio-to-video, and video-to-video refinement.

Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need production-ready video with synchronized audio. It supports cinematic camera controls such as dolly, tracking, and crane movements, as well as lighting styles, depth of field, and particle effects like rain and fire. The model handles photorealistic, anime, illustrated, and stylized visual aesthetics, and processes prompts in at least 8 languages with matching audio generation. Its open-source status is unsettled: the overview on this page lists it as not open source, and community reports from the September 2025 launch describe an API-only release with an open-source version still to be determined.

Model Overview

Provider: Wan (the entity that provides this model)
Input Context Window: 2,000 tokens (the number of tokens supported by the input context window)
Maximum Output Tokens: N/A (the number of tokens that can be generated by the model in a single request)
Open Source: No (whether the model's code is available for public use)
Release Date: Unknown (when the model was first released)
Knowledge Cut-off Date: Unknown (when the model's knowledge was last updated)
API Providers: Hugging Face (the providers that offer this model; not an exhaustive list)
Modalities: Video, Text, Image, Audio (types of data this model can process)

Capabilities

What Wan 2.5 supports

Text-to-Video: Generates video clips up to 10 seconds long from a text prompt at resolutions of 480p, 720p, or 1080p HD at 24 fps.

Image-to-Video: Animates a source image into a video clip, using the provided image URL as the visual starting point for generation.

Synchronized Audio Generation: Produces dialogue with lip-sync, ambient environmental sounds, and background music in a single generation step alongside the video.

Cinematic Camera Controls: Supports named camera movements including dolly, tracking, and crane shots, as well as depth of field and color grading settings specified in the prompt.

Multilingual Prompt Input: Accepts prompts in at least 8 languages and generates matching audio output in the corresponding language.

Seed Control: Accepts a seed value as an input parameter, allowing reproducible generation results for a given prompt and settings combination.

Style Flexibility: Handles photorealistic, anime, illustrated, and other stylized visual aesthetics based on prompt instructions.

Video-to-Video Refinement: Accepts an existing video as input and applies prompt-guided modifications or style changes to produce a refined output.
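The capabilities above imply that the generation mode follows from which inputs accompany the prompt. The sketch below shows one way a client might make that choice before sending a request. It is illustrative only: `GenerationInputs`, `select_mode`, the field names, and the rule that a text prompt is always required are assumptions, not part of any documented Wan 2.5 API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationInputs:
    """Hypothetical container for the inputs a request might carry."""
    prompt: str
    image_url: Optional[str] = None  # source image for image-to-video
    video_url: Optional[str] = None  # existing clip for video-to-video refinement


def select_mode(inputs: GenerationInputs) -> str:
    """Pick a generation mode from the inputs that are present (illustrative)."""
    if not inputs.prompt.strip():
        # Assumption: this sketch always requires a text prompt.
        raise ValueError("a text prompt is required in this sketch")
    if inputs.image_url and inputs.video_url:
        raise ValueError("provide an image or a video, not both")
    if inputs.video_url:
        return "video-to-video"
    if inputs.image_url:
        return "image-to-video"
    return "text-to-video"


print(select_mode(GenerationInputs(prompt="A crane shot over a rainy street")))
# text-to-video
```

Keeping the mode decision in one small function makes it easy to reject ambiguous input combinations before any generation credits are spent.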

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Hugging Face

Configuration & Parameters

The configurable options currently documented for this model.

Resolution (select): 480p, 720p, or 1080p. Default: 720p.

Duration (select): 5 seconds or 8 seconds. Default: 5.

Negative Prompt (text): Description of what to exclude from the video.

Seed (seed): A specific value that is used to guide the 'randomness' of the generation.
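The four parameters above can be validated on the client before a request is made. The following is a minimal sketch that enforces the documented options and defaults. The function name `build_params` and the dictionary keys (`resolution`, `duration`, `negative_prompt`, `seed`) are hypothetical; the real request schema depends on the provider and is not documented here.

```python
from typing import Optional

# Options and defaults as listed in the configuration section of this page.
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p")
ALLOWED_DURATIONS = (5, 8)  # seconds


def build_params(resolution: str = "720p",
                 duration: int = 5,
                 negative_prompt: Optional[str] = None,
                 seed: Optional[int] = None) -> dict:
    """Return a parameter dict (hypothetical key names) after validation."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    params = {"resolution": resolution, "duration": duration}
    if negative_prompt:
        params["negative_prompt"] = negative_prompt
    if seed is not None:
        # Reusing a seed with the same prompt and settings aims at reproducible output.
        params["seed"] = seed
    return params


print(build_params())
# {'resolution': '720p', 'duration': 5}
print(build_params(resolution="1080p", duration=8,
                   negative_prompt="blurry, watermark", seed=42))
# {'resolution': '1080p', 'duration': 8, 'negative_prompt': 'blurry, watermark', 'seed': 42}
```

Optional fields are omitted rather than sent as nulls, so the provider's own defaults apply when a value is not set.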

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Resolution, Duration, Negative Prompt, Seed

Community discussion

What people think about Wan 2.5

Wan 2.5 discussions are most active in r/StableDiffusion, r/HiggsfieldAI, and r/comfyui. Top Reddit threads cluster around benchmarks and model comparisons, along with safety and censorship questions.

The strongest match in this snapshot has 289 upvotes and 132 comments.

r/HiggsfieldAI 9 upvotes 51 comments December 17, 2025
Is WAN 2.6 still uncensored on Higgsfield?

WAN 2.5 was uncensored on higgsfield, can anyone tell me if WAN 2.6 is still uncensored? Higgsfield no longer gives enough credits to run off even ONE test video.

The "official" WAN 2.5/2.6 page is heavily censored making it useless. Try to describe a G rated scene where people are sunbathing by a pool and it will probably be blocked.

r/StableDiffusion 10 upvotes 18 comments December 16, 2025
Are There Any Open-Source Video Models Comparable to Wan 2.5/2.6?

With the release of Wan 2.5/2.6 still uncertain in terms of open-source availability, I’m wondering if there are any locally runnable video generation models that come close to its quality. Ideally looking for something that can be downloaded and run offline (or self-hosted), even if it requires beefy hardware. Any recommendations or comparisons would be appreciated.

r/StableDiffusion 236 upvotes 219 comments September 23, 2025
Wan 2.5

https://x.com/Ali_TongyiLab/status/1970401571470029070

Just incase you didn't free up some space, be ready .. for 10 sec 1080p generations.

EDIT NEW LINK : https://x.com/Alibaba_Wan/status/1970419930811265129

r/StableDiffusion 289 upvotes 132 comments September 23, 2025
Ask nicely for Wan 2.5 to be open source

Sounds like they will eventually release it but maybe if enough people ask it will happen sooner than later.

>I'll say it first, so as not to be scolded,.. The 2.5 sent tomorrow is the advance version. For the time being, there is only the API version. For the time being, the open source version is to be determined. It is recommended that the community call for follow-up open source and rational comments, lest it be inappropriate to curse in the live broadcast room tomorrow. Everyone manages the expectations. It is recommended to ask for open source directly in the live broadcast room tomorrow! But rational comments, I think it will be opened in general, but there is a time difference, which mainly depends on the attitude of the community. After all, WAN mainly depends on the community, and the volume of voice is still very important.

>Sep 23, 2025 · 9:25 AM UTC

FAQ

Common questions about Wan 2.5

What is the context window for Wan 2.5?

Wan 2.5 has a context window of 2,000 tokens, which governs the length and detail of the text prompt it can process for a single generation request.
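A rough pre-flight check can flag prompts likely to exceed the 2,000-token window. The sketch below is an approximation under stated assumptions: the model's tokenizer is not documented on this page, so it uses a rule of thumb of about 4 characters per token for English text, and the helper names are hypothetical. Treat the result as a warning signal, not an exact count.

```python
CONTEXT_WINDOW_TOKENS = 2000  # input context window listed on this page
CHARS_PER_TOKEN = 4           # assumption: rough rule of thumb for English text


def estimate_tokens(prompt: str) -> int:
    """Crude token estimate; the real tokenizer may count differently."""
    return -(-len(prompt) // CHARS_PER_TOKEN)  # ceiling division


def fits_context(prompt: str, limit: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the estimated token count is within the limit."""
    return estimate_tokens(prompt) <= limit


prompt = "Slow dolly shot through a neon market in the rain, ambient crowd noise."
print(estimate_tokens(prompt), fits_context(prompt))
```

Non-English prompts often tokenize less efficiently than this estimate suggests, so a safety margin below the limit is sensible.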

What video resolutions and durations does Wan 2.5 support?

Wan 2.5 generates videos at 480p, 720p, or 1080p HD resolutions, with native 4K available in preview. Videos can be up to 10 seconds long at 24 frames per second, although the Duration parameter documented on this page offers only 5-second and 8-second options.
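Because the frame rate is fixed at 24 frames per second, the number of frames in a clip follows directly from its duration. The short sketch below works through that arithmetic for the 5-second and 8-second Duration options and the stated 10-second maximum. The function name is illustrative.

```python
FPS = 24  # frame rate stated on this page


def frame_count(duration_seconds: int, fps: int = FPS) -> int:
    """Number of frames in a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return duration_seconds * fps


for seconds in (5, 8, 10):
    print(f"{seconds} s -> {frame_count(seconds)} frames")
# 5 s -> 120 frames
# 8 s -> 192 frames
# 10 s -> 240 frames
```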

Does Wan 2.5 generate audio automatically, or does it require a separate step?

Audio generation is native and simultaneous — dialogue with lip-sync, ambient sounds, and background music are all produced in a single generation step alongside the video, with no separate post-production required.

What input types does Wan 2.5 accept?

Wan 2.5 accepts text prompts, image URLs (for image-to-video), audio inputs, and existing video (for video-to-video refinement). Requests can also set the resolution and duration options, a negative prompt, and a seed value for reproducible outputs.

Is Wan 2.5 open source, and when was it trained?

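Not according to the overview on this page, which lists Open Source as No. Community threads from the September 2025 launch describe Wan 2.5 as API-only, with an open-source release still to be determined. The model was developed by Alibaba, and its knowledge cut-off date is listed as Unknown.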

What languages does Wan 2.5 support for prompts?

Wan 2.5 processes prompts in at least 8 languages and generates audio output that matches the language used in the prompt.
