Wan 2.5

Released: September 2025 · Context: 2,000 tokens · Output tokens: N/A
Capabilities: Image-to-Video, Text-to-Video, Synchronized Audio Generation, Multilingual Prompting, Seed Control, Resolution Selection

Model Overview

Provider

The entity that provides this model.

Wan

Input Context Window

The number of tokens supported by the input context window.

2,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

N/A (output is video with audio, not text tokens)

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

September 2025

Knowledge Cut-off Date

When the model's knowledge was last updated.

September 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

Hugging Face

Modalities

Types of data this model can process.

Image, Text, Video, Audio

What is Wan 2.5

Wan 2.5 is an AI video generation model developed by Alibaba. The metadata on this page lists it as not open source, and community reports from its announcement say that only an API version was available at launch. It produces video clips up to 10 seconds long at resolutions up to 1080p and generates synchronized audio (dialogue with lip-sync, ambient sound effects, and background music) alongside the visuals in a single generation step. The model accepts text prompts, still images, audio tracks, or existing video clips as input, and supports cinematic controls such as camera movement types, lighting styles, and depth of field specified directly in the prompt.

Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need video output with accompanying audio without separate post-production workflows. It supports prompts and generated dialogue in at least 8 languages, and offers 480p, 720p, and 1080p as standard output resolutions with native 4K available in preview. Compared to its predecessor Wan 2.2, this version doubles the maximum video duration from 5 to 10 seconds, raises the standard resolution from 720p to 1080p, and introduces the audio generation system as an entirely new feature.

Capabilities

What Wan 2.5 supports

Image-to-Video

Animates a source image into a video clip up to 10 seconds long at resolutions up to 1080p. Accepts image URLs as direct input.

Text-to-Video

Generates video clips from natural language prompts, supporting cinematic controls like dolly shots, crane movements, and color grading specified inline.

Synchronized Audio Generation

Produces dialogue with lip-sync, environmental sound effects, and background music simultaneously with the video in a single generation step.

Multilingual Prompting

Accepts prompts and generates dialogue across at least 8 languages, enabling localized video content without separate translation workflows.

Seed Control

Accepts a numeric seed value to make generations reproducible, allowing consistent outputs when iterating on a prompt.

Resolution Selection

Supports 480p, 720p, and 1080p as standard output resolutions, with native 4K available in preview, configurable via numeric parameters.
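
The Seed Control item above amounts to holding one number fixed while the prompt wording changes between runs. A minimal sketch of that workflow follows. It assumes a hypothetical request shape with `prompt` and `seed` keys and treats seeds as unsigned 32-bit integers; neither detail is confirmed by this page, and the helper names are illustrative only.

```python
import random


def pick_seed(seed=None):
    """Return the caller's seed, or draw a fresh one that can be logged and reused."""
    if seed is not None:
        return seed
    # Assumption: seeds fit in an unsigned 32-bit integer.
    return random.SystemRandom().randrange(2**32)


def prompt_variants(prompts, seed=None):
    """Pair every prompt with the same seed so only the wording differs between runs."""
    fixed = pick_seed(seed)
    # Hypothetical payload keys; the real API may name these differently.
    return [{"prompt": p, "seed": fixed} for p in prompts]


batch = prompt_variants(
    ["slow dolly shot of a harbor at dawn", "crane shot of a harbor at dawn"],
    seed=1234,
)
for item in batch:
    print(item)
```

Reusing one seed across the batch means any difference between outputs can be attributed to the prompt change rather than to a new random draw.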

API Access & Providers

Providers through which this model is listed as available.

Hugging Face

Configuration & Parameters

The configurable options currently documented for this model.

Width

Number
Default: 1024. Range: 768 to 1440.

Height

Number
Default: 1024. Range: 768 to 1440.

Negative Prompt

Text

Description of what to exclude from the video.

Seed

Number

A fixed value that controls the randomness of the generation. Reusing the same seed makes outputs reproducible.
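
The documented options can be checked before a request is sent. The sketch below validates width and height against the listed defaults and ranges and assembles a parameter dictionary. The key names (`width`, `height`, `negative_prompt`, `seed`), the inclusive treatment of the range bounds, and the function `build_params` itself are assumptions for illustration, not a documented client API.

```python
# Values taken from the parameter list above.
SIDE_RANGE = (768, 1440)  # assumed inclusive on both ends
DEFAULT_SIDE = 1024


def build_params(width=DEFAULT_SIDE, height=DEFAULT_SIDE,
                 negative_prompt=None, seed=None):
    """Validate the documented options and return them as a dict.

    The key names are hypothetical; adapt them to the provider's schema.
    """
    low, high = SIDE_RANGE
    for name, value in (("width", width), ("height", height)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")

    params = {"width": width, "height": height}
    if negative_prompt:  # omit empty or missing exclusions
        params["negative_prompt"] = negative_prompt
    if seed is not None:  # a seed of 0 is still a valid seed
        params["seed"] = seed
    return params


print(build_params())
print(build_params(1440, 768, negative_prompt="blurry, watermark", seed=42))
```

Rejecting out-of-range values locally gives a clearer error than waiting for the provider to refuse the request.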

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Width, Height, Negative Prompt, Seed

Community discussion

What people think about Wan 2.5

Wan 2.5 discussions are most active in r/StableDiffusion, r/HiggsfieldAI, and r/comfyui. The top Reddit threads cluster around benchmarks and model comparisons, and around safety and censorship questions.

The strongest match in this snapshot has 292 upvotes and 132 comments.

r/HiggsfieldAI · 8 upvotes · 51 comments · December 17, 2025
Is WAN 2.6 still uncensored on Higgsfield?

WAN 2.5 was uncensored on higgsfield, can anyone tell me if WAN 2.6 is still uncensored? Higgsfield no longer gives enough credits to run off even ONE test video.

The "official" WAN 2.5/2.6 page is heavily censored making it useless. Try to describe a G rated scene where people are sunbathing by a pool and it will probably be blocked.

r/StableDiffusion · 9 upvotes · 18 comments · December 16, 2025
Are There Any Open-Source Video Models Comparable to Wan 2.5/2.6?

With the release of Wan 2.5/2.6 still uncertain in terms of open-source availability, I’m wondering if there are any locally runnable video generation models that come close to its quality. Ideally looking for something that can be downloaded and run offline (or self-hosted), even if it requires beefy hardware. Any recommendations or comparisons would be appreciated.

r/StableDiffusion · 235 upvotes · 219 comments · September 23, 2025
Wan 2.5

https://x.com/Ali_TongyiLab/status/1970401571470029070

Just incase you didn't free up some space, be ready .. for 10 sec 1080p generations.

EDIT NEW LINK: https://x.com/Alibaba_Wan/status/1970419930811265129

r/StableDiffusion · 292 upvotes · 132 comments · September 23, 2025
Ask nicely for Wan 2.5 to be open source

Sounds like they will eventually release it but maybe if enough people ask it will happen sooner than later.

>I'll say it first, so as not to be scolded,.. The 2.5 sent tomorrow is the advance version. For the time being, there is only the API version. For the time being, the open source version is to be determined. It is recommended that the community call for follow-up open source and rational comments, lest it be inappropriate to curse in the live broadcast room tomorrow. Everyone manages the expectations. It is recommended to ask for open source directly in the live broadcast room tomorrow! But rational comments, I think it will be opened in general, but there is a time difference, which mainly depends on the attitude of the community. After all, WAN mainly depends on the community, and the volume of voice is still very important.

>Sep 23, 2025 · 9:25 AM UTC

https://preview.redd.it/pv9opbtv0wqf1.png?width=526&format=png&auto=webp&s=a707e0b44d4833393be66f6d09194a275bb7d279

FAQ

Common questions about Wan 2.5

What is the context window for Wan 2.5?

Wan 2.5 has a context window of 2,000 tokens, which applies to the text prompt input used to guide video generation.

What input types does Wan 2.5 accept?

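Through the documented request interface, Wan 2.5 accepts text prompts, image URL arrays, numeric parameters such as width and height, a negative prompt, and a seed value for reproducibility. The model description also lists audio tracks and existing video clips as supported inputs.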

Does Wan 2.5 generate audio as well as video?

Yes. Wan 2.5 generates synchronized audio — including dialogue with lip-sync, ambient sound effects, and background music — alongside the video in a single generation step, with no separate audio recording or post-production required.

What resolutions does Wan 2.5 support?

Standard output resolutions are 480p, 720p, and 1080p. Native 4K output is available in preview.

What is the training data cutoff for Wan 2.5?

According to the available metadata, Wan 2.5's knowledge cut-off is listed as September 2025, the same month as its release.

Is Wan 2.5 open source?

Not according to this snapshot. The model metadata lists Wan 2.5 as not open source, and Reddit discussion around its announcement reports that only an API version was available at launch, with an open-weights release still undecided.
