Image-to-Video
Animates a source image into a video clip up to 10 seconds long at resolutions up to 1080p. Accepts image URLs as direct input.
Wan 2.5 is an AI video generation model developed by Alibaba. This catalog lists it as open-source, although the developer statement quoted in the community threads below describes the launch as API-only, with an open-source release still to be determined. The model produces video clips up to 10 seconds long at resolutions up to 1080p and generates synchronized audio (dialogue with lip-sync, ambient sound effects, and background music) alongside the visuals in a single generation step. It accepts text prompts, still images, audio tracks, or existing video clips as input, and supports cinematic controls such as camera movement types, lighting styles, and depth of field specified directly in the prompt.
Wan 2.5 is aimed at content creators, filmmakers, advertisers, and developers who need video with accompanying audio without a separate post-production workflow. It supports prompts and generated dialogue in at least 8 languages, and offers 480p, 720p, and 1080p as standard output resolutions, with native 4K available in preview. Compared with its predecessor Wan 2.2, this version doubles the maximum video duration from 5 to 10 seconds, raises the standard resolution from 720p to 1080p, and adds audio generation as an entirely new feature.
Generates video clips from natural language prompts, supporting cinematic controls like dolly shots, crane movements, and color grading specified inline.
Produces dialogue with lip-sync, environmental sound effects, and background music simultaneously with the video in a single generation step.
Accepts prompts and generates dialogue across at least 8 languages, enabling localized video content without separate translation workflows.
Accepts a numeric seed value to make generations reproducible, allowing consistent outputs when iterating on a prompt.
Supports 480p, 720p, and 1080p as standard output resolutions, with native 4K available in preview, configurable via numeric parameters.
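The limits listed above (standard resolutions of 480p, 720p, and 1080p, clips of up to 10 seconds, and an optional seed) can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: the function `build_request` and the field names `prompt`, `resolution`, `duration_s`, and `seed` are hypothetical and are not taken from any documented Wan 2.5 API.

```python
# Hypothetical request builder: the field names are illustrative only
# and do not come from a documented Wan 2.5 API.
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}  # standard resolutions named above
MAX_DURATION_S = 10  # maximum clip length named above


def build_request(prompt, resolution="1080p", duration_s=5, seed=None):
    """Return a payload dict, rejecting values outside the stated limits."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 0 < duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration_s must be in (0, {MAX_DURATION_S}]")
    payload = {
        "prompt": prompt.strip(),
        "resolution": resolution,
        "duration_s": duration_s,
    }
    if seed is not None:
        payload["seed"] = int(seed)  # only included when the caller wants reproducibility
    return payload
```

Rejecting out-of-range values locally is a design choice for this sketch; a real provider may instead clamp values or return its own error.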
The configurable options currently documented for this model.
Negative prompt: a description of what to exclude from the video.
Seed: a numeric value that controls the randomness of the generation, so that reusing the same value with the same inputs makes the output reproducible.
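To illustrate why a fixed seed makes results repeatable, here is a toy sketch using Python's standard `random` module. It only demonstrates the general principle of seeded pseudo-random generation; how Wan 2.5 consumes its seed internally is not documented here, and the helper `sample_noise` is hypothetical.

```python
import random


def sample_noise(seed, n=4):
    """Draw n pseudo-random values from a generator initialised with `seed`."""
    rng = random.Random(seed)  # independent generator; leaves global state untouched
    return [rng.random() for _ in range(n)]


same_a = sample_noise(1234)
same_b = sample_noise(1234)
other = sample_noise(1235)

print(same_a == same_b)  # True: identical seed, identical draws
print(same_a == other)   # False: a different seed gives different draws
```

When iterating on a prompt, holding the seed constant in this way isolates the effect of the prompt change from the effect of random variation.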
Wan 2.5 discussions are most active in r/StableDiffusion, r/HiggsfieldAI, and r/comfyui. Top Reddit threads cluster around benchmarks and model comparisons, and around safety and censorship questions.
The strongest match in this snapshot has 292 upvotes and 132 comments.
WAN 2.5 was uncensored on higgsfield, can anyone tell me if WAN 2.6 is still uncensored? Higgsfield no longer gives enough credits to run off even ONE test video.
The "official" WAN 2.5/2.6 page is heavily censored making it useless. Try to describe a G rated scene where people are sunbathing by a pool and it will probably be blocked.
With the release of Wan 2.5/2.6 still uncertain in terms of open-source availability, I’m wondering if there are any locally runnable video generation models that come close to its quality. Ideally looking for something that can be downloaded and run offline (or self-hosted), even if it requires beefy hardware. Any recommendations or comparisons would be appreciated.
[https://x.com/Ali\_TongyiLab/status/1970401571470029070](https://x.com/Ali_TongyiLab/status/1970401571470029070)
Just in case you didn't free up some space, be ready... for 10 sec 1080p generations.
EDIT NEW LINK : [https://x.com/Alibaba\_Wan/status/1970419930811265129](https://x.com/Alibaba_Wan/status/1970419930811265129)
Sounds like they will eventually release it but maybe if enough people ask it will happen sooner than later.
>I'll say it first, so as not to be scolded,.. The 2.5 sent tomorrow is the advance version. For the time being, there is only the API version. For the time being, the open source version is to be determined. It is recommended that the community call for follow-up open source and rational comments, lest it be inappropriate to curse in the live broadcast room tomorrow. Everyone manages the expectations. It is recommended to ask for open source directly in the live broadcast room tomorrow! But rational comments, I think it will be opened in general, but there is a time difference, which mainly depends on the attitude of the community. After all, WAN mainly depends on the community, and the volume of voice is still very important.
>Sep 23, 2025 · 9:25 AM UTC
Wan 2.5 has a context window of 2,000 tokens, which applies to the text prompt input used to guide video generation.
Wan 2.5 accepts image URL arrays, text prompts, numeric parameters (such as resolution and duration settings), and a seed value for reproducibility.
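Because images are passed as an array of URLs, a caller may want to confirm that each entry is an absolute web URL before submitting. The following is a minimal sketch, assuming only `http` and `https` URLs are acceptable; that restriction and the helper name `validate_image_urls` are assumptions, not documented behavior.

```python
from urllib.parse import urlparse


def validate_image_urls(urls):
    """Return the URLs as a list if every entry is an absolute http(s) URL."""
    if not isinstance(urls, (list, tuple)) or not urls:
        raise ValueError("image URLs must be a non-empty list")
    for url in urls:
        if not isinstance(url, str):
            raise ValueError(f"not a string: {url!r}")
        parts = urlparse(url)
        # Require both a web scheme and a host, which rules out relative paths.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return list(urls)
```

This checks the shape of the URL only; it does not verify that the image exists or is in a supported format.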
Yes, Wan 2.5 generates audio natively. It produces synchronized audio (dialogue with lip-sync, ambient sound effects, and background music) alongside the video in a single generation step, with no separate audio recording or post-production required.
Standard output resolutions are 480p, 720p, and 1080p. Native 4K output is available in preview.
According to the available metadata, Wan 2.5's training date is listed as September 2025.
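Wan 2.5 is listed in this catalog as an open-source model developed by Alibaba. However, the developer statement quoted in the Reddit threads above says that only an API version was available at launch and that an open-source release was still to be determined, and open-weights availability was an active topic of community discussion around the announcement.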