Text-to-Video
Generates video clips up to 10 seconds long from a text prompt at resolutions of 480p, 720p, or 1080p HD at 24fps.
Wan 2.5 is an open-source AI video generation model developed by Alibaba's DAMO Academy. It generates videos up to 10 seconds long at resolutions ranging from 480p to 1080p HD, with native 4K available in preview, all rendered at 24 frames per second. The model's defining characteristic is its ability to generate audio and video simultaneously in a single step: it produces character dialogue with lip-sync, environmental ambient sounds, and background music directly from a text or image prompt, without requiring separate post-production audio work. It supports multiple input modes including text-to-video, image-to-video, audio-to-video, and video-to-video refinement.
Wan 2.5 is designed for content creators, filmmakers, advertisers, and developers who need production-ready video with synchronized audio. It supports cinematic camera controls such as dolly, tracking, and crane movements, as well as lighting styles, depth of field, and particle effects like rain and fire. The model handles photorealistic, anime, illustrated, and stylized visual aesthetics, and processes prompts in at least 8 languages with matching audio generation. Its open-source nature makes it accessible for local deployment and integration into custom pipelines.
Generates video clips up to 10 seconds long from a text prompt at resolutions of 480p, 720p, or 1080p HD at 24fps.
Animates a source image into a video clip, using the provided image URL as the visual starting point for generation.
Produces dialogue with lip-sync, ambient environmental sounds, and background music in a single generation step alongside the video.
Supports named camera movements including dolly, tracking, and crane shots, as well as depth of field and color grading settings specified in the prompt.
Accepts prompts in at least 8 languages and generates matching audio output in the corresponding language.
Accepts a seed value as an input parameter, allowing reproducible generation results for a given prompt and settings combination.
Handles photorealistic, anime, illustrated, and other stylized visual aesthetics based on prompt instructions.
Accepts an existing video as input and applies prompt-guided modifications or style changes to produce a refined output.
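Since camera movement, visual style, and audio are all requested through the prompt text, one way to keep prompts consistent is to assemble them from named fields. The sketch below is only an illustration: `ShotSpec` and `compose_prompt` are hypothetical helpers, not part of any Wan 2.5 SDK, and the phrasing of the resulting prompt is an assumption rather than a documented prompt format.

```python
from dataclasses import dataclass

# Camera movements named in the capability list above.
CAMERA_MOVES = {"dolly", "tracking", "crane"}


@dataclass
class ShotSpec:
    """Hypothetical container for one shot; not part of any Wan 2.5 SDK."""
    subject: str
    camera_move: str = "dolly"
    style: str = "photorealistic"
    audio: str = ""  # dialogue or ambient sound to request, if any


def compose_prompt(shot: ShotSpec) -> str:
    """Join the shot fields into a single plain-text prompt string."""
    if shot.camera_move not in CAMERA_MOVES:
        raise ValueError(f"unknown camera move: {shot.camera_move!r}")
    parts = [f"{shot.style} style", shot.subject, f"{shot.camera_move} shot"]
    if shot.audio:
        parts.append(f"audio: {shot.audio}")
    return ", ".join(parts)


prompt = compose_prompt(
    ShotSpec(
        subject="a lighthouse in heavy rain at dusk",
        camera_move="crane",
        style="anime",
        audio="waves and distant thunder",
    )
)
print(prompt)
```

Keeping the fields separate makes it easy to vary one element, such as the camera move, while holding the rest of the prompt fixed.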
The configurable options currently documented for this model:
Negative prompt: a description of what to exclude from the video.
Seed: a value that controls the randomness of the generation; reusing the same seed with the same prompt and settings reproduces the result.
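As a minimal sketch of how these two options might sit alongside a prompt in a request body, the example below builds and validates a payload. The page does not document the actual API schema, so every field name (`prompt`, `resolution`, `duration_seconds`, `negative_prompt`, `seed`) and the `build_request` helper are assumptions; only the resolution and duration limits come from the page.

```python
import json
from typing import Optional

# Limits taken from the page; all field names below are hypothetical placeholders.
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p")
MAX_DURATION_SECONDS = 10


def build_request(
    prompt: str,
    resolution: str = "720p",
    duration_seconds: int = 5,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict:
    """Validate the inputs and return a JSON-serializable payload."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if not 1 <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be 1..{MAX_DURATION_SECONDS} seconds")

    payload = {
        "prompt": prompt,
        "resolution": resolution,
        "duration_seconds": duration_seconds,
    }
    # Optional fields are only sent when the caller sets them.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


request = build_request(
    "a fox running through snow, tracking shot",
    negative_prompt="text overlays, watermark",
    seed=42,
)
print(json.dumps(request, indent=2))
```

Fixing the seed while changing only the negative prompt is a simple way to see what the exclusion text actually removes from an otherwise identical generation.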
Wan 2.5 discussions are most active in r/StableDiffusion, r/HiggsfieldAI, and r/comfyui. Top Reddit threads cluster around benchmarks and model comparisons, as well as safety and censorship questions.
The strongest match in this snapshot has 289 upvotes and 132 comments.
WAN 2.5 was uncensored on higgsfield, can anyone tell me if WAN 2.6 is still uncensored? Higgsfield no longer gives enough credits to run off even ONE test video.
The "official" WAN 2.5/2.6 page is heavily censored making it useless. Try to describe a G rated scene where people are sunbathing by a pool and it will probably be blocked.
With the release of Wan 2.5/2.6 still uncertain in terms of open-source availability, I’m wondering if there are any locally runnable video generation models that come close to its quality. Ideally looking for something that can be downloaded and run offline (or self-hosted), even if it requires beefy hardware. Any recommendations or comparisons would be appreciated.
https://x.com/Ali_TongyiLab/status/1970401571470029070
Just in case you didn't free up some space, be ready for 10-second 1080p generations.
EDIT, new link: https://x.com/Alibaba_Wan/status/1970419930811265129
Sounds like they will eventually release it but maybe if enough people ask it will happen sooner than later.
>I'll say it first, so as not to be scolded,.. The 2.5 sent tomorrow is the advance version. For the time being, there is only the API version. For the time being, the open source version is to be determined. It is recommended that the community call for follow-up open source and rational comments, lest it be inappropriate to curse in the live broadcast room tomorrow. Everyone manages the expectations. It is recommended to ask for open source directly in the live broadcast room tomorrow! But rational comments, I think it will be opened in general, but there is a time difference, which mainly depends on the attitude of the community. After all, WAN mainly depends on the community, and the volume of voice is still very important.
>Sep 23, 2025 · 9:25 AM UTC
Wan 2.5 has a context window of 2,000 tokens, which governs the length and detail of the text prompt it can process for a single generation request.
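A rough pre-flight check can flag prompts that are likely to exceed this limit before a request is sent. The page does not say which tokenizer Wan 2.5 uses, so the ratio of four characters per token below is purely an assumption (it is a common rule of thumb for English text and will be off for other languages); `estimate_tokens` and `fits_context` are hypothetical helpers.

```python
import math

CONTEXT_WINDOW_TOKENS = 2_000  # figure stated on the page


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; the chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(text: str, limit: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(text) <= limit


prompt = "A slow dolly shot through a rainy neon street, ambient city noise."
print(estimate_tokens(prompt), fits_context(prompt))
```

An estimate like this is only a guard against obviously oversized prompts; the provider's own token count remains authoritative.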
Wan 2.5 generates videos at 480p, 720p, or 1080p HD resolutions, with native 4K available in preview. Videos can be up to 10 seconds long at 24 frames per second.
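To make these figures concrete, the sketch below works out the frame count for a clip and the pixel dimensions for each resolution tier. The frame rate and durations come from the page; the 16:9 aspect ratio and the rounding of widths to an even number are assumptions, since the page does not state exact output dimensions.

```python
FPS = 24  # frame rate stated on the page
HEIGHTS = {"480p": 480, "720p": 720, "1080p": 1080}


def frame_count(duration_seconds: float, fps: int = FPS) -> int:
    """Number of frames in a clip of the given length."""
    return int(duration_seconds * fps)


def dimensions(resolution: str, aspect: float = 16 / 9) -> tuple:
    """Width and height in pixels, assuming a 16:9 frame and an even width."""
    height = HEIGHTS[resolution]
    width = round(height * aspect / 2) * 2
    return width, height


print(frame_count(10))  # longest clip: 10 s * 24 fps
for name in HEIGHTS:
    print(name, dimensions(name))
```

Under these assumptions a maximum-length clip contains 240 frames, and a 1080p frame is 1920 by 1080 pixels.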
Audio generation is native and simultaneous: dialogue with lip-sync, ambient sounds, and background music are all produced in a single generation step alongside the video, with no separate post-production required.
Wan 2.5 accepts text prompts, image URLs (for image-to-video), audio inputs, selectable configuration options, and a seed value for reproducible outputs.
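Wan 2.5 is listed here as open source and was developed by Alibaba's DAMO Academy, although the community threads quoted above describe the September 2025 launch as API-only, with the open-source release still to be determined. Its training data has a cutoff of September 2025.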
Wan 2.5 processes prompts in at least 8 languages and generates audio output that matches the language used in the prompt.