Lightricks

LTX-2.3

LTX-2.3 is a multimodal video generation model developed by Lightricks and released in March 2026. Built on a Diffusion Transformer architecture with 22 billion parameters, it generates synchronized audio and video in a single forward pass at resolutions up to 4K at 50 frames per second, for clips up to 20 seconds long. It is available as open-source software with open weights under a permissive license, and can be run locally, accessed via API, or deployed on-premises. The model introduces several architectural updates over its predecessor, including a rebuilt variational autoencoder for sharper texture and edge detail, a gated attention text connector for improved prompt adherence, and an upgraded vocoder trained on filtered audio data for cleaner output. It supports native portrait-mode output at 1080×1920 and ships in four checkpoint variants — dev, distilled, fast, and pro — with the distilled variant completing generation in as few as 8 denoising steps. LTX-2.3 is aimed at independent creators, small studios, and developers who need a production-ready open-source foundation for video creation without licensing fees.

Model Overview

Provider

The entity that provides this model.

Lightricks

Input Context Window

The number of tokens supported by the input context window.

1,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

N/A

Open Source

Whether the model's code is available for public use.

Yes

Release Date

When the model was first released.

March 2026

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

Hugging Face

Modalities

Types of data this model can process.

Text, Image, Video, Audio

Capabilities

What LTX-2.3 supports

Text to Video

Generates video clips from text prompts at resolutions up to 4K at 50 FPS, for clips up to 20 seconds long.

Image to Video

Animates a provided image into a video clip, using an imageUrl input to anchor the first frame of generation.

Synchronized Audio Output

Produces audio and video together in a single forward pass, eliminating the need for separate audio post-processing.

Portrait Mode Support

Generates video natively at 1080×1920 resolution without cropping from a landscape output.

Fast Distilled Generation

The distilled checkpoint variant completes video generation in as few as 8 denoising steps for rapid iteration.

Configurable Generation Parameters

Accepts numeric inputs, toggle groups, and seed values to control resolution, duration, and reproducibility of outputs.

Multiple Checkpoint Variants

Ships in four variants (dev, distilled, fast, and pro), allowing users to trade generation speed against output quality.

API Access & Providers

Providers through which this model is available.

Hugging Face

Configuration & Parameters

The configurable options currently documented for this model.

Resolution

Type: Select
Default: 720p
Options: 480p, 720p, 1080p

Duration

Type: Number
Default: 5
Range: 5 to 20 (seconds)

Aspect Ratio

Type: Toggle Group
Default: 16:9

Seed

Type: Seed
A specific value that is used to guide the 'randomness' of the generation.

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Resolution Duration Aspect Ratio Seed
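
The documented defaults and ranges can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the `GenerationRequest` class, the `prompt` field, and the payload key names are all hypothetical, and only the resolution options, the duration range, and the default values are taken from the listing on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Values copied from the parameter listing on this page.
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p")
MIN_DURATION, MAX_DURATION = 5, 20


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for the four listed request parameters."""

    prompt: str                  # assumption: a text prompt accompanies the request
    resolution: str = "720p"     # documented default
    duration: int = 5            # documented default
    aspect_ratio: str = "16:9"   # documented default; other options are not listed
    seed: Optional[int] = None   # None leaves the seed unset

    def __post_init__(self) -> None:
        # Reject values outside the documented options before any request is made.
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
        if not MIN_DURATION <= self.duration <= MAX_DURATION:
            raise ValueError(
                f"duration must be between {MIN_DURATION} and {MAX_DURATION}"
            )

    def to_payload(self) -> dict:
        """Build a plain dict; the key names are illustrative, not a real schema."""
        payload = {
            "prompt": self.prompt,
            "resolution": self.resolution,
            "duration": self.duration,
            "aspect_ratio": self.aspect_ratio,
        }
        if self.seed is not None:
            payload["seed"] = self.seed
        return payload
```

Reusing the same `seed` with otherwise identical parameters is the usual way to make a generation repeatable; omitting it leaves the choice to whichever backend serves the request.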

Community discussion

What people think about LTX-2.3

LTX-2.3 discussions are most active in r/StableDiffusion, r/comfyui, and r/LocalLLaMA. Top Reddit threads cluster around benchmarks, model comparisons, and coding workflow discussions.

The strongest match in this snapshot has 764 upvotes and 104 comments.

Our web team ships fast. Apparently a little *too* fast. You found the page before we did. So let's do this properly:

Nearly five million downloads of LTX-2 since January. The feedback that came with them was consistent: frozen I2V, audio artifacts, prompt drift on complex inputs, soft fine details. [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3) is the result.

https://reddit.com/link/1rlm21a/video/elgkhgpmv8ng1/player

**Better fine details: rebuilt latent space and updated VAE**

We rebuilt our VAE architecture, trained on higher quality data with an improved recipe. The result is a new latent space with sharper output and better preservation of textures and edges.

Previous checkpoints had great motion and structure, but some fine textures (hair, edge detail especially) were softer than we wanted, particularly at lower resolutions. The new architecture generates sharper details across all resolutions. If you've been upscaling or sharpening in post, you should need less of that now.

**Better prompt understanding: larger and more capable text connector**

We increased the capacity of the text connector and improved the architecture that bridges prompt encoding and the generation model. The result is more accurate interpretation of complex prompts, with less drift from the prompt. This should be most noticeable on prompts with multiple subjects, spatial relationships, or specific stylistic instructions.

**Improved image-to-video: less freezing, more motion**

This was one of the most reported issues. I2V outputs often froze or produced a slow pan instead of real motion. We reworked training to eliminate static videos, reduce unexpected cuts, and improve visual consistency from the input frame.

**Cleaner audio**

We filtered the training set for silence, noise, and artifacts, and shipped a new vocoder. Audio is more reliable now: fewer random sounds, fewer unexpected drops, tighter alignment.

**Portrait video: native vertical up to 1080x1920**

Native portrait video, up to 1080x1920. Trained on vertical data, not cropped from widescreen. First time in LTX.

Vertical video is the default format for TikTok, Reels, Shorts, and most mobile-first content. Portrait mode is now native in 2.3: set the resolution and generate.

Weights, distilled checkpoint, latent upscalers, and updated ComfyUI reference workflows are all live now. The training framework, benchmarks, LoRAs, and the complete multimodal pipeline carry forward from LTX-2. The API will be live in an hour.

[Discord](https://discord.gg/ltxplatform) is active. GitHub issues are open. We respond to both.

r/StableDiffusion 541 upvotes 179 comments March 5, 2026
LTX-2.3: Introducing LTX's Latest AI Video Model

# What is the difference between LTX-2 and LTX-2.3?

LTX-2.3 brings four major improvements over LTX-2.

A redesigned VAE produces sharper fine details, more realistic textures, and cleaner edges.

A new gated attention text connector means prompts are followed more closely — descriptions of timing, motion, and expression translate more faithfully into the output.

Native portrait video support lets you generate vertical (1080×1920) content without cropping from landscape.

And audio quality is significantly cleaner, with silence gaps and noise artifacts filtered from the training set.

I cannot find this latest version on Hugging Face. Has it not been uploaded?

r/StableDiffusion 382 upvotes 236 comments March 5, 2026
We just shipped LTX Desktop: a free local video editor built on LTX-2.3

If your engine is strong enough, you should be able to build real products on top of it.

Introducing [LTX Desktop](https://ltx.io/ltx-desktop). A fully local, open-source video editor powered by LTX-2.3. It runs on your machine, renders offline, and doesn't charge per generation. Optimized for NVIDIA GPUs and compatible hardware.

We built it to prove the engine holds up. We're open-sourcing it because we think you'll take it further.

**What does it do?**

**AI Generation**

* Text-to-video and image-to-video generation
* Still image generation (via Z-Image Turbo)
* Audio-to-Video
* Retake - regenerate specific portions of an input video

**AI-Native Editing**

* Generate multiple takes per clip directly in the timeline and switch between them non-destructively. Each new version is nested within the clip, keeping your timeline modular.
* Context-aware gap fill - automatically generate content that matches surrounding clips
* Retake - regenerate specific sections of a clip without leaving the timeline

**Professional Editing Tools**

* Trim tools - slip, slide, roll, and ripple
* Built-in transitions
* Primary color correction tools

**Interoperability**

* Import/Export XML timelines for round-trip edits back to other NLEs
* Supports timelines from Premiere Pro, DaVinci Resolve, and Final Cut Pro

**Integrated Text & Subtitle Workflow**

* Text overlays directly in the timeline
* Built-in subtitle editor
* SRT import and export

**High-Quality Export**

* Export to H.264 and ProRes

LTX Desktop is available to run on Windows and macOS (via API).

[Download now](https://ltx.io/ltx-desktop). [Discord](https://discord.gg/ltxplatform) is active for feedback.

r/StableDiffusion 764 upvotes 104 comments March 27, 2026
I got LTX-2.3 Running in Real-Time on a 4090

Yooo Buff here.

I've been working on running LTX-2.3 as efficiently as possible directly in Scope on consumer hardware.

For those who don't know, [Scope](https://github.com/daydreamlive/scope) is an open-source tool for running real-time AI pipelines. They recently launched a plugin system which allows developers to build custom plugins with new models. Scope has normally focused on autoregressive/self-forcing/causal models (LongLive, Krea Realtime, etc.), but I think there is so much we can do with fast back-to-back bi-directional workflows (inter-dimensional TV anyone?)

I've been working with the folks at [Daydream.live](http://Daydream.live) to optimize LTX-2.3 to run in real-time, and I finally got it running on my local 4090! It's a bit of a balance in FP8 optimizations, resolution, frame count, etc. There is a slight delay between clips in the example video shared; you can manage this by changing these params to find a sweet spot in performance. Still a work in progress!

Currently Supports:

* T2V
* TI2V
* V2V with [IC-LoRA](https://huggingface.co/Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control) Union (control input, e.g. DWPose, Depth)
* Audio output
* LoRAs (Comfy format)
* Randomized seeds for each run
* Real-time prompting. This requires the text encoder to push the model out of VRAM to encode the input prompt conditioning, so there is a short delay between prompts. I'm looking into making sequential prompts run a bit quicker.

This software playground is completely free, I hope you all check it out. If you're interested in real-time AI visual and audio pipelines, join the [Daydream Discord](https://discord.gg/pF2Akym5bV)!

I want to thank all the amazing developers and engineers who allow us to build amazing things, including [Lightricks](https://huggingface.co/Lightricks), [AkaneTendo25](https://github.com/AkaneTendo25/musubi-tuner), [Ostris](https://github.com/ostris/ai-toolkit), [RyanOnTheInside](https://www.youtube.com/@ryanontheinside), [Comfy Org](https://github.com/Comfy-Org/ComfyUI) (ComfyAnon, Kijai and others), and the amazing open-source community for working tirelessly on pushing LTX-2.3 to new levels.

Get Scope [Here](https://github.com/daydreamlive/scope).
Get the Scope LTX-2.3 Plugin [Here](https://github.com/daydreamlive/scope-ltx-2).

Have a great weekend!

FAQ

Common questions about LTX-2.3

What is the context window for LTX-2.3?

LTX-2.3 has a context window of 1,000 tokens, which governs the length and detail of text prompts it can process.

What is the maximum video resolution and length LTX-2.3 can produce?

LTX-2.3 can generate video at up to 4K resolution at 50 frames per second, for clips up to 20 seconds in duration.
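
As a quick sanity check on these limits, the frame count of a clip is the frame rate multiplied by its duration. The sketch below works that out for the stated maximum and also reduces the portrait size quoted elsewhere on this page (1080×1920) to its aspect ratio. Both helper names are hypothetical and exist only for illustration.

```python
from math import gcd


def frame_count(fps: int, seconds: int) -> int:
    """Number of frames in a clip of the given length at a constant frame rate."""
    return fps * seconds


def reduce_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce pixel dimensions to the smallest whole-number aspect ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


# Stated maximum: 50 frames per second for 20 seconds.
max_frames = frame_count(50, 20)          # 1000

# Portrait output size quoted on this page.
portrait_ratio = reduce_ratio(1080, 1920)  # (9, 16)
```

So a clip at the stated maximum contains 1,000 frames, and the native portrait size corresponds to a 9:16 frame.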

Is LTX-2.3 open source, and can I run it locally?

Yes. LTX-2.3 is released as open-source software with open weights under a permissive license. It can be run locally via LTX Desktop, accessed through the Lightricks API, or deployed on-premises using the published weights on Hugging Face.
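
For the Hugging Face route, one possible way to fetch the published files is the `snapshot_download` function from the `huggingface_hub` package. This is a sketch under assumptions: the repository path is taken from the Hugging Face link quoted on this page, the `download_weights` helper and the target directory name are hypothetical, and the file layout inside the repository is not documented here, so the whole repository is downloaded rather than a specific checkpoint.

```python
from pathlib import Path

# Repository path as it appears in the Hugging Face link on this page.
REPO_ID = "Lightricks/LTX-2.3"


def download_weights(target_dir: str = "ltx-2.3") -> Path:
    """Download a snapshot of the repository and return its local path.

    Assumes the `huggingface_hub` package is installed and that the
    machine has enough disk space for the full repository.
    """
    # Imported here so this module still loads when the package is missing.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
    return Path(local_path)
```

Calling `download_weights()` starts the download and returns the directory containing the files. A 22-billion-parameter model is likely to be a large download, so it is worth checking the repository page for individual checkpoint files first.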

What checkpoint variants are available?

LTX-2.3 ships in four checkpoint variants: dev, distilled, fast, and pro. The distilled variant is optimized for speed and can complete generation in as few as 8 denoising steps.

When was LTX-2.3 trained and released?

LTX-2.3 was released in March 2026. The model metadata lists the knowledge cut-off date as unknown.

Does LTX-2.3 generate audio as well as video?

Yes. LTX-2.3 generates synchronized audio and video in a single forward pass. Audio quality was improved in this version through filtered training data and an upgraded vocoder.
