Lightricks

LTX-2 19b

LTX-2 19B is an open-source video generation model developed by Lightricks and released on January 6, 2026. It uses an asymmetric dual-stream Diffusion Transformer architecture to generate video and synchronized audio together in a single unified process, rather than producing silent video and adding audio as a separate step. The model accepts text prompts, reference images, or existing video clips as input and outputs native 4K video with flexible frame-rate control and support for extended clip durations. What distinguishes LTX-2 19B is its simultaneous audiovisual output, where ambient sound, environmental effects, and speech synchronization are generated alongside the video frames. The model supports LoRA fine-tuning for camera motion control and custom stylization, and offers NVFP4 and FP8 quantization formats that reduce VRAM usage by up to 60% and accelerate generation up to 3x. A distilled 8-step fast generation mode runs 5–6 times faster than the full model, and on an RTX 4090 with NVFP4 quantization an 8-second 720p clip can be produced in approximately 25 seconds. It is well suited for film-style storytelling, advertising production, and any workflow requiring tight audiovisual coherence.


Unified AV Generation Native 4K Output Image-to-Video LoRA Camera Control Quantized Inference Fast Distilled Mode


Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Lightricks

Input Context Window

The number of tokens supported by the input context window.

1,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

N/A

Open Source

Whether the model's code is available for public use.

Yes

Release Date

When the model was first released.

January 6, 2026

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

Hugging Face

Modalities

Types of data this model can process.

Video, Text, Image, Audio


Capabilities

What LTX-2 19b supports

Unified AV Generation

Generates video and scene-aware audio simultaneously in one pass using a dual-stream Diffusion Transformer, eliminating the sync issues common in separate audio-video pipelines.

Native 4K Output

Produces video at native 4K resolution with flexible frame-rate control and support for extended clip durations beyond standard short-form outputs.


Image-to-Video

Accepts a reference image URL as input and animates it into a video clip, preserving visual content from the source image across generated frames.

LoRA Camera Control

Supports Low-Rank Adaptation (LoRA) modules for precise camera motion control, enabling film-style cinematography directions such as pans, zooms, and tracking shots.

Quantized Inference

Supports NVFP4 and FP8 quantization formats that reduce VRAM usage by up to 60% and accelerate generation up to 3x compared to full-precision inference.
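As a back-of-the-envelope illustration of why lower-precision formats save memory, the sketch below estimates weight storage only for a 19-billion-parameter model. It assumes a 16-bit baseline, 8 bits per weight for FP8, and 4 bits per weight for NVFP4, and it ignores per-block scale factors, activations, the text encoder, and the VAE. It is therefore not expected to reproduce the end-to-end "up to 60%" figure quoted above. The function and constant names are illustrative, not part of any LTX-2 tooling.

```python
# Rough weight-storage estimate.
# Assumptions (not from the source): every one of the 19e9 parameters is
# stored at the stated bit width, no scale-factor overhead, 1 GB = 1e9 bytes.
PARAMS = 19_000_000_000

BITS_PER_WEIGHT = {
    "16-bit (assumed baseline)": 16,
    "fp8": 8,
    "nvfp4": 4,
}


def weight_gigabytes(params: int, bits: int) -> float:
    """Return approximate weight storage in GB (10^9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    baseline = weight_gigabytes(PARAMS, 16)
    for name, bits in BITS_PER_WEIGHT.items():
        gb = weight_gigabytes(PARAMS, bits)
        saving = 1 - gb / baseline
        print(f"{name}: {gb:.1f} GB of weights ({saving:.0%} below baseline)")
```

Real VRAM usage also depends on resolution, clip length, and which components stay resident, so these numbers are a lower bound on weights rather than a hardware requirement.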

Fast Distilled Mode

Offers an 8-step distilled generation mode that runs 5–6x faster than the full model, producing an 8-second 720p clip in approximately 25 seconds on an RTX 4090 with NVFP4.


Text-to-Video

Generates video directly from text prompts, translating scene descriptions into temporally stable video clips with synchronized audio.

Seed Control

Accepts a manual seed value as input, allowing reproducible generation runs and controlled variation across outputs.

Pricing for LTX-2 19b

Per-token API pricing is not listed for this model.

Input tokens: N/A (per million tokens)

Output tokens: N/A (per million tokens)

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Hugging Face

Configuration & Parameters

The configurable options currently documented for this model.

Resolution

Select

Default: 720p

Options: 1080p, 720p, 480p

Duration

Number

Default: 5. Range: 5–20.

LoRAs

LoRA

Up to 3 LoRAs.

Aspect Ratio

Toggle Group

Default: 16:9

Seed

A specific value that is used to guide the 'randomness' of the generation.
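The documented options above can be checked before a request is sent. The following is a minimal sketch, not an official client: the class name, field names, and error messages are hypothetical, and only the constraints listed on this page are enforced (three resolutions, a duration of 5 to 20, at most 3 LoRAs). The page lists only the default aspect ratio, so other aspect-ratio values are not validated here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Values taken from the parameter list above.
RESOLUTIONS = ("480p", "720p", "1080p")
DURATION_RANGE = (5, 20)
MAX_LORAS = 3


@dataclass
class GenerationParams:
    """Hypothetical container mirroring the documented options."""

    resolution: str = "720p"      # documented default
    duration: int = 5             # documented default
    loras: List[str] = field(default_factory=list)
    aspect_ratio: str = "16:9"    # documented default
    seed: Optional[int] = None    # None means "let the backend pick"

    def validate(self) -> None:
        """Raise ValueError if a documented constraint is violated."""
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")
        low, high = DURATION_RANGE
        if not low <= self.duration <= high:
            raise ValueError(f"duration must be between {low} and {high}")
        if len(self.loras) > MAX_LORAS:
            raise ValueError(f"at most {MAX_LORAS} LoRAs are supported")


if __name__ == "__main__":
    params = GenerationParams(resolution="1080p", duration=10, seed=42)
    params.validate()
    print(params)
```

Fixing `seed` to the same integer across runs is what makes a generation reproducible; leaving it unset gives a different result each time.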

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Resolution, Duration, LoRAs, Aspect Ratio, Seed

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Announcement Blog Post (Announcements)

Model Files on Hugging Face (Open Source)

Setup & Usage Guide (Documentation)

API Reference (Documentation)

Community Integration Discussion (Open Source)

Official Lightricks GitHub Repository (Open Source)

LTX-Video Model on Hugging Face (Lightricks) (Open Source)

Community discussion

What people think about LTX-2 19b

LTX-2 19b discussions are most active in r/StableDiffusion, r/comfyui, and r/LocalLLaMA. Top Reddit threads cluster around benchmarks and model comparisons.

The strongest match in this snapshot has 984 upvotes and 216 comments.

r/StableDiffusion 301 upvotes 127 comments January 13, 2026

LTX-2 19b T2V/I2V GGUF 12GB Workflows!! Link in description

[https://civitai.com/models/2304098](https://civitai.com/models/2304098)

The examples shown in the preview video are a mix of 1280x720 and 848x480, with a few 640x640 thrown in. I really just wanted to showcase what the model can do and the fact it can run well. Feel free to mess with some of the settings to get what you want. Most of the nodes that you need to mess with if you want to tweak are still open. The ones that are all closed and grouped up can be ignored unless you want to modify more. For most people just set it and forget it!

These are two workflows that I've been using for my setup.

I have 12GB VRAM and 48GB system ram and I can run these easily.

The T2V is set for the 1280x720 and usually I get a 5s video in a little under 5 minutes. You can absolutely lessen that. I was making videos in 848x480 in about 2 minutes. So, it can FLY!

This does not use any fancy nodes (one node from Kijai KJNodes pack to load audio VAE and of course the GGUF node to load the GGUF model), no special optimization. It's just a standard workflow so you don't need anything like Sage, Flash Attention, that one thing that goes "PING!"... not needed.

I2V is set for a resolution of 640x640, but I have left a note in the spot where you can define your own resolution. I would stick to the 480-640 range (adjust for widescreen, etc.); within that range, the higher the res the better. You CAN absolutely do 1280x720 videos in I2V as well, but they will take FOREVER. Talking like 3-5 minutes on the upscale PER ITERATION!! But the results are much, much better!

Links to the models used are right next to the models section, notes on what you need also there.

This is the native comfy workflow that has been altered to include the GGUF, separated VAE, clip connector, and a few other things. Should be just plug and play. Load in the workflow, download and set your models, test.

I have left a nice little prompt to use for T2V. For I2V, I'll include the prompt and provide the image used.

Drop a note if this helps anyone out there. I just want everyone to enjoy this new model because it is a lot of fun. It's not perfect but it is a meme factory for sure.

If I missed anything, you have any questions, comments, anything at all just drop a line and I'll do my best to respond and hopefully if you have a question I have an answer!


r/StableDiffusion 109 upvotes 51 comments January 12, 2026

ltx-2-19b-distilled vs ltx-2-19b-dev + distilled-lora

I’m comparing LTX-2 outputs with the same setup and found something interesting.

Setup:

* LTX-2 IC-LoRA (Pose) I2V
* Sampler: Euler Simple
* Steps: 8
* (+ refine 3 steps)

Models tested:

1. `ltx-2-19b-distilled-fp8`
2. `ltx-2-19b-dev-fp8.safetensors` + `ltx-2-19b-distilled-lora-384` (strength **1.0**)
3. `ltx-2-19b-dev-fp8.safetensors` + `ltx-2-19b-distilled-lora-384` (strength **0.6**)

workflow + other results:

* [https://scrapbox.io/work4ai/ltx-2-19b-distilled_vs_ltx-2-19b-distilled-lora](https://scrapbox.io/work4ai/ltx-2-19b-distilled_vs_ltx-2-19b-distilled-lora)

As you can see, `ltx-2-19b-distilled` and the dev model with `ltx-2-19b-distilled-lora` at **strength 1.0** end up producing almost the same result in my tests. That consistency is nice, but both also tend to share the same downside: the output often looks “overcooked” in an AI-ish way (plastic skin, burn-out / blown highlights, etc.).

With the recommended **LoRA strength 0.6**, the result looks a lot more natural and the harsh artifacts are noticeably reduced.

I started looking into this because the distilled LoRA is huge (~7.67GB), so I wanted to replace it with the distilled checkpoint to save space. But for my setup, the distilled checkpoint basically behaves like “LoRA = 1.0”, and I can’t get the nicer look I’m getting at 0.6 even after trying a few sampling tweaks.

If you’re seeing similar plastic/burn-out artifacts with `ltx-2-19b-distilled(-fp8)`, I’d suggest using the LoRA instead — at least with the LoRA you can adjust the strength.
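The effect of the strength knob can be illustrated with the standard LoRA update, where a low-rank product is scaled and added to a base weight: W' = W + s · (up · down). The sketch below is an assumption about how strength is applied, since the post does not show the loader's internals; the helper names and the tiny 2x2 matrices are made up for illustration. Under this assumption, a checkpoint with the LoRA already merged at s = 1.0 leaves nothing to scale back, which would be consistent with the behavior described above.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (a is n x k, b is k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(base: Matrix, up: Matrix, down: Matrix, strength: float) -> Matrix:
    """Return base + strength * (up @ down), the usual LoRA merge."""
    delta = matmul(up, down)
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]   # toy "dev" weight
    up = [[1.0], [2.0]]               # rank-1 factors (illustrative values)
    down = [[0.5, -0.5]]
    for strength in (0.0, 0.6, 1.0):
        print(strength, apply_lora(base, up, down, strength))
```

At strength 0.0 the base weight is returned unchanged, and intermediate values interpolate linearly between the base and the fully merged weight.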


r/StableDiffusion 984 upvotes 216 comments January 15, 2026

LTX-2 I2V synced to an MP3: Distill Lora Quality STR 1 vs .6 - New Workflow Version 2.

New version of Workflow (v2):

[https://github.com/RageCat73/RCWorkflows/blob/main/011426-LTX2-AudioSync-i2v-Ver2.json](https://github.com/RageCat73/RCWorkflows/blob/main/011426-LTX2-AudioSync-i2v-Ver2.json)

This is a follow-up to my previous post - please read it for more information and context:

[https://www.reddit.com/r/StableDiffusion/comments/1qcc81m/ltx2_audio_synced_to_added_mp3_i2v_6_examples_3/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button](https://www.reddit.com/r/StableDiffusion/comments/1qcc81m/ltx2_audio_synced_to_added_mp3_i2v_6_examples_3/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)

Thanks to user u/foxdit for pointing out that the strength of the LTX Distill Lora 384 can greatly affect the quality of realistic people. This new workflow sets it to .6

Credit MUST go to Kijai for introducing the first workflows that have the Mel-Band model that makes this possible. I hear he doesn't have much time to devote to refining workflows so it's up to the community to take what he gives us and build on them.

There is also an optional detail lora in the upscale group/node. It's disabled in my new workflow by default to save memory, but setting it to .3 is another recommendation. You can see the results for yourself in the video.

Bear in mind the video is going to get compressed by Reddit's servers but you'll still be able to see a significant difference. If you want to see the original 110 mb video, let me know and I'll send a Google drive link to it. I'd rather not open up my Google drive to everyone publicly.

The new workflow is also friendlier to beginners, it has better notes and literally has areas and nodes labelled Steps 1-7. It moves the Load Audio node closer to the Load image and trim audio nodes as well. Overall, it's minor improvements. If you already have the other one, it may not be worth it unless you're curious.

The new workflow has ALL the download links to the models and LORAs, but I'll also paste them below. I'll try to answer questions if I can, but there may be a delay of a day or 2 depending on your timezone and my free time.

Based on this new testing, I really can't recommend the distilled only model (8step model) because the distilled workflows don't have any way to alter the strength of the LORA that is baked inherently into the model. Some people may be limited to that model due to hardware constraints.

**IMPORTANT NOTE ABOUT PROMPT (updated 1/16/26): FOR BEST RESULTS, add the lyrics of the song or a transcript of the words being spoken in the prompt. In further experiments, this helps a lot.**

**The woman sings the words: "My Tea's gone cold I'm wondering why got out of bed at all..." will help to trigger the lip sync. Sometimes you only need the first few words of the lyric, but it may be best to include as many of the words as possible for a good lip sync. Also add emotions and expressions to the prompt as well or go with: the woman sings with passion and emotion if you want to be generic.**

**IMPORTANT NOTE ABOUT RESOLUTION: My workflow is set to 480x832 (portrait) as a STARTING resolution. Change that to what you think your system can handle. You MUST change that to 832x480 if you do a widescreen image or higher otherwise, you're going to get a VERY small video. Look at the Preview node for what the final resolution of the image will be. Remember, it must be divisible by 32, but the resize node in Step 2 handles that. Please read the notes in the workflow if you're a beginner.**
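The divisible-by-32 rule mentioned above can be sketched in a few lines. This is not the workflow's resize node, whose exact rounding behavior is not described in the post; rounding up is an assumption chosen here because it turns 1080 into 1088, which matches heights such as 768x1088 and 1920x1088 quoted in these threads. The function names are hypothetical.

```python
def snap_up(value: int, multiple: int = 32) -> int:
    """Round a positive integer up to the nearest multiple."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-value // multiple) * multiple


def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple:
    """Return (width, height) with both sides rounded up to the multiple."""
    return snap_up(width, multiple), snap_up(height, multiple)


if __name__ == "__main__":
    for size in [(480, 832), (832, 480), (1920, 1080)]:
        print(size, "->", snap_resolution(*size))
```

Sizes that already satisfy the rule, such as 480x832, pass through unchanged.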

***** If you notice the lipsync is kinda wonky in this video, it's because I slapped the video together in a rush. I only noticed after I rendered it in Resolve and by then I was rushed to do something else so I didn't bother to go back and fix it. Since I only cared about showing the quality and I've already posted, I'm not going to go back and fix it even though it bothers my OCD a little.

Some other stats: I'm very fortunate to have a 4090 (24 GB VRAM) and 64 GB of system RAM, purchased over a year ago, before the price craziness. A 20-second 768x1088 video (481 frames at 24fps) takes 6-10 minutes depending on the LoRAs I set, at 25 steps using Euler. Your mileage will vary.
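The frame count quoted above (481 frames for 20 seconds at 24fps) equals 24 × 20 + 1, and the 161-frame clip reported in another thread here is 8 × 20 + 1. Both fit the pattern "a multiple of 8, plus 1". The sketch below computes a frame count under that assumption; the pattern is inferred from these two data points only and is not stated in the posts, and the function name is hypothetical.

```python
def frame_count(seconds: float, fps: int = 24, block: int = 8) -> int:
    """Nearest frame count of the form block * k + 1 for a target duration.

    Assumption: valid lengths are block * k + 1 frames (k >= 1), inferred
    from the 481- and 161-frame examples in the threads.
    """
    blocks = round(seconds * fps / block)
    return max(blocks, 1) * block + 1


if __name__ == "__main__":
    for seconds in (5, 6.7, 8, 20):
        print(f"{seconds}s at 24fps -> {frame_count(seconds)} frames")
```

If the actual pipeline accepts arbitrary frame counts, the rounding step is unnecessary and `seconds * fps` can be used directly.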

***Update to post: I'm using a VERY simple prompt. My goal wasn't to test prompt adherence but to mess with quality and lipsync. Here is the embarrassingly short prompt that I sometimes vary with 1-2 words about expressions or eye contact. This is driving nearly ALL of my singing videos:

**"A video of a woman singing. She sings with subtle and fluid movements and a happy expression. She sings with emotion and passion. static camera."**

Crazy, right?

Models and LoRA list

**checkpoints**

- [ltx-2-19b-dev-fp8.safetensors](https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-19b-dev-fp8.safetensors)

**text_encoders** (quantized Gemma)

- [gemma_3_12B_it_fp8_e4m3fn.safetensors](https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/resolve/main/gemma_3_12B_it_fp8_e4m3fn.safetensors?download=true)

**loras**

- [LTX-2-19b-LoRA-Camera-Control-Static](https://huggingface.co/Lightricks/LTX-2-19b-LoRA-Camera-Control-Static/resolve/main/ltx-2-19b-lora-camera-control-static.safetensors?download=true)

- [ltx-2-19b-distilled-lora-384.safetensors](https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-19b-distilled-lora-384.safetensors?download=true)

**latent_upscale_models**

- [ltx-2-spatial-upscaler-x2-1.0.safetensors](https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-spatial-upscaler-x2-1.0.safetensors)

Mel-Band RoFormer model (for audio)

- [MelBandRoformer_fp32.safetensors](https://huggingface.co/Kijai/MelBandRoFormer_comfy/resolve/main/MelBandRoformer_fp32.safetensors?download=true)


r/comfyui 29 upvotes 15 comments January 9, 2026

8s/720p; LTX-2 19b Distilled-fp8; 5090; 67 seconds generation time.


r/StableDiffusion 12 upvotes 2 comments February 9, 2026

Running LTX-2 19B on a Jetson Thor - open-source pipeline with full memory lifecycle management

I've been running LTX-2 (the 19B distilled model) on an NVIDIA Jetson AGX Thor and built an open-source pipeline around it. Generating 1080p video (1920x1088) at 24fps with audio, camera control LoRAs, and batch rendering. Figured I'd share since there's almost nothing out there about running big video models on Jetson.

**GitHub:** [github.com/divhanthelion/ltx2](http://github.com/divhanthelion/ltx2)

## What it generates

https://reddit.com/link/1r03u80/video/ep0gbzpsxgig1/player

1920x1088, 161 frames (~6.7s), 24fps with synchronized audio. About 15 min diffusion + 2 min VAE decode per clip on the Thor.

## The interesting part: unified memory

The Jetson Thor has 128GB of RAM shared between CPU and GPU. This sounds great until you realize it breaks every standard memory optimization:

- **`enable_model_cpu_offload()` is useless**: CPU and GPU are the same memory. Moving tensors to CPU frees nothing. Worse, the offload hooks create reference paths that prevent model deletion, and removing them later leaves models in an inconsistent state that segfaults during VAE decode.

- **`tensor.to("cpu")` is a no-op**: same physical RAM. You have to actually `del` the object and run `gc.collect()` + `torch.cuda.empty_cache()` (twice; the second pass catches objects freed by the first).

- **Page cache will kill you**: safetensors loads weights via mmap. Even after `.to("cuda")`, the original pages may still be backed by page cache. If you call `drop_caches` while models are alive, the kernel evicts the weight pages and your next forward pass segfaults.

- **You MUST use `torch.no_grad()` for VAE decode**: without it, PyTorch builds autograd graphs across all 15+ spatial tiles during tiled decode. On unified memory, this doesn't OOM cleanly; it segfaults. I lost about 4 hours to this one.

The pipeline does manual memory lifecycle: load everything → diffuse → delete transformer/text encoder/scheduler/connectors → decode audio → delete audio components → VAE decode under `no_grad()` → delete everything → flush page cache → encode video. Every stage has explicit cleanup and memory reporting.
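The cleanup pattern described above (drop references, then run `gc.collect()` and `torch.cuda.empty_cache()` twice, and decode under `torch.no_grad()`) can be sketched as follows. This is not the code from the linked repository: the helper names and the idea of keeping components in a dict are assumptions made for illustration. PyTorch is imported optionally so the sketch also runs on a machine without it.

```python
import gc
from contextlib import nullcontext

try:
    import torch  # optional; the helpers degrade gracefully without it
except ImportError:
    torch = None


def flush_memory(passes: int = 2) -> None:
    """Run garbage collection and release cached CUDA blocks.

    Two passes by default, following the post's note that the second pass
    catches objects freed by the first.
    """
    for _ in range(passes):
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()


def release(components: dict, names) -> None:
    """Drop the named components, then flush.

    Memory is only reclaimed if no other references to the objects remain.
    """
    for name in names:
        components.pop(name, None)
    flush_memory()


def decode_without_autograd(decode_fn, latents):
    """Call decode_fn(latents) with gradient tracking disabled."""
    context = torch.no_grad() if torch is not None else nullcontext()
    with context:
        return decode_fn(latents)


if __name__ == "__main__":
    # Stand-in objects; a real pipeline would hold model modules here.
    components = {"transformer": object(), "text_encoder": object(), "vae": object()}
    release(components, ["transformer", "text_encoder"])
    print(sorted(components))
```

The sketch leaves out the page-cache flush and the offload-hook problem from the list above, since both depend on details of the poster's setup that the post does not spell out.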

## What's in the repo

- `generate.py`: the main pipeline with all the memory management

- `decode_latents.py`: standalone decoder for recovering from failed runs (latents are auto-saved)

- Batch rendering scripts with progress tracking and ETA

- Camera control LoRA support (dolly in/out/left/right, jib up/down, static)

- Optional FP8 quantization (cuts transformer memory roughly in half)

- Post-processing pipeline for RIFE frame interpolation + Real-ESRGAN upscaling (also Dockerized)

Everything runs in Docker so you don't touch your system Python. The NGC PyTorch base image has the right CUDA 13 / sm_110 build.

## Limitations (being honest)

- **Distilled model only does 8 inference steps**: motion is decent but not buttery smooth. Frame interpolation in post helps.

- **Negative prompts don't work**: the distilled model uses CFG=1.0, which mathematically eliminates the negative prompt term. It accepts the flag silently but does nothing.

- **1080p is the ceiling for quality**: you can generate higher res but the model was trained at 1080p. Above that you get spatial tiling seams and coherence loss. Better to generate at 1080p and upscale.

- **~15 min per clip**: this is a 19B model on an edge device. It's not fast. But it's fully local and offline.
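The point about CFG=1.0 in the list above follows from the usual classifier-free guidance formula, in which the guided prediction is `uncond + scale * (cond - uncond)`. With `scale = 1.0` the unconditional (negative-prompt) term cancels and only the conditional prediction remains. The sketch below shows this with plain lists standing in for model outputs; it assumes the standard formulation and is not taken from the pipeline's code.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, -2.0, 0.5]      # prediction for the positive prompt
    uncond = [0.25, 4.0, -1.5]   # prediction for the negative/empty prompt
    print("scale 1.0:", cfg_combine(cond, uncond, 1.0))   # equals cond
    print("scale 3.0:", cfg_combine(cond, uncond, 3.0))   # pushed away from uncond
```

Changing `uncond` has no effect on the result at scale 1.0, which is why a negative prompt is accepted but ignored in that setting.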

## Hardware

NVIDIA Jetson AGX Thor, JetPack 7.0, CUDA 13.0. 128GB unified memory. The pipeline needs at least 128GB — at 64GB you'd need FP8 + pre-computed text embeddings to fit, and it would be very tight.

If anyone else is running video gen models on Jetson hardware, I'd love to compare notes. The unified memory gotchas are real and basically undocumented.


FAQ

Common questions about LTX-2 19b

What is the context window for LTX-2 19B?

LTX-2 19B has a context window of 1,000 tokens, as specified in the model metadata.

Is LTX-2 19B open source and can it be run locally?

Yes, LTX-2 19B is fully open source. It can be deployed locally without any cloud dependency, and model files are available on Hugging Face. It is also compatible with ComfyUI via community integrations.

What hardware is required to run LTX-2 19B locally?

The model supports NVFP4 and FP8 quantization, which reduce VRAM requirements by up to 60%. With NVFP4 quantization on an RTX 4090, an 8-second 720p clip can be generated in approximately 25 seconds. Exact minimum VRAM requirements depend on the quantization format and output resolution chosen.

Does LTX-2 19B generate audio as well as video?

Yes. LTX-2 19B generates video and synchronized audio together in a single unified process. The audio output includes ambient sound, environmental effects, and speech synchronization that correspond to the on-screen action.

What input types does LTX-2 19B accept?

The model accepts text prompts, reference image URLs, and existing video clips as inputs. It also supports LoRA configuration, numeric parameters, toggle group settings, and a manual seed value for reproducibility.

When was LTX-2 19B released and who developed it?

LTX-2 19B was developed by Lightricks and released on January 6, 2026. It was added to MindStudio on January 13, 2026.


AI tools related to LTX-2 19b

Facetune
