SDXL LoRA

SDXL LoRA is a text-to-image generative AI model developed by Stability AI as a successor to earlier Stable Diffusion releases. It runs on a 3.5 billion parameter architecture and generates images natively at 1024×1024 resolution. Two text encoders, OpenCLIP-ViT/G and CLIP-ViT/L, interpret complex prompts, with a reported 89% prompt adherence in benchmark testing. An optional refiner stage applies an ensemble-of-experts approach to add fine detail to generated outputs.

What distinguishes SDXL LoRA from the base SDXL model is its built-in support for Low-Rank Adaptation (LoRA), a technique that enables efficient style and subject customization without full model retraining. Users can apply up to five LoRA adapters simultaneously, which makes it practical for tasks such as consistent character design, brand-specific imagery, and specialized artistic styles. It is well suited to digital artists, marketing teams, game developers, and product designers who need repeatable, customizable visual output at scale.

Model Overview

High-signal model metadata in a structured overview table.

| Field | Description | Value |
| --- | --- | --- |
| Provider | The entity that provides this model. | Stability |
| Input Context Window | The number of tokens supported by the input context window. | 10,000 tokens |
| Maximum Output Tokens | The number of tokens that can be generated by the model in a single request. | N/A |
| Open Source | Whether the model's code is available for public use. | No |
| Release Date | When the model was first released. | Jul 04, 2023 |
| Knowledge Cut-off Date | When the model's knowledge was last updated. | Unknown |
| API Providers | The providers that offer this model. This is not an exhaustive list. | Hugging Face |
| Modalities | Types of data this model can process. | Image, Text, Code |

Capabilities

What SDXL LoRA supports

Text-to-Image Generation

Generates images from text prompts at a native 1024×1024 resolution using a 3.5 billion parameter architecture with dual text encoders for prompt interpretation.

LoRA Style Customization

Applies Low-Rank Adaptation weights to customize the model's output style or subject without full retraining; supports stacking up to 5 LoRAs simultaneously.

Image-to-Image Transformation

Transforms an existing image guided by a text prompt, with adjustable prompt strength to control how much the output deviates from the source image.

Inpainting

Fills or replaces specific masked regions of an image using text-guided generation, allowing targeted edits without regenerating the full image.

Seed Control

Accepts a seed value as input to make image generation reproducible, enabling consistent outputs across repeated runs with the same prompt and settings.

Optional Refiner Stage

Passes generated images through a secondary refiner model using an ensemble-of-experts approach to enhance fine detail and image sharpness.
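
To make the LoRA Style Customization entry above more concrete, here is a minimal sketch of the low-rank update that LoRA applies to a single weight matrix: the frozen weight is left untouched, and a trainable product of two small matrices, scaled by alpha / rank, is added to it. The matrix sizes, the function names, and the pure-Python representation are illustrative assumptions, not the model's actual layers or any library's API.

```python
# Illustrative sketch of a LoRA update on one weight matrix (pure Python).
# Shapes and names are hypothetical; they do not describe the real model's layers.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) without modifying weight."""
    rank = len(lora_a)  # lora_a has shape rank x in_features
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape out_features x in_features
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def lora_param_count(out_features, in_features, rank):
    """Trainable parameters in the adapter: B is out x rank, A is rank x in."""
    return rank * (out_features + in_features)


# A 2x3 frozen weight adapted with a rank-1 update.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # 2 x 1
A = [[0.5, 0.5, 0.5]]     # 1 x 3
adapted = apply_lora(W, B, A, alpha=2.0)

# For a hypothetical 1280 x 1280 projection, compare adapter size with the full matrix.
full_params = 1280 * 1280
adapter_params = lora_param_count(1280, 1280, rank=64)
```

In this hypothetical case the adapter holds one tenth of the parameters of the full matrix, which is the reason customization does not require full retraining. Stacking several adapters can be pictured as adding several such scaled deltas to the same frozen weight.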

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Hugging Face

Configuration & Parameters

The configurable options currently documented for this model.

Width

Number
Default: 1024 Range: 256 - 1536

Height

Number
Default: 1024 Range: 256 - 1536

LoRAs

LoRA

Up to 3 LoRAs.

Guidance Scale

Number
Default: 3.5 Range: 1 - 20 (step 0.1)

Inference Steps

Number
Default: 28 Range: 1 - 50 (step 1)

Negative Prompt

Text

Description of what to exclude from the image.

Seed

Seed

A specific value that is used to guide the 'randomness' of the generation.
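
The role of the seed can be illustrated with a small sketch: the seed initializes a pseudo-random generator, and the same seed yields the same starting noise, which is why repeated runs with identical settings can reproduce an image. The example uses Python's standard `random` module as a stand-in; the function name and the tiny list of samples are hypothetical and do not reflect how the model itself draws noise.

```python
import random


def initial_noise(seed, size=4):
    """Draw `size` Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator; global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(seed=1234)
same_b = initial_noise(seed=1234)
different = initial_noise(seed=1235)
```

Here `same_a` and `same_b` are identical, while `different` is not. Reproducibility in practice also depends on keeping the prompt and every other setting unchanged.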

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Width, Height, LoRAs, Guidance Scale, Inference Steps, Negative Prompt, Seed
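
As a sketch of how a client might check a request before sending it, the function below validates a parameter dictionary against the defaults and ranges listed under Configuration & Parameters. The key names (`width`, `guidance_scale`, and so on), the `validate_request` helper, and the payload shape are assumptions for illustration; the provider's actual request schema may differ.

```python
# Hypothetical client-side check. Key names are assumed; the ranges and defaults
# are the ones listed under Configuration & Parameters.
NUMERIC_LIMITS = {
    # name: (minimum, maximum, default)
    "width": (256, 1536, 1024),
    "height": (256, 1536, 1024),
    "guidance_scale": (1, 20, 3.5),
    "inference_steps": (1, 50, 28),
}
MAX_LORAS = 3  # the parameter listing says "Up to 3 LoRAs"


def validate_request(params):
    """Return a copy of params with defaults filled in; raise ValueError if out of range."""
    checked = dict(params)
    for name, (low, high, default) in NUMERIC_LIMITS.items():
        value = checked.setdefault(name, default)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    if len(checked.get("loras", [])) > MAX_LORAS:
        raise ValueError(f"at most {MAX_LORAS} LoRAs are allowed")
    return checked


request = validate_request({"width": 768, "seed": 42, "loras": ["style-a"]})
```

Filling in defaults on the client keeps the request explicit, which helps when a run needs to be reproduced later with the same seed.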

Community discussion

What people think about SDXL LoRA

SDXL LoRA discussions are most active in r/StableDiffusion, r/comfyui, and r/Lora. Top Reddit threads cluster around benchmarks and model comparisons, along with safety and censorship questions.

The strongest match in this snapshot has 1989 upvotes and 114 comments.

r/StableDiffusion 5 comments October 31, 2025
SDXL LoRA trained on RTX 5080 - 40 images → ~95 % style match

Ran a local SDXL 1.0 LoRA on 40 reference images (same art style).

• Training time ≈ 2 h
• bf16 + PEFT = half VRAM use of DreamBooth
• Outputs retain 90-95 % style consistency

ComfyUI + LoRA pipeline feels way more stable than cloud runs, and no data ever leaves the machine.

Happy to share configs or talk optimization for small-dataset LoRAs. DM if you want to see samples or logs.

*(No promo—just showing workflow.)*

[A workflow to train SDXL LoRAs.](https://civitai.com/models/1538062)

*This workflow is based on the incredible work by Kijai ([https://github.com/kijai/ComfyUI-FluxTrainer](https://github.com/kijai/ComfyUI-FluxTrainer)), who created the training nodes for ComfyUI based on the Kohya\_ss ([https://github.com/kohya-ss/sd-scripts](https://github.com/kohya-ss/sd-scripts)) work. All credits go to them. Thanks also to u/tom83_be on Reddit, who posted his installation and basic settings tips.*

Detailed instructions on the Civitai page.

I trained a LoRA on a real person (my model) with 94 photos. Dataset breakdown: \~21 close-up portraits, rest is half-body and full-body shots with varied outfits, poses and environments.

**Training settings:**

* Base model: stabilityai/stable-diffusion-xl-base-1.0
* Optimizer: Prodigy, LR: 1
* Network Rank: 64, Alpha: 32
* Epochs: 10, Repeats: 2 per image = \~1880 total steps
* Scheduler: cosine\_with\_restarts, 5 cycles
* Flags: gradient\_checkpointing, cache\_latents, shuffle\_caption, no\_half\_vae

**Captioning strategy:** Removed all constant facial features from captions (hair color, eye color, tattoos, scar) — kept only pose, outfit, background, lighting.

**Problem:** Generated face doesn't look like her at all. Wrong jaw shape, wrong mouth. She has distinct features: black hair with purple highlights, moon phases neck tattoo, snake+rose shoulder tattoo, small scar on chin. Tattoos appear blurry/barely visible. Face geometry is completely wrong.

**What I tried:**

* 6 epochs with 15 repeats (\~8460 steps) — face too generic
* 10 epochs with 2 repeats (\~1880 steps) — face still doesn't match, tattoos not rendering

**Question:** What am I doing wrong? Is it the captioning strategy, training parameters, or something else entirely?
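
The step counts quoted in the post can be reproduced with simple arithmetic: images × repeats × epochs, divided by the batch size. The sketch below assumes a batch size of 1, which the post does not state, and the function name is hypothetical. It also computes the alpha / rank ratio that is commonly used to scale LoRA updates, for the listed Network Rank of 64 and Alpha of 32.

```python
def total_steps(num_images, repeats, epochs, batch_size=1):
    """Optimizer steps for a run, assuming each epoch sees every image `repeats` times."""
    # Integer division is a simplification; it is exact for the assumed batch size of 1.
    steps_per_epoch = (num_images * repeats) // batch_size
    return steps_per_epoch * epochs


run_a = total_steps(num_images=94, repeats=2, epochs=10)   # the posted settings
run_b = total_steps(num_images=94, repeats=15, epochs=6)   # the earlier attempt
lora_scale = 32 / 64  # alpha / rank for the posted Network Rank and Alpha
```

Under that assumption the two runs come to 1880 and 8460 steps, matching the figures in the post.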

FAQ

Common questions about SDXL LoRA

What is the context window for SDXL LoRA?

The model has a context window of 10,000 tokens as listed in the metadata, though for image generation models this typically refers to the maximum prompt length or token budget for text input rather than a conversational context.

How many LoRAs can I apply at once?

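The capability description lists stacking of up to 5 LoRA adapters in a single generation, which lets you combine multiple styles or subject customizations. The LoRAs request parameter, however, is documented with a limit of 3, so check the provider's current limit before stacking more than three.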

What output resolution does SDXL LoRA produce?

The model generates images natively at 1024×1024 resolution, which is larger than the 512×512 native output of earlier Stable Diffusion versions like SD 1.5.

Does SDXL LoRA support image editing, or only generation from scratch?

In addition to text-to-image generation, the model supports image-to-image transformation and inpainting, allowing you to modify existing images or fill specific masked regions using text prompts.
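
One way to picture the prompt-strength control in image-to-image mode is as the fraction of the denoising schedule that is actually run on the noised source image. The sketch below assumes the convention used by common diffusion image-to-image pipelines, where the number of executed steps is `int(steps * strength)`; the source does not state whether this model's hosted endpoint follows exactly that rule, and the function name is hypothetical.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (steps_run, steps_skipped) for a strength between 0 and 1."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    return steps_run, num_inference_steps - steps_run


light_edit = img2img_steps(28, 0.25)   # few steps run, so the output stays close to the source
heavy_edit = img2img_steps(28, 1.0)    # the full schedule is run
```

Under this assumption, a lower strength preserves more of the source image, while a strength of 1.0 behaves much like generating from the prompt alone.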

Is there a training cutoff date for this model?

No training cutoff date is specified in the available metadata for SDXL LoRA. For the most accurate information on the training data cutoff, refer to Stability AI's official documentation.
