Images
Image URL array: source images to edit. Reference them as "image1", "image2", etc. in the prompt.
FLUX.2 [klein] 9B with LoRA support is a high-quality text-to-image model with 9B parameters, offering enhanced realism, crisper text generation, and fast LoRA customization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
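Because prompts refer to source images by position ("image1", "image2", and so on), it can help to check that every reference in a prompt has a matching entry in the image URL array before sending a request. The following is a minimal sketch of such a check; the function name and the regular expression are illustrative assumptions, not part of any documented API.

```python
import re


def check_image_references(prompt: str, image_urls: list[str]) -> list[int]:
    """Return image numbers mentioned in the prompt that have no matching URL.

    Hypothetical helper: accepts both "image1" and "image 1" spellings.
    Image numbers are 1-based, so "image1" maps to image_urls[0].
    """
    referenced = {
        int(n)
        for n in re.findall(r"\bimage\s?(\d+)\b", prompt, flags=re.IGNORECASE)
    }
    # A reference is invalid if it is below 1 or beyond the number of URLs.
    return sorted(n for n in referenced if n < 1 or n > len(image_urls))
```

An empty list means every reference resolves; a non-empty list names the image numbers that would point at nothing.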
FLUX.2 [klein] 9B discussions are most active in r/StableDiffusion, r/comfyui, and r/LocalLLaMA. The top Reddit threads are mostly benchmarks and model comparisons.
The strongest match in this snapshot has 462 upvotes and 125 comments.
A quick experimental comparison of three versions of the Flux 2 Klein model:
* Flux 2 Klein 4B (sft; fp8; 3.9GB disk size)
* Flux 2 Klein 9B (sft; fp8; 9GB)
* Flux 2 Klein 9Bkv (sft; fp8; 9.8GB)
**Speed-wise:**
* Klein 4B is the fastest;
* Klein 9Bkv is significantly faster than Klein 9B.
* Since the disk sizes of these two models are very close, the speedup is a point in favor of 9Bkv.
However, note that all of them run in a few seconds (4-6 steps), anyway.
Test 1: **Short bare-bone prompting**
[very short bare bone prompt.](https://preview.redd.it/re1jacmm58pg1.jpg?width=2048&format=pjpg&auto=webp&s=545fbe5cf3285a37251a712c0b2367e2e39ed7b7)
There are some composition issues here; nonetheless, Klein 9B is the winner thanks to a better background (note the odd flower in 9Bkv). Also note 9Bkv's text rendering glitch. 4B shows a lot of unwanted changes (clothing...).
Test 2: **Slightly Longer Prompting**
[slightly longer prompting](https://preview.redd.it/wn47fsnt68pg1.jpg?width=2048&format=pjpg&auto=webp&s=a9794cd399987aee0162d8fcaf8fea8d77721128)
All models are prompted to keep the composition and proportions intact; they all comply, but only to some extent. Still, 4B's clothing change is not OK (also note the lips). Klein 9Bkv still has an issue with the flower (too large, and it looks like a copy-paste of the input!).
Test 3: **LLM Prompting**
[LLM prompting](https://preview.redd.it/hli11j9u78pg1.jpg?width=2048&format=pjpg&auto=webp&s=d57dc0bc2cdc40f307fc669a03b5f225b48cfdf6)
After giving the previous (slightly longer) prompt and the input image to a vision-capable LLM (VLM) and feeding the resulting essay-long prompt to all three models, it appears that **all models were successful in all edits.** Interestingly, the results look very similar, even the backgrounds. Even the weak 4B model applied almost all of the edits properly. However, looking closer at the hair, it is clear that only 9B kept the exact same hair shape as in the original image.
So \*\*\* **Klein 9B is a clear winner.** \*\*\*
Maybe with a book-long-prompt all of these models would generate exact edits.
Also note that LLM prompting does not always succeed. Dealing with the LLM itself is another challenge to master case by case. Nonetheless, pragmatically speaking, it seems most multiple-edits-at-once issues could be addressed by the long, repetitive statements that LLM prompting tends to produce. (No claim about solving the body horror issues present in all Klein models, BTW.)
*TLDR: Prompt "high resolution image 1" instead of "upscale image 1" and use a bilinear upscale of your target image as both the reference image* ***and*** *your latent image, with a denoise of 0.7-0.9. Here is an* [*image with embedded workflow*](https://www.dropbox.com/scl/fi/p7bzsx65k8k9301wj9qrd/ComfyUI_UpScale_2026-02-26_00016_-Copy.png?rlkey=madj8a4tvhy80pq5q8e83maoy&st=4o2xlqz8&dl=0) *and here is the* [*workflow in PasteBin*](https://pastebin.com/JGUKN1H4)*.*
The [earlier post](https://www.reddit.com/r/StableDiffusion/comments/1rfm605/image_upscale_with_klein_9b/) was both right and "wrong" about upscaling with Flux 2 Klein 9B:
It's **right** that for many applications, using Klein is simpler and faster than something like SeedVR2, and avoids complicated workflows that rely on custom nodes.
But it's **wrong** about the way to do a Klein upscale, though, to be fair, I don't think they were claiming to be presenting the *best* Klein method. (Please stop jumping down OOP's throat.)
**Prompting**
The single easiest and most important change is to prompt "high resolution" instead of "upscale." Granted, there may be circumstances where this doesn't make much of a difference or makes the resulting image worse. But in my tests, at least, it always resulted in a better upscale, with better details, less plastic texture, and decreased patterning and other AI upscale oddities.
My theory (and I think it's a good one) is that images labeled upscaled are exactly that: upscaled. They will inherently be worse than images that were high resolution originally, and will thus tend to contain all the artifacts we're accustomed to from earlier generations of upscalers. By specifying "high resolution" you are telling the model "Hey give this image the quality of a high res image" rather than "Hey give this the quality of something artificially upscaled."
I found that this method has a bit of a bias toward desaturation, but this might be a consequence of the relatively high-saturation starting images. Modern photos tend to be less punchy (especially for certain tones) so the model is likely biased toward a more muted, smartphone-esque look. On the other hand, it's possible that if you start with B&W or faded film images, this method might have a tendency to saturate—again pulling the image toward a contemporary digital look. You can address this with appropriate prompting like "Preserve exact color saturation and exposure from image 1".
**Use a simple upscale of the target image as Flux reference**
Additionally, use an initial 1 megapixel (MP) bilinear upscale of your image as the Flux 2 reference. Flux 2 was designed to work at a base resolution of 1024x1024. So even if your simple upscale is not actually adding more detail, it means the model will still be able to get a better understanding of your starting image than if you feed it a suboptimal <1MP image. (You can try other upscalers, but bilinear is cleanest when you're trying to preserve the original as much as possible. If you're trying to give a sharp/detailed look, you could try Lanczos, but it may introduce artifacts.)
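As a rough illustration of the arithmetic behind a "1MP upscale," the sketch below computes a target size of about 1024x1024 pixels' worth of area while keeping the aspect ratio. The function name is made up for this example, and snapping each side to a multiple of 16 is an assumption about what the sampler prefers, not something stated in the post; change `multiple` if your workflow needs something else.

```python
import math


def one_megapixel_size(
    width: int,
    height: int,
    target_pixels: int = 1024 * 1024,
    multiple: int = 16,  # assumption: sides snapped to a multiple of 16
) -> tuple[int, int]:
    """Scale (width, height) to roughly target_pixels, keeping the aspect ratio."""
    # Area scales with the square of the linear factor, hence the square root.
    scale = math.sqrt(target_pixels / (width * height))
    # Snap each side to the nearest multiple, never below one multiple.
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h
```

With Pillow 9.1 or later installed, the actual bilinear resize could then be done with `img.resize(one_megapixel_size(*img.size), Image.Resampling.BILINEAR)`; in ComfyUI the equivalent is an image scale node set to bilinear.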
**Use a simple upscale of the target image as your latent image**
Use the same initial 1MP upscale as your latent image. This gives the model a starting point that further helps it preserve aspects of your image. I found that denoise from 0.7 to 0.9 works best (keep in mind that the number of steps will impact exactly where different denoise thresholds lie). But note that different seeds can have different optimal denoise levels.
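To see why the step count can shift where denoise thresholds fall, here is a simplified model. It assumes the sampler builds a schedule of `int(steps / denoise)` steps and runs only the last `steps` of them; that is an assumption for illustration, and the sampler or scheduler in your workflow may behave differently. Both function names are hypothetical.

```python
def schedule_length(steps: int, denoise: float) -> int:
    """Length of the full schedule under the assumed int(steps / denoise) rule."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return int(steps / denoise)


def fraction_of_schedule_run(steps: int, denoise: float) -> float:
    """Fraction of the full schedule that is actually executed.

    Because the schedule length is truncated to an integer, nearby denoise
    values can collapse onto the same fraction when steps is small.
    """
    return steps / schedule_length(steps, denoise)
```

Under this model, a 4-step run has very few distinct settings: a denoise of 0.9 yields a 4-step schedule, which is the same as no reduction at all, while 0.7 yields a 5-step schedule. With more steps the grid becomes finer, so the same nominal denoise value lands closer to what was requested.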
**Additional notes**
I have also included a second, model-based upscaling step in case you want to go up to 4MP. Beyond this, you probably will want to switch to a tiled and/or SeedVR2 method. It might be that I could incorporate more elements of my approach above into this simple step for even better results, but I'm honestly too lazy to try that right now.
I have not done a direct comparison to SeedVR2 because, candidly, I don't use it. I know it makes me a curmudgeon, but I \*hate\* having to install/use custom nodes, both from a simplicity and security standpoint. From what I have seen of SeedVR2, I think this method is quite competitive; but I'm not married to that position since I can't make direct comparisons. If someone would like to try it, I'd be much obliged and might change my position if SeedVR2 still blows this approach out of the water.
Hi, all. I was just checking in to see if anyone knows if there are controlnet models around for Klein 9B. So far I've only been finding them for Flux 2 Dev, and I figured it was worth asking around before I go to the trouble of training my own.
Hey all, so a week ago I took a swipe at z-image as the loras I was creating did a meh job of image creation.
After the recent updates for z-image base training, I decided to once again compare a Z-image Base trained LoRA running on Z-image Turbo vs. a Flux Klein 9b Base trained LoRA running on Flux Klein 9b.
For reference the first of the 2 images is always z-image. I chose the best of 4 outputs for each - so I COULD do a better job with fiddling and fine tuning, but this is fairly representative of what I've been seeing.
Both are creating decent outputs - but there are some big differences I notice.
1. Klein 9b makes much more 'organic' feeling images to my eyes - if you want to generate a lora and make it feel less like a professional photo, I found that Klein 9b really nails it. Z-image often looks more posed/professional even when I try to prompt around it. (Especially look at the night club photo and the hiking photo.)
2. Klein 9b still does struggle a little more with structure: extra limbs sometimes, not knowing what a motorcycle helmet is supposed to look like, etc.
3. Klein 9b follows instructions better - I have to do fewer iterations with Klein 9b to get exactly what I want.
4. Klein 9b manages to show me in less idealised moments... less perfect facial expressions, less perfect hair, etc. It has more facial variation - if I look at REAL images of myself, my face looks quite different depending on the lens used, the moment captured, etc. Klein nails this variation very well and makes the images produced far more life-like: [https://drive.google.com/drive/folders/1rVN87p6Bt973tjb8G9QzNoNtFbh8coc0?usp=drive\_link](https://drive.google.com/drive/folders/1rVN87p6Bt973tjb8G9QzNoNtFbh8coc0?usp=drive_link)
Personally, Flux really hits the nail on the head for me. I do photography for clients (for instagram profiles and for dating profiles etc) - And I'm starting to offer AI packages for more range. Being able to pump out images that aren't overly flattering that feel real and authentic is a big deal.
For me right now, Flux 2 Klein and ZIT are my main choices for creating content. (In my case, I don’t use much illustration or anything too fantastical, just photography and movie stills). In all the images here I used Flux-2-klein-9b, with CFG set to 1, Euler Ancestral, 24 steps, and FULL HD resolution. The prompt for each image has no secret — I just detailed the colors, lighting, and objects. (no detailer or upscaler) Anyway, Klein is now part of my daily use here!
Be careful not to use the model with "base" in its name.
WORKFLOW: [https://pastebin.com/KzfysWCL](https://pastebin.com/KzfysWCL)