Text-to-Image Generation
Generates images from text prompts with up to 10,000 tokens, enabling detailed and complex scene descriptions.
Imagen 4 Ultra is Google's flagship image generation model and the top tier of the Imagen 4 family, trained through early 2025. It accepts text prompts of up to 10,000 tokens and is designed to handle complex, multi-element descriptions including specific art styles, multi-scene compositions, and nuanced visual storytelling. The model supports image URL arrays as input, allowing users to reference existing images alongside text prompts. It is licensed for commercial use, making it available to businesses and creative professionals working on production-grade projects.
Imagen 4 Ultra is best suited for use cases where image fidelity and detail are priorities, such as professional design work, advertising, and high-resolution visual content creation. It covers a wide range of output styles, from photorealistic portraits and landscapes to stylized illustrations and pixel art. According to community benchmarking discussions, Imagen 4 Ultra has achieved competitive Elo ratings in image arenas, including a reported tie with GPT-Image-1 in the Image Arena as of mid-2025. The model is accessible via the Google Gemini API as well as third-party inference platforms such as fal.ai.
Accepts arrays of image URLs as input, allowing reference images to be passed alongside text prompts for guided generation.
Supports a select input type for specifying output styles, covering photorealistic, illustrated, and stylized visual modes.
Licensed for commercial applications, making generated images usable in business and professional production contexts.
Produces high-resolution images suited for professional and commercial use cases where detail and fidelity are required.
Available via the Google Gemini API and third-party platforms like fal.ai, with documented endpoints for programmatic integration.
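The input types listed above (a text prompt, an array of image URLs, and a style selection) can be sketched as a small request builder. This is a minimal illustration under stated assumptions, not the Gemini API or fal.ai request schema: the class name `ImageRequest`, the field names `prompt`, `image_urls`, and `style`, and the three style values are all hypothetical and should be replaced with whatever the provider's documentation specifies.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical style values. The page only says the select input covers
# photorealistic, illustrated, and stylized modes.
ALLOWED_STYLES = {"photorealistic", "illustrated", "stylized"}


@dataclass
class ImageRequest:
    """Illustrative request container; all field names are assumptions."""

    prompt: str
    image_urls: List[str] = field(default_factory=list)
    style: Optional[str] = None

    def to_payload(self) -> Dict[str, Any]:
        """Validate the inputs and return a plain dict ready to serialize."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.style is not None and self.style not in ALLOWED_STYLES:
            raise ValueError(f"unknown style: {self.style!r}")
        for url in self.image_urls:
            if not url.startswith(("http://", "https://")):
                raise ValueError(f"not an http(s) URL: {url!r}")

        # Only include optional fields when they were actually provided.
        payload: Dict[str, Any] = {"prompt": self.prompt}
        if self.image_urls:
            payload["image_urls"] = list(self.image_urls)
        if self.style is not None:
            payload["style"] = self.style
        return payload
```

Building the payload separately from the network call keeps validation testable without credentials; the resulting dict would still need to be mapped onto the real endpoint's parameter names.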
The configurable options currently documented for this model.
If you want to edit an existing image, provide the URL(s) or variables.
Imagen 4 Ultra discussions are most active in r/Bard, r/GeminiAI, and r/singularity. The top Reddit threads cluster around benchmarks and model comparisons, along with safety and censorship questions.
The strongest match in this snapshot has 347 upvotes and 62 comments.
Alibaba has officially ended 2025 by releasing **Qwen-Image-2512**, currently the world’s strongest open-source text-to-image model. Benchmarks from the AI Arena confirm it is now performing within the same tier as Google’s flagship proprietary models.
**The Performance Data:** In over 10,000 blind evaluation rounds, **Qwen-Image-2512** effectively matched Imagen 4 Ultra and challenged **Gemini 3 Pro.**
This is the **first time** an open-weights model has consistently rivaled the top three closed-source giants in visual fidelity.
**Key Upgrades:**
**Skin & Hair Realism:** The model features a specific architectural update to reduce the **"AI plastic look"**, focusing on natural skin pores and realistic hair textures.
**Complex Material Rendering:** Significant improvements in difficult-to-render textures like water ripples, landscapes and animal fur.
**Layout & Text Quality:** Building on the Qwen-VL foundation, it handles multi-line text and professional-grade layout composition with high precision.
**Open Weights Availability:** True to their roadmap, Alibaba has open-sourced the model **weights** under the Apache 2.0 license, making them available on Hugging Face and ModelScope for immediate local deployment.
[Source: Qwen Blog](https://qwen.ai/blog?id=qwen-image-2512)
[Source: Hugging Face Repository](https://huggingface.co/unsloth/Qwen-Image-2512-GGUF)
Google's Imagen 4 and Imagen 4 Ultra are being sunset on June 30 but are essentially the only models out there that can reliably output a convincing 1990s "Disney renaissance" look, with the blurry-edge shading that defines the [CAPS](https://en.wikipedia.org/wiki/Computer_Animation_Production_System)-style of that era. So I'm trying to distill it into something that can be used until I come across another model that can do this.
I've made my first Illustrious 2.0 LoRA (through TensorArt because my graphics card is busted and I already had an account with them since before they started censoring everything) with a purely Imagen 4-generated 100 image dataset of 16:9, 1408x768 graphics. I did Repeat 3 / Epoch 10 = 2910 steps. Auto-labelled with "wd-v1-4-vit-tagger-v2". And the resulting images absolutely do capture the style, but... the result is a little wonky, it's got random artifacts, often shitty lines, weird eyes, IDK, the way AI gen looked like 2 years ago? Back when "AI slop" didn't mean it looked too polished, but that it actually looked sloppy?
It'd be easy to just jump back in and add more images, do more steps, but I've already wasted nearly $10 so I'd be so thankful if somebody with more experience could hint what I might be doing wrong. Should I use Imagen 4 ultra images for training instead? They tend to be a little sharper and I can get at 2x the resolution, though they cost $0.06 per image. Or should I try and automate some de-noising or upscaling or sharpening of the training set I already have? Or is like... my LoRA essentially fine and what is vexing me is just the limitations of using an older local model like Illustrious 2.0?
Edit: also tried doing a Qwen Image Edit 2511 LoRA (through FAL's trainer) that would just change the character, but the results were not great there either.
EDIT2: After a lot of back and forth I realized what's bothering me is probably just that Illustrious is a very out of date model that's pretty far behind the curve. I re-evaluated my Qwen Image Edit 2511 LoRA and while it does also edit the background (despite me not touching the backgrounds at all in the pairs!) it's actually really good for getting the character design right, so I guess I'll just fix the backgrounds manually instead.
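The figures in the thread above can be sanity-checked with a little arithmetic. The sketch below assumes the common convention that training steps equal images × repeats × epochs divided by batch size, with a batch size of 1; the thread does not say how TensorArt actually counts steps, so the formula and both function names are assumptions. Prices are kept in whole cents to avoid floating-point rounding.

```python
def training_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Assumed convention: each image is seen `repeats` times per epoch."""
    return (num_images * repeats * epochs) // batch_size


def dataset_cost_cents(num_images: int, cents_per_image: int) -> int:
    """Total generation cost in cents for a dataset of generated images."""
    return num_images * cents_per_image


# 100 images at Repeat 3 / Epoch 10 under the assumed formula.
print(training_steps(100, 3, 10))   # 3000
# The 2910 steps reported in the post would correspond to 97 images.
print(training_steps(97, 3, 10))    # 2910
# Regenerating 100 images at the quoted $0.06 each (6 cents).
print(dataset_cost_cents(100, 6))   # 600 cents, i.e. $6.00
```

Under that assumption, the reported 2910 steps is consistent with 97 images rather than 100, which may simply mean a few images were dropped before training; the post does not confirm this.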
Imagen 4 Ultra supports a context window of 10,000 tokens, which applies to the text prompt input describing the desired image.
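A prompt can be checked against that 10,000-token limit before it is sent. The sketch below is only a rough guard: it assumes one token per whitespace-separated word, which is not how the model's tokenizer works (the page does not name the tokenizer), and the function names are hypothetical. Real token counts are usually higher than word counts, so a safety margin is advisable.

```python
MAX_PROMPT_TOKENS = 10_000  # limit stated on this page


def estimate_tokens(prompt: str) -> int:
    """Crude estimate: one token per whitespace-separated word (assumption)."""
    return len(prompt.split())


def fits_prompt_limit(prompt: str, limit: int = MAX_PROMPT_TOKENS) -> bool:
    """Return True when the estimated token count is within the limit."""
    return estimate_tokens(prompt) <= limit


short_prompt = "A watercolor fox in a snowy birch forest at dawn"
print(estimate_tokens(short_prompt))    # 10
print(fits_prompt_limit(short_prompt))  # True
```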
Imagen 4 Ultra is licensed for commercial applications, making it suitable for businesses and creative professionals producing commercial content.
According to the model metadata, Imagen 4 Ultra has a training data cutoff of early 2025.
Pricing for Imagen 4 Ultra is listed on the Google Gemini API pricing page at ai.google.dev/gemini-api/docs/pricing#imagen.
Imagen 4 Ultra accepts image URL arrays and select-type inputs, in addition to text prompts, allowing users to provide reference images and specify style options alongside their descriptions.