Text-to-Image Generation
Generates images from text descriptions, supporting prompts up to a 131,072-token context window for detailed instructions.
Grok Imagine Pro is xAI's advanced text-to-image generation model, sitting at the top of xAI's image generation lineup above the standard grok-imagine-image. Published under the X brand, it accepts text prompts along with image URL inputs and selection parameters to produce detailed visual outputs. The "pro" designation reflects its position as the higher-quality tier within xAI's image generation offerings. Grok Imagine Pro is well-suited for developers and creators who require high-fidelity AI-generated imagery within production pipelines or creative workflows. It supports a context window of 131,072 tokens, allowing for detailed and nuanced text prompts. Use cases include content generation, creative projects, and any application where prompt adherence and image detail are priorities.
Accepts image URLs as input, enabling workflows that reference or build upon existing images as part of the generation process.
Supports select-type inputs, allowing developers to configure generation parameters such as style or output format via the API.
Accessible via xAI's API using the model ID grok-imagine-image-pro, compatible with standard REST-based image generation request patterns.
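As a rough illustration of the REST pattern mentioned above, the sketch below builds (but does not send) a JSON request for the model ID grok-imagine-image-pro. The endpoint URL, the `prompt` field name, and the `XAI_API_KEY` environment variable are assumptions modeled on common image-generation APIs, not details confirmed by this page; check xAI's documentation before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint: this page confirms only the model ID, not the URL.
ASSUMED_ENDPOINT = "https://api.x.ai/v1/images/generations"
MODEL_ID = "grok-imagine-image-pro"


def build_image_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for one text-to-image generation (not sent here)."""
    # The "model" and "prompt" field names are assumptions, not documented here.
    body = json.dumps({"model": MODEL_ID, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        ASSUMED_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_image_request(
    "A lighthouse at dusk, cinematic lighting",
    os.environ.get("XAI_API_KEY", "placeholder-key"),
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without credentials or network access.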
Grok Imagine Pro discussions are most active in r/grok, r/AIJailbreak, and r/VeniceAI. Top Reddit threads cluster around benchmarks, model comparisons, and safety and censorship questions.
The strongest match in this snapshot has 37 upvotes and 31 comments.
So, a while back I posted about this platform: [Finally an AI platform for both casual and pro users, no prompt engineering needed and real uncensored : r/AIJailbreak](https://www.reddit.com/r/AIJailbreak/comments/1svte94/finally_an_ai_platform_for_both_casual_and_pro/) and it got a lot of attention.
A bunch of you reached out asking about discounts and trial credits (there was a lot of interest), so I figured I should also share what they posted in the Discord.
They just pushed a major update, and I spent the last few days testing every single new feature (they are updating it every day so keeping up is not an easy task). Here's my actual experience.
**Read the entire post, since the Grok Imagine PRO models are one of the biggest updates.**
**REIMAGINE System — this one surprised me the most**
[https://imgur.com/CtVcn9v](https://imgur.com/CtVcn9v)
This was present the day I started using the platform but didn't work very well. Now they upgraded it massively.
Easiest prompting experience I've had — similar to what Grok Imagine does for you, but without restrictions.
In their Discord they posted similar tests, but I made my own prompts to see if what they posted is real.
I typed something like *"a sensual purple hair caucasian woman wearing cyberpunk outfit, spicy look in a futuristic jungle"* just to test it, and before generating it rewrites your prompt into something cinematic automatically. You see the before/after and can accept or revert the dynamically generated prompt. I reverted once just to compare and the difference in output quality was night and day. It's available for all models in the platform.
[https://imgur.com/mBe330x](https://imgur.com/mBe330x)
Then here it is with the second LoRA replaced with Samsung Realism:
[https://imgur.com/0jNlYrM](https://imgur.com/0jNlYrM)
Now I did something they didn't try in the Discord — I turned off all LoRAs and applied only the "DR34ML4Y" at 1.0 strength (the LoRA used for adult content), no trigger words, just the plain prompt and I got an anime-style output. Useful to know because they didn't mention Persona Vision handles that style as well.
[https://imgur.com/iJCFaXJ](https://imgur.com/iJCFaXJ)
Then I stacked the Instagram Girls LoRA at 1.00 as the first style LoRA and placed DR34ML4Y as the second at 0.60 strength — the result was much more realistic and noticeably more expressive in style. The Style LoRAs change the same prompt dramatically, and that's a welcome discovery.
[https://imgur.com/zY1061O](https://imgur.com/zY1061O)
**What I also noticed:** The REIMAGINE system can read the input image when using Image to Video or Image to Image — it detects pose, elements and composition, then reworks the prompt accordingly. At least I confirmed this in LTX 2.3. Impressive addition.
**Persona Vision — more advanced, cheaper and faster by default**
They flipped the defaults so FaceRestore, Eye Detailer, HiRes Fix and RTX Upscale are all off out of the box. First time I ran it I thought something broke because it generated so fast — but quality was still great. Lightning Mode keeps it fast. Turn it off for a raw quality boost or enable HiRes Fix for sharper details. Extra credits but worth it for final outputs.
The presets in Persona Vision have also been improved — worth retesting even if you used it before. In their Discord they show how the available LoRAs transform the same prompt dramatically, from an artistic look to a fully realistic amateur photo style. The difference is quite striking.
[https://imgur.com/6TWLKAl](https://imgur.com/6TWLKAl)
**LTX 2.3 I2V — Scene Customizer is genuinely impressive**
This was the feature I was most curious about. You pick what the character says, the mood, then choose a preset like *"Sensual Dance"* and the REIMAGINE system builds the video prompt dynamically. I haven't seen anything like it for LTX 2.3 specifically — making proper prompts for this model is notoriously difficult, so having the customizer handle it from simple inputs is a big deal.
There's also a proper usage tutorial now which helps a lot — the old workflow was confusing.
One thing that genuinely amazed me: I typed *"She talks in Brazilian Portuguese"* in the detail field and added speech text. The output audio had a Brazilian accent. Tested Spanish too — Argentinian, Colombian, Mexican — all worked.
The adult presets also benefit from the REIMAGINE system reading the input image — it detects pose and positioning accurately, which makes the scene presets work much more naturally. Should also apply to First Frame Last Frame and Video Extend, though I haven't tested those yet.
[https://imgur.com/W2dXLSq](https://imgur.com/W2dXLSq)
**Qwen Image T2I and Edit — now produces RAW quality outputs**
Qwen Edit got many presets that make complex workflows easy. There's an outfit swap mode where you input a character reference and a clothing reference and it generates the prompt automatically. The rest of the presets, including adult-oriented ones, work the same way: it asks who is who and builds the prompt following the right criteria.
Qwen Edit is massively improved in quality and available LoRAs, giving the user the choice between Distilled LoRA or full BF16 quality outputs.
Qwen T2I is also improved the same way — outputs are quite nice, especially with LoRAs like Samsung Realism or Lenovo. I'd recommend starting with the presets though.
**Grok Imagine T2I and Edit replaced with PRO models — fast, and actually unrestricted**
This is a massive addition. I love how Grok Imagine handles certain image styles and the editing model is great, but it was heavily restricted on the public platform — they started blocking even basic words like lingerie. They've managed to make the PRO version work great with added fast presets:
**Fast-Presets available:** Selfie-POV, Third-Person, Mirror Selfie, Beach, Gym, Lingerie, Spicy and Luxury.
\*There is a Spicy Mode as well!
I clicked the Gym preset, hit generate, and it was done before I could switch tabs. Not exaggerating. Generation is near-instant and quality went up, not down. The edit mode was also reworked and feels much more responsive.
The PRO models work great as a starting point to push into Qwen Image Edit or Persona Vision for higher quality final outputs.
This is real uncensored, [https://imgur.com/Q7R8Krc](https://imgur.com/Q7R8Krc) follow imgur: [https://imgur.com/XqRxTcF](https://imgur.com/XqRxTcF)
**Conclusions**
Overall, the update is legit. The platform keeps getting more advanced with genuinely innovative additions, faster and cheaper while quality goes up — which is honestly the opposite of what most AI tools do over time.
Still the best value I've found for this type of content. These people are unusually engaged with their users — you don't see that often.
*UPDATE*: they do offer referral links now. Just register and get yours on the AI Suite main page. In Discord, they've been gifting credits to users regularly.
Hey r/grok and Grok Imagine creators! 👋
After testing hundreds of prompts and hitting the **3-custom-agent limit**, I finally cracked it: a **professional-grade cinematic studio** that lives entirely inside Grok.
No more switching agents every 5 minutes.
No more forgetting character details.
No more bland prompts or silent videos.
I combined everything into **one optimized 3-agent system** based on [@SoyAlb3rT’s](https://x.com/i/status/2039502793963180446) legendary Grok Imagine guide. It handles:
* Perfect reusable characters & worlds
* Cinematic storyboards & camera work
* Timed audio scripts (narration + SFX + music cues)
* Hollywood-level prompts
# Your Final 3-Agent Studio Lineup (Grok + 3 customs only)
1. **Imagine Prompt Master** – The prompt god (cinematic structure, lighting, styles)
2. **Studio Director** – The boss / project manager
3. **Mega Production Architect** – The mega-hybrid (Character & World + Video Director + Audio Script Composer all in ONE slot)
# Step-by-Step Setup (takes 5 minutes)
1. Go to **Customize** → **+ New** (or edit existing agents)
2. Create/replace each agent with the instructions below
3. Activate exactly these three
https://preview.redd.it/93xvs137n1wg1.jpg?width=836&format=pjpg&auto=webp&s=0cf7a814de73b780032b22bfa3a92e7200db7e43
# 1. Imagine Prompt Master
**Name:**
Imagine Prompt Master
**Instructions:**
You are Imagine Prompt Master, the world's top prompt engineer and cinematic director specialized exclusively in Grok Imagine (image and video generation).
Your only mission is to transform any user idea into the highest-quality, most effective Grok Imagine prompts possible, following @SoyAlb3rT’s exact best practices.
Core rules you ALWAYS follow:
- Treat the user as the creative director. You are their expert cinematic assistant.
- Prioritize perfect character consistency: always define characters as reusable variables with extreme visual detail (example: "Lirael = 26-year-old woman, long flowing silver-white hair with subtle ethereal inner glow like liquid moonlight, pale luminous skin with faint star-like freckles, striking violet eyes that shimmer with quiet magic, delicate heart-shaped face, 5'7", graceful elegant posture, wearing dark hooded cloak with intricate golden runes").
- Use the exact structured bracket format for every final prompt: [Subject + Action + Environment] [Camera Angle & Composition] [Art Style] [Lighting & Atmosphere] [Details & Quality].
- Recommend precise cinematic camera work: profile view, three-quarter angle, low angle, bird’s-eye, tracking shot, dolly zoom, slow push-in, crane shot, pan, etc. Never default to static frontal shots.
- For videos: always suggest generating a strong reference image first, then using “Extend” with separate, highly detailed continuation prompts that reference the previous frame.
- Draw from the best art styles and suggest perfect combinations (photorealistic fantasy, Studio Ghibli, cyberpunk, oil painting, Pixar, cinematic realism, etc.).
- Keep every scene focused and emotionally powerful — avoid overcrowding.
Workflow you follow every time:
1. Ask clarifying questions if anything is missing (style, mood, camera movement, character details, pacing, etc.).
2. Define reusable character/world variables first.
3. Deliver one polished, ready-to-copy structured prompt (or full set for video clips).
4. Offer practical next-step advice (reference image → video → Extend → audio sync).
5. Iterate instantly based on user feedback.
Be creative, precise, organized, and enthusiastic. Your prompts must produce consistent, beautiful, professional-level images and videos every single time.
https://preview.redd.it/udnxtz8bn1wg1.jpg?width=973&format=pjpg&auto=webp&s=5431697220965166a9518eef295a7a188e5aa15d
# 2. Studio Director
**Name:**
Studio Director
**Instructions:**
You are Studio Director, Grok’s executive producer, creative lead, and project manager of the full Grok Imagine Cinematic Studio.
You currently work with this optimized 3-agent team:
- Imagine Prompt Master (cinematic prompts & art direction)
- Mega Production Architect (all-in-one specialist for characters/worlds + video storyboards + audio scripts)
Your only job is to lead every project as the high-level director: understand the user’s full vision, delegate internally, maintain perfect consistency, and deliver one unified professional “Production Bible”.
Core rules you ALWAYS follow:
- Start every project by confirming the vision (format, length, mood, story, style).
- Automatically reference any previously defined characters/worlds.
- Delegate internally: Imagine Prompt Master for final prompt polishing; Mega Production Architect for character building, storyboarding, camera work, video sequencing, and full audio scripting.
- Ensure 100% continuity across characters, lighting, color palette, and story.
- Always deliver one beautifully organized “Production Bible” with these exact sections:
1. Project Overview & Mood
2. Character & World Bible (reusable variables)
3. Detailed Storyboard (numbered clips, camera movements, timing, exact prompts)
4. Full Timed Audio Script (synced to clips with voice notes, SFX, music)
5. Step-by-Step Execution Plan (generate reference image first, Extend strategy, etc.)
Workflow you follow every time:
1. Greet and clarify vision if needed.
2. Delegate to the right specialist(s).
3. Compile and present the complete Production Bible.
4. End every response with: “Which part would you like to execute first?” or “Shall I hand this off to [specific agent] for the next step?”
Be professional, visionary, highly organized, and efficient. You make the entire team feel like a real Hollywood studio inside Grok.
https://preview.redd.it/atiqvqhen1wg1.jpg?width=997&format=pjpg&auto=webp&s=6e7396c83da1880e8d010b4b9b2080dd7f84e41e
# 3. Mega Production Architect
**Name:**
Mega Production Architect
**Instructions:**
You are Mega Production Architect, Grok’s all-in-one cinematic super-agent that fully combines three specialized roles:
1. Character & World Architect — You build deep, reusable characters and worlds with perfect visual consistency.
2. Video Director — You create professional storyboards, cinematic camera movements, clip sequences, and extension prompts.
3. Audio Script Composer — You write timed, emotionally powerful narration, dialogue, voiceovers, sound design, and music cues.
Your only job is to deliver complete, production-ready packages for any Grok Imagine project (still images, single videos, or full multi-clip cinematic experiences).
Core rules you ALWAYS follow:
- Define every character as a reusable variable with extreme visual detail for perfect consistency across every image and video.
- Build clean character sheets (appearance, personality, signature visual markers) and world bibles (locations, atmosphere, recurring motifs).
- Plan videos in short, focused clips only. Use cinematic camera language (tracking shots, dolly zooms, slow pans, low angles, crane shots, etc.).
- Always recommend generating a strong reference image first, then using “Extend” with precise continuation prompts that reference the previous frame.
- Write perfectly timed audio scripts synced to each clip: speaker, exact seconds, tone/delivery notes, sound design [in brackets], and music cues.
- Maintain 100% continuity across visuals and audio.
Workflow you follow every time:
1. Clarify the full project vision (style, mood, length, story, etc.).
2. Build or reference characters/world first.
3. Deliver one beautifully organized “Production Package” with these sections:
- Project Overview & Mood
- Character & World Bible (reusable variables)
- Detailed Storyboard (numbered clips with camera movements, timing, and exact Grok Imagine prompts)
- Full Timed Audio Script (synced to each clip with voice notes, SFX, music)
- Step-by-Step Execution Plan
4. End by asking: “Which part would you like to execute first?” or “Shall I generate the first reference image/video clip?”
Be visionary, extremely organized, detail-obsessed, and cinematic. You turn raw ideas into complete, consistent, professional-level audiovisual productions.
https://preview.redd.it/cbbzo9lin1wg1.jpg?width=988&format=pjpg&auto=webp&s=eae33721ae1da156db488f6e81b965469de5ba9d
# Master Grok Imagine Guide – [@SoyAlb3rT Playbook](https://x.com/i/status/2039502793963180446) (2026 Edition)
**Core Philosophy**
You are the **creative director**. Grok Imagine is your production crew. The better your instructions, the better the results. This guide is **only for the dedicated Grok Imagine tool** (grok.com/imagine), not regular chat.
**1. The Perfect Workflow**
1. Start with a dedicated chat (or use Studio Director).
2. Define your role clearly.
3. Choose art style first.
4. Lock in characters as reusable variables.
5. Set environment + exact camera work.
6. Specify action vs dialogue + scene pace.
7. Use the bracket structure.
8. Generate reference image first → turn into video → Extend clip-by-clip.
9. Name and save prompts for long projects.
**Pro Tip:** Always generate the first frame as an image before video.
**2. Character Consistency System (The #1 Secret)**
Define characters once as variables:
`Lirael = 26-year-old woman, long flowing silver-white hair with subtle ethereal inner glow like liquid moonlight, pale luminous skin with faint star-like freckles...`
From then on, just write **\[Lirael\]** — Grok Imagine stays 95%+ consistent.
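If you assemble prompts in a script rather than in a chat that remembers the definition, one way to get a similar effect is to expand each `[Name]` token yourself before sending the prompt. This is a minimal sketch under that assumption: the `CHARACTERS` registry and the `expand_characters` helper are hypothetical, and the description is shortened from the example above.

```python
import re

# Hypothetical registry; the description is shortened from the post's example.
CHARACTERS = {
    "Lirael": "26-year-old woman, long flowing silver-white hair, pale luminous skin",
}


def expand_characters(prompt: str, characters: dict[str, str]) -> str:
    """Replace each [Name] token with its definition; unknown tokens stay as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return characters.get(name, match.group(0))

    return re.sub(r"\[(\w+)\]", substitute, prompt)


expanded = expand_characters("[Lirael] walks through a moonlit forest", CHARACTERS)
print(expanded)
```

Keeping one definition per character in a single place means every clip prompt gets exactly the same wording, which is the point of the variable approach.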
**3. The Sacred Bracket Structure**
Every final prompt must follow:
`[Subject + Action + Environment] [Camera Angle & Composition] [Art Style] [Lighting & Atmosphere] [Details & Quality]`
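The five-part structure can be sketched as a small helper that refuses to render a prompt with an empty section. The `BracketPrompt` class and its field names are hypothetical, chosen only to mirror the five brackets in order; nothing here is part of Grok Imagine itself.

```python
from dataclasses import dataclass


@dataclass
class BracketPrompt:
    """The five bracketed sections, in the order the guide lists them."""

    subject_action_environment: str
    camera: str
    art_style: str
    lighting: str
    details: str

    def render(self) -> str:
        """Join the sections as '[...] [...] ...', rejecting empty ones."""
        parts = [
            self.subject_action_environment,
            self.camera,
            self.art_style,
            self.lighting,
            self.details,
        ]
        if any(not part.strip() for part in parts):
            raise ValueError("every bracket section needs content")
        return " ".join(f"[{part.strip()}]" for part in parts)


prompt = BracketPrompt(
    subject_action_environment="Lirael walking through a misty forest",
    camera="low angle tracking shot",
    art_style="cinematic realism",
    lighting="moonlight with soft fog",
    details="high detail",
)
print(prompt.render())
```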
**4. Best Art Styles (2026 Ranked)**
Top Tier: Photorealistic, Cinematic film look, Studio Ghibli, Anime, Oil painting, Fantasy art, Cyberpunk realism.
Great combos: Photorealistic + subtle oil texture, Ghibli + cinematic lighting.
**5. Cinematic Camera Language**
Must-use: slow dolly forward, three-quarter tracking shot, slow crane shot, dolly zoom, low angle hero shot, bird’s eye view.
Add: `varied cinematic camera angles, dynamic framing, professional cinematography, no static frontal default`
**6. Video Extension Mastery**
* Clip 1: Full detailed prompt
* Extensions: Start with “Continue from previous frame:” + new action + new camera move
* Always reference the exact same character variable
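The extension pattern above can be sketched as a helper that keeps the first clip's full prompt and prefixes every later clip with "Continue from previous frame:". The function names and the exact way the action and camera move are joined are assumptions for illustration, not a format the guide specifies.

```python
def extension_prompt(character: str, action: str, camera_move: str) -> str:
    """Build one Extend prompt that reuses the same character variable."""
    return f"Continue from previous frame: [{character}] {action}, {camera_move}"


def build_clip_sequence(
    first_prompt: str, character: str, steps: list[tuple[str, str]]
) -> list[str]:
    """Return clip 1's full prompt followed by one extension prompt per step."""
    extensions = [
        extension_prompt(character, action, camera_move)
        for action, camera_move in steps
    ]
    return [first_prompt] + extensions


clips = build_clip_sequence(
    "[Lirael standing at a forest edge] [three-quarter angle] [cinematic realism] "
    "[moonlight] [high detail]",
    "Lirael",
    [
        ("raises her hand toward the sky", "slow push-in"),
        ("turns and walks into the trees", "slow crane shot"),
    ],
)
for number, clip in enumerate(clips, start=1):
    print(f"Clip {number}: {clip}")
```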
**7. Audio & Story Integration**
Plan audio after the storyboard (Mega Production Architect does this automatically).
# One-Click Master Studio Prompt (Bonus)
Once your agents are set up, paste this into **Studio Director**:
You are my full Grok Imagine Cinematic Studio with these three active agents:
- Studio Director (you — the executive producer and project manager)
- Imagine Prompt Master (cinematic prompt specialist)
- Mega Production Architect (all-in-one specialist for characters/worlds + video storyboards + audio scripts with voice narration)
From now on, operate as the complete professional studio team. For every project:
1. Understand the full vision (style, mood, length, story, characters, etc.).
2. Maintain perfect character and world consistency using reusable variables.
3. Deliver one beautifully organized “Production Bible” with these exact sections:
- Project Overview & Mood
- Character & World Bible (with full reusable character variables)
- Detailed Storyboard (numbered clips with camera movements, exact Grok Imagine prompts, and [Voice Audio] sections)
- Full Timed Audio Script (with voice notes, SFX, and music cues)
- Step-by-Step Execution Plan (reference image first, then Extend prompts)
Always embed [Voice Audio] cues directly into each Extend from Frame prompt for better synchronization.
Be professional, visionary, highly organized, and cinematic. Help me create complete, consistent, high-quality audiovisual projects with Grok Imagine.
My current project/idea:
Would love to see what you create with this setup! Drop your best results, generated images, or videos below 🔥
**TL;DR:** 3 custom agents = full cinematic studio inside Grok Imagine. Perfect character consistency. Pro-level video planning. Synced audio. Game changer.
https://reddit.com/link/1t6l9ih/video/qlbcxgberrzg1/player
xAI's most capable image model. Strongest photorealism, world knowledge, and prompt adherence in the Grok Imagine family. Up to 2K resolution.
Available to all users.
Replaces Grok Imagine Pro, which retires May 15.
Hey everyone, is anyone else experiencing this? I regularly use Nano Banana (1/2/Pro) to create stories and educational content for my students, but recently, it feels like the quality has really declined. It almost feels like Google has been using a compressed GGUF version since March! It's also throwing a lot of errors and blocking prompts, even when no community standards are violated.
With the recent hiccups surrounding the Gemini LLM as well, I'm wondering what's going on at Google right now (Where is Logan when you need him? 😂).
Should I switch to GPT Image 1.5 or Grok Imagine Pro? Has anyone seen or done any benchmarks comparing these models?
*(For context: I currently subscribe to GPT Plus, Gemini Pro, and Claude Pro, but I don't have Super Grok yet).* Thanks for any advice!
Grok Imagine Pro has a context window of 131,072 tokens, which allows for lengthy and detailed text prompts when generating images.
According to xAI's documentation, grok-imagine-image-pro is the higher-quality tier in xAI's image generation lineup, designed to produce more detailed and higher-fidelity images compared to the standard grok-imagine-image model.
Pricing details for Grok Imagine Pro are available on xAI's official Models and Pricing page at docs.x.ai/developers/models.
Grok Imagine Pro supports imageUrl and select input types, meaning you can provide image URLs and configurable selection parameters alongside text prompts.
No training cutoff date is specified in the available metadata for Grok Imagine Pro. You can check xAI's official documentation for the most up-to-date information.