Massive Context Window
Processes up to 2 million tokens in a single pass, enabling ingestion of entire codebases, lengthy documents, or extended conversation histories without truncation.
Grok 4.20 is a text generation model developed by xAI, the AI division of X. This variant is specifically configured with reasoning disabled, meaning it skips the extended chain-of-thought process to deliver faster, lower-latency responses while still operating on the full Grok 4.20 architecture. It supports a context window of up to 2 million tokens, allowing it to ingest very long documents, large codebases, or extended conversation histories in a single pass. The model was made available via API in March 2026 as part of the Grok 4.20 Beta family, which also includes reasoning-enabled and multi-agent-tuned variants. This model is designed for agentic and tool-centric workflows where response speed is a priority over deep step-by-step reasoning. It is well-suited for automated pipelines, coding agents, data-processing tasks, and any application where the model needs to call external tools rapidly and reliably. Its instruction-following behavior is tuned for consistency, making outputs predictable across repeated or templated prompts. Developers building low-latency AI systems or integrating LLM capabilities into production pipelines are the primary intended audience.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for Grok 4.20.
Grok 4.20 is a text generation model developed by xAI, the AI division of X. This variant is specifically configured with reasoning disabled, meaning it skips the extended chain-of-thought process to deliver faster, lower-latency responses while still operating on the full Grok 4.20 architecture. It supports a context window of up to 2 million tokens, allowing it to ingest very long documents, large codebases, or extended conversation histories in a single pass. The model was made available via API in March 2026 as part of the Grok 4.20 Beta family, which also includes reasoning-enabled and multi-agent-tuned variants.
This model is designed for agentic and tool-centric workflows where response speed is a priority over deep step-by-step reasoning. It is well-suited for automated pipelines, coding agents, data-processing tasks, and any application where the model needs to call external tools rapidly and reliably. Its instruction-following behavior is tuned for consistency, making outputs predictable across repeated or templated prompts. Developers building low-latency AI systems or integrating LLM capabilities into production pipelines are the primary intended audience.
Processes up to 2 million tokens in a single pass, enabling ingestion of entire codebases, lengthy documents, or extended conversation histories without truncation.
Optimized for rapid and reliable external tool invocation, making it suitable for automated agent frameworks and multi-step pipelines.
Reasoning is disabled by design, reducing latency by skipping extended chain-of-thought processing while retaining the underlying model's generation capabilities.
Tuned for strong prompt adherence, producing consistent and predictable outputs across templated or repeated instructions.
Accepts input types beyond plain text, supporting diverse real-world task formats within a single model interface.
Generates coherent, contextually grounded text responses across a wide range of domains including coding, data processing, and conversational tasks.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
Grok 4.20 discussions are most active in r/singularity, r/grok, r/LovingAI. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.
The strongest match in this snapshot has 1779 upvotes and 451 comments.
I haven't tried it much at all but doing a little bit of testing so far it seems... Decent, maybe even pretty good though I'll need to test more.
So far I can definitely say that the multi agent version is definetly better in understanding everything that's going on in context and stuff but alot more costly, like a lot, and it kinda makes the characters sound like robots to be honest. It was also a lot more unhinged I feel compared to the normal one.
I also find that it has really good prompt adherence atleast in the following case as I have a small section in my prompt that basically says "Stop the roleplay or redirect it if you feel the characters are going ooc and address your concerns ooc" or whatever. And sometimes when I'm messing with bots I intentionally put them in a ooc scenario, more just for fun that legit roleplay and where every other model tends to just go with it forcing the character to be and act ooc so far the multi-agent version of grok actually either stops the roleplay completely or begins to push the roleplay in a more in character direction and informing me ooc, I think that could definitely be taken as a positive and a negative depending on your preference but I think it's cool that it actually acknowledges this, I'm hoping that means it's overall prompt adherence is quite good.
I'll probably do a bit more testing tonight but I'm just curious what's the general consensus so far?
source: @JasonBotterill
He tagged this post so it must be 4.20 https://x.com/prashant_1722/status/1954585422555992254?s=46
Grok 4.20 supports a context window of up to 2 million tokens, allowing it to process very long inputs in a single request.
Reasoning is disabled to reduce response latency. This makes the model faster and more suitable for agentic or tool-calling workflows where speed is prioritized over extended step-by-step reasoning.
According to the model metadata, the training date is listed as March 2026.
Grok 4.20 is published by xAI, the AI division associated with X (formerly Twitter).
This model is best suited for low-latency agentic systems such as automated assistants, coding agents, and data-processing pipelines where fast tool-calling and instruction adherence are more important than deep reasoning.
Yes, Grok 4.20 Beta models were released via API in March 2026, as reflected in the model's dateAdded metadata.
Continue browsing adjacent models from the same provider.