Long Context Window
Supports a 2 million token context window, enabling processing of very long documents, codebases, or multi-turn conversations in a single request.
Grok 4 Fast is a text generation model developed by xAI, the AI division of X. It is built on learnings from Grok 4 and is designed to deliver high-quality reasoning at lower computational cost, using approximately 40% fewer thinking tokens on average compared to its full counterpart. The model features a 2 million token context window and supports both reasoning and non-reasoning modes within a single unified architecture. Grok 4 Fast is trained end-to-end with tool-use reinforcement learning, enabling it to handle agentic tasks such as web browsing, code execution, and real-time information synthesis. It accepts both text and image inputs and produces text output. The model is well-suited for developers and enterprises that need multi-step reasoning, long-context document processing, and real-time web research without the computational overhead of a full frontier model.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for Grok 4 Fast.
Grok 4 Fast is a text generation model developed by xAI, the AI division of X. It is built on learnings from Grok 4 and is designed to deliver high-quality reasoning at lower computational cost, using approximately 40% fewer thinking tokens on average compared to its full counterpart. The model features a 2 million token context window and supports both reasoning and non-reasoning modes within a single unified architecture.
Grok 4 Fast is trained end-to-end with tool-use reinforcement learning, enabling it to handle agentic tasks such as web browsing, code execution, and real-time information synthesis. It accepts both text and image inputs and produces text output. The model is well-suited for developers and enterprises that need multi-step reasoning, long-context document processing, and real-time web research without the computational overhead of a full frontier model.
Supports a 2 million token context window, enabling processing of very long documents, codebases, or multi-turn conversations in a single request.
Offers both reasoning and non-reasoning modes in one unified architecture, allowing developers to choose the appropriate inference style per task.
Uses approximately 40% fewer thinking tokens on average than Grok 4, achieved through large-scale reinforcement learning optimized for intelligence density.
Trained end-to-end with tool-use reinforcement learning, supporting web browsing, code execution, and real-time information synthesis across multi-step tasks.
Accepts both text and image inputs, producing text output, making it usable for tasks that involve visual content alongside natural language.
The search-enabled variant (grok-4-fast-search) supports real-time web and X (Twitter) browsing, and ranked first on LMArena's Search Arena with an Elo score of 1163.
Scored 80.0% on LiveCodeBench (January–May evaluation window), reflecting strong performance on competitive programming and code synthesis tasks.
Achieved 92.0% on AIME 2025 and 93.3% on HMMT 2025 without tools, demonstrating strong performance on formal mathematical reasoning benchmarks.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
Grok 4 Fast discussions are most active in r/singularity, r/SillyTavernAI, r/grok. Top Reddit threads cluster around benchmark and model-comparison threads.
The strongest match in this snapshot has 508 upvotes and 97 comments.
As we continue advancing Grok, we are retiring several earlier models to focus fully on our newest generation. **Effective May 15, 2026 at 12:00pm PT**, the following models will be retired from the xAI API:
* `grok-4-1-fast-reasoning`
* `grok-4-1-fast-non-reasoning`
* `grok-4-fast-reasoning`
* `grok-4-fast-non-reasoning`
* `grok-4-0709`
* `grok-code-fast-1`
* `grok-3`
* `grok-imagine-image-pro`
[`https://docs.x.ai/developers/migration/may-15-retirement`](https://docs.x.ai/developers/migration/may-15-retirement)
It seems that Grok-4-fast was created based on Jet-Nemotron architecture.
[https://arxiv.org/abs/2508.15884v1](https://arxiv.org/abs/2508.15884v1)
It allows to massively decrease the amount of compute needed for inference without sacrificing the model performance. It also allows for a much bigger context window since the price no longer scales quadratically but linearly!
**So basically:** everyone can implement this architecture without retraining. The price of models can be **Drastically** reduced without sacrificing accuracy much.
XAI did it first, but others will definitely follow (if they haven't already).
>There is a high chance that OpenAI has already done it:
A sudden slash in prices on o3 by 80% and then GPT-5-thinking being even cheaper in a very short period of time.
How can xAI afford to run such a model for so little?
Grok 4 Fast supports a 2 million token context window, which allows it to process very long documents, extended conversations, or large codebases within a single request.
Grok 4 Fast is a cost-efficient variant built on learnings from Grok 4. It uses approximately 40% fewer thinking tokens on average, making it less computationally expensive while targeting comparable reasoning quality.
Grok 4 Fast accepts both text and image inputs and produces text output. It also supports tool use including web browsing, X (Twitter) browsing, and code execution.
Based on the available metadata, the training date for Grok 4 Fast is listed as September 2025.
Grok 4 Fast is available through the xAI API. You can find API documentation and access details at x.ai/api. On MindStudio, no separate API key is required to use the model.
Yes. Grok 4 Fast supports both reasoning and non-reasoning modes within a single unified architecture, allowing developers to select the appropriate mode depending on the task.
Continue browsing adjacent models from the same provider.