X.ai

Grok 4 Fast

Grok 4 Fast is a text generation model developed by xAI, the AI division of X. It is built on learnings from Grok 4 and is designed to deliver high-quality reasoning at lower computational cost, using approximately 40% fewer thinking tokens on average compared to its full counterpart. The model features a 2 million token context window and supports both reasoning and non-reasoning modes within a single unified architecture. Grok 4 Fast is trained end-to-end with tool-use reinforcement learning, enabling it to handle agentic tasks such as web browsing, code execution, and real-time information synthesis. It accepts both text and image inputs and produces text output. The model is well-suited for developers and enterprises that need multi-step reasoning, long-context document processing, and real-time web research without the computational overhead of a full frontier model.

September 2025 N/A context 2,000,000 tokens output
Long Context Window Reasoning Modes Token Efficiency Agentic Tool Use Multimodal Input Web & Search Integration

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

X.ai

Input Context Window

The number of tokens supported by the input context window.

N/A tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

2,000,000 tokens tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

September 2025

Knowledge Cut-off Date

When the model's knowledge was last updated.

September 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

xAI API, OpenAI API

Modalities

Types of data this model can process.

Text Image Code

What is Grok 4 Fast

A fuller summary of positioning, capabilities, and source-specific details for Grok 4 Fast.

Grok 4 Fast is a text generation model developed by xAI, the AI division of X. It is built on learnings from Grok 4 and is designed to deliver high-quality reasoning at lower computational cost, using approximately 40% fewer thinking tokens on average compared to its full counterpart. The model features a 2 million token context window and supports both reasoning and non-reasoning modes within a single unified architecture.

Grok 4 Fast is trained end-to-end with tool-use reinforcement learning, enabling it to handle agentic tasks such as web browsing, code execution, and real-time information synthesis. It accepts both text and image inputs and produces text output. The model is well-suited for developers and enterprises that need multi-step reasoning, long-context document processing, and real-time web research without the computational overhead of a full frontier model.

Capabilities

What Grok 4 Fast supports

CTX

Long Context Window

Supports a 2 million token context window, enabling processing of very long documents, codebases, or multi-turn conversations in a single request.

RN

Reasoning Modes

Offers both reasoning and non-reasoning modes in one unified architecture, allowing developers to choose the appropriate inference style per task.

AI

Token Efficiency

Uses approximately 40% fewer thinking tokens on average than Grok 4, achieved through large-scale reinforcement learning optimized for intelligence density.

AG

Agentic Tool Use

Trained end-to-end with tool-use reinforcement learning, supporting web browsing, code execution, and real-time information synthesis across multi-step tasks.

MM

Multimodal Input

Accepts both text and image inputs, producing text output, making it usable for tasks that involve visual content alongside natural language.

AI

Web & Search Integration

The search-enabled variant (grok-4-fast-search) supports real-time web and X (Twitter) browsing, and ranked first on LMArena's Search Arena with an Elo score of 1163.

</>

Code Generation

Scored 80.0% on LiveCodeBench (January–May evaluation window), reflecting strong performance on competitive programming and code synthesis tasks.

RN

Math & Science Reasoning

Achieved 92.0% on AIME 2025 and 93.3% on HMMT 2025 without tools, demonstrating strong performance on formal mathematical reasoning benchmarks.

Pricing for Grok 4 Fast

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 1
maxResponseSize 2,000,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

xAI API OpenAI API

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
60.6%
HLE
Questions that challenge frontier models across many domains
5.0%
LiveCodeBench
Real-world coding tasks from recent competitions
40.1%
MMLU-Pro
Expert knowledge across 14 academic disciplines
73.0%
SciCode
Scientific research coding and numerical methods
32.9%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about Grok 4 Fast

Grok 4 Fast discussions are most active in r/singularity, r/SillyTavernAI, r/grok. Top Reddit threads cluster around benchmark and model-comparison threads.

The strongest match in this snapshot has 508 upvotes and 97 comments.

As we continue advancing Grok, we are retiring several earlier models to focus fully on our newest generation. **Effective May 15, 2026 at 12:00pm PT**, the following models will be retired from the xAI API:

* `grok-4-1-fast-reasoning`
* `grok-4-1-fast-non-reasoning`
* `grok-4-fast-reasoning`
* `grok-4-fast-non-reasoning`
* `grok-4-0709`
* `grok-code-fast-1`
* `grok-3`
* `grok-imagine-image-pro`

[`https://docs.x.ai/developers/migration/may-15-retirement`](https://docs.x.ai/developers/migration/may-15-retirement)

Open Reddit thread

It seems that Grok-4-fast was created based on Jet-Nemotron architecture.

[https://arxiv.org/abs/2508.15884v1](https://arxiv.org/abs/2508.15884v1)

It allows to massively decrease the amount of compute needed for inference without sacrificing the model performance. It also allows for a much bigger context window since the price no longer scales quadratically but linearly!

**So basically:** everyone can implement this architecture without retraining. The price of models can be **Drastically** reduced without sacrificing accuracy much.

XAI did it first, but others will definitely follow (if they haven't already).

>There is a high chance that OpenAI has already done it:

A sudden slash in prices on o3 by 80% and then GPT-5-thinking being even cheaper in a very short period of time.

Open Reddit thread
View more discussions →
FAQ

Common questions about Grok 4 Fast

What is the context window size for Grok 4 Fast?

Grok 4 Fast supports a 2 million token context window, which allows it to process very long documents, extended conversations, or large codebases within a single request.

How does Grok 4 Fast differ from Grok 4?

Grok 4 Fast is a cost-efficient variant built on learnings from Grok 4. It uses approximately 40% fewer thinking tokens on average, making it less computationally expensive while targeting comparable reasoning quality.

What input types does Grok 4 Fast support?

Grok 4 Fast accepts both text and image inputs and produces text output. It also supports tool use including web browsing, X (Twitter) browsing, and code execution.

What is the training data cutoff for Grok 4 Fast?

Based on the available metadata, the training date for Grok 4 Fast is listed as September 2025.

Where can I access the Grok 4 Fast API?

Grok 4 Fast is available through the xAI API. You can find API documentation and access details at x.ai/api. On MindStudio, no separate API key is required to use the model.

Does Grok 4 Fast support reasoning mode?

Yes. Grok 4 Fast supports both reasoning and non-reasoning modes within a single unified architecture, allowing developers to select the appropriate mode depending on the task.

More models from X.ai

Continue browsing adjacent models from the same provider.

← All AI Models