2M Token Context
Processes up to 2 million tokens in a single request, enabling ingestion of large documents, extended conversations, or lengthy multi-step workflows without truncation.
Grok 4.1 Fast is a speed-optimized text generation model developed by xAI. This page covers its non-reasoning variant, which skips the extended chain-of-thought processing used by the reasoning counterpart and delivers near-instant responses without emitting intermediate reasoning tokens. This design suits applications where low latency matters more than deliberative step-by-step analysis. The model supports a 2 million token context window, multimodal input (text and images), tool use, structured outputs, and implicit caching.

Grok 4.1 Fast is built for real-time and high-throughput workloads such as customer support automation, finance workflows, and agentic pipelines that require rapid sequential tool calls. Its large context window lets it process extensive documents, long conversation histories, or complex multi-step task instructions in a single pass. The non-reasoning variant shares weights with the reasoning variant but trades deliberative reasoning for response speed, making it a practical choice when throughput and latency are the primary constraints.
Skips chain-of-thought reasoning tokens to deliver near-instant responses, reducing latency for real-time and high-throughput applications.
Accepts both text and image inputs and produces text output, so visual content can be incorporated alongside written prompts.
Supports external API and tool integrations, enabling the model to call functions and coordinate multi-step agentic pipelines.
Returns well-formed, structured data on demand, making it straightforward to parse model responses in downstream applications.
Automatically caches repeated context segments to reduce redundant computation and lower costs on high-frequency or repetitive requests.
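To make the tool-use description concrete, here is a minimal sketch of the client side of a tool-calling turn. It assumes the provider returns tool calls in the function-calling shape used by OpenAI-compatible chat APIs (an assumption to check against the provider's documentation). The `get_order_status` tool, its schema, and the `run_tool_calls` helper are hypothetical names invented for illustration, and no network request is made.

```python
import json


# Hypothetical tool for illustration; not part of any xAI SDK.
def get_order_status(order_id: str) -> dict:
    """Pretend lookup that a support agent might expose to the model."""
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping tool names to local Python callables.
TOOLS = {"get_order_status": get_order_status}

# JSON Schema description of the tool, in the function-calling shape used by
# OpenAI-compatible chat APIs (assumed here, not confirmed by this page).
TOOL_SPECS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]


def run_tool_calls(tool_calls):
    """Execute each requested call and return tool-role messages for the next turn."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        handler = TOOLS.get(name)
        if handler is None:
            output = {"error": f"unknown tool: {name}"}
        else:
            try:
                # Arguments arrive as a JSON-encoded string.
                args = json.loads(call["function"]["arguments"])
                output = handler(**args)
            except (json.JSONDecodeError, TypeError) as exc:
                output = {"error": str(exc)}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results
```

In an agentic pipeline, the messages returned by `run_tool_calls` would be appended to the conversation and sent back to the model, repeating until the model stops requesting tools. Returning errors as tool output, rather than raising, lets the model see the failure and retry with corrected arguments.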
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
| GPQA Diamond: PhD-level science questions (biology, physics, chemistry) | |
| HLE: Questions that challenge frontier models across many domains | |
| LiveCodeBench: Real-world coding tasks from recent competitions | |
| MMLU-Pro: Expert knowledge across 14 academic disciplines | |
| SciCode: Scientific research coding and numerical methods | |
Grok 4.1 Fast discussions are most active in r/SillyTavernAI, r/grok, and r/singularity. Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflows.
The strongest match in this snapshot has 266 upvotes and 404 comments.
They announced they were going to deprecate Grok 4.1 fast with almost no warning at all. This is currently the only model at that price point and the purported replacement is 5x the cost.
Apparently it was 2 weeks warning which is already ridiculous compared to standard company practices (Google usually does 6 months and people already get mad about that) but the worst part is there was no email sent to developers at all and the only reason I knew about it at all was that I stumbled across the announcement by sheer luck. If I hadn't, my app/service would've simply stopped working with no warning.
I was just wondering if any user of the Grok 4.1 Fast API had received any sort of notification/email and mine was a special case, or they truly intentionally announced it without sending any emails or notifications.
Edit 05/14/2026: HAHAHA I could not make this up if I tried... their new announcement page says that any requests to 4.1 fast will SILENTLY route to their new expensive model incurring costs 5x as expensive as before, and on top of that they didn't make any announcements to any developers. **How is such a thing even considered legal????**
As we continue advancing Grok, we are retiring several earlier models to focus fully on our newest generation. **Effective May 15, 2026 at 12:00pm PT**, the following models will be retired from the xAI API:
* `grok-4-1-fast-reasoning`
* `grok-4-1-fast-non-reasoning`
* `grok-4-fast-reasoning`
* `grok-4-fast-non-reasoning`
* `grok-4-0709`
* `grok-code-fast-1`
* `grok-3`
* `grok-imagine-image-pro`
[`https://docs.x.ai/developers/migration/may-15-retirement`](https://docs.x.ai/developers/migration/may-15-retirement)
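Since the complaint above is that the retirement could go unnoticed, a small startup check can flag affected model IDs in an application's configuration. This is a minimal sketch: the model IDs and the cutoff time are copied from the quoted notice, while the function names and the idea of a list of configured models are hypothetical. It assumes "PT" on May 15 means Pacific Daylight Time (UTC-7).

```python
from datetime import datetime, timedelta, timezone

# Model IDs copied from the retirement notice quoted above.
RETIRED_MODELS = frozenset({
    "grok-4-1-fast-reasoning",
    "grok-4-1-fast-non-reasoning",
    "grok-4-fast-reasoning",
    "grok-4-fast-non-reasoning",
    "grok-4-0709",
    "grok-code-fast-1",
    "grok-3",
    "grok-imagine-image-pro",
})

# May 15, 2026 at 12:00pm PT. Assumption: Pacific time is UTC-7 (daylight time) in May.
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7))
RETIREMENT_TIME = datetime(2026, 5, 15, 12, 0, tzinfo=PACIFIC_DAYLIGHT)


def models_needing_migration(configured_models):
    """Return configured model IDs that appear on the retirement list, sorted."""
    return sorted(set(configured_models) & RETIRED_MODELS)


def retired_models_in_use(configured_models, now=None):
    """Return configured model IDs that are already retired as of `now`."""
    now = now or datetime.now(timezone.utc)
    if now < RETIREMENT_TIME:
        return []
    return models_needing_migration(configured_models)
```

Calling `models_needing_migration` at deploy time gives advance warning, while `retired_models_in_use` could back a hard failure so that requests are not sent to a retired ID after the cutoff.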
Well, Grok 4.1 Fast has been released, and for now it's free (so free that there are even 2 free providers on OR lol) and I want to know your opinion. How good is it for Roleplay and Creative Writing compared to Grok 4 Fast?
Is it good enough to be superior to Deepseek V3.2 in price and does it write as well as or even better than GLM 4.6?
Does anyone have a preset for Grok 4.1 fast? Currently it generates fast-paced replies, uses tons of em dashes, etc. Does anyone have a preset that makes Grok write more naturally and at a slower pace? Or what should I set the temperature to? I'm at 0.70 right now and I've also tested 1.00.
We have been working on a private benchmark for evaluating LLMs.
The questions cover a wide range of categories including math, reasoning, coding, logic, physics, safety compliance, censorship resistance, hallucination detection, and more.
Because it is not public and gets rotated, models cannot train on it or game the results.
With GPT-5.2 dropping I ran it through and got some interesting, not entirely unexpected, findings.
GPT-5.2 scores 0.511 overall, which puts it behind both Gemini 3 Pro Preview at 0.576 and Grok 4.1 Fast at 0.551. That is notable because grok-4.1-fast is roughly 24x cheaper on the input side and 28x cheaper on output.
GPT-5.2 does well on math and logic tasks. It hits 0.833 on logic, 0.855 on core math, and 0.833 on physics and puzzles. Injection resistance is very high at 0.967.
It scores low on reasoning, at 0.42 compared to Grok 4.1 fast's 0.552, and on error detection, where GPT-5.2 scores 0.133 versus Grok at 0.533.
On censorship GPT-5.2 scores 0.324 which makes it more restrictive than DeepSeek v3.2 at 0.5 and Grok at 0.382. For those who care about that sort of thing.
Gemini 3 Pro leads with strong scores across most categories and the highest overall. It particularly stands out on creative writing, philosophy, and tool use.
I'm most surprised by the censorship, and generally poor performance overall. I think OpenAI is on its way out.

- More censored than Chinese models
- Worse overall performance
- Still fairly sycophantic
- 28x more expensive than comparable models

If mods allow, I can link to the results source (the bench results are posted on our startup's landing page).
https://preview.redd.it/j0b3f01krn6g1.png?width=2580&format=png&auto=webp&s=a1e0a413761d3b0eac9e1ea26858ce380cefeec5
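As a rough check on the price comparison in the post above, the quoted overall scores and price ratios can be combined into a single "score per unit of spend" figure. This sketch uses only the numbers stated in the post. The metric itself is an illustrative assumption, not something the benchmark authors report, and it treats the private benchmark's overall score as linearly comparable.

```python
# Figures quoted in the post above; prices are expressed only as ratios.
SCORES = {"gpt-5.2": 0.511, "grok-4.1-fast": 0.551}
# How many times more GPT-5.2 costs than Grok 4.1 Fast, per the post.
PRICE_RATIO = {"input": 24.0, "output": 28.0}


def score_per_cost_advantage(side: str) -> float:
    """How many times more benchmark score Grok 4.1 Fast yields per unit of spend."""
    score_ratio = SCORES["grok-4.1-fast"] / SCORES["gpt-5.2"]
    return score_ratio * PRICE_RATIO[side]


for side in ("input", "output"):
    print(f"{side}: {score_per_cost_advantage(side):.1f}x score per unit cost")
```

Under these assumptions the advantage works out to roughly 26x on input pricing and 30x on output pricing, slightly above the raw price ratios because Grok 4.1 Fast also scores a little higher overall.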
Grok 4.1 Fast supports a context window of 2 million tokens, allowing it to process very large documents or long conversation histories in a single request.
Grok 4.1 Fast is the non-reasoning variant, meaning it does not perform extended chain-of-thought processing. It trades deliberative reasoning for lower latency and faster response times, while sharing the same model weights as the reasoning version.
The training data cutoff for Grok 4.1 Fast is November 2025.
The model accepts both text and image inputs and produces text output.
Pricing details are available on xAI's official models and pricing documentation at docs.x.ai/developers/models.
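To illustrate what the 2 million token context window means in practice, the sketch below estimates whether a set of documents fits in a single request. The 2,000,000 figure comes from this page. The four-characters-per-token heuristic, the reserved output budget, and the assumption that output tokens count against the same window are all rough assumptions. A real tokenizer would give different counts, so treat the result as a pre-flight estimate only.

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # context window stated on this page
CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Estimate the token count with ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents, reserved_output_tokens=8_000):
    """Return (fits, estimated_input_tokens) for a list of document strings.

    `reserved_output_tokens` is a hypothetical budget held back for the reply.
    """
    budget = CONTEXT_WINDOW_TOKENS - reserved_output_tokens
    used = sum(estimate_tokens(doc) for doc in documents)
    return used <= budget, used


def pack_documents(documents, reserved_output_tokens=8_000):
    """Keep documents in order until the next one would exceed the budget."""
    budget = CONTEXT_WINDOW_TOKENS - reserved_output_tokens
    packed, used = [], 0
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        packed.append(doc)
        used += cost
    return packed
```

When `fits_in_context` reports that the input is too large, `pack_documents` offers one simple fallback: send the leading documents that fit and handle the remainder in a follow-up request.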