DeepSeek-R1 is a text generation model developed by DeepSeek, a Chinese AI company. It is a reasoning-focused model that generates a Chain of Thought (CoT) before producing a final answer, a technique designed to improve accuracy on multi-step problems. The model was trained through late 2024 and supports a context window of 64,000 tokens. DeepSeek released the model weights publicly, making it available for local deployment and research use.
DeepSeek-R1 is well suited for tasks that benefit from structured reasoning, such as mathematics, logic puzzles, coding challenges, and scientific problem-solving. Because the model externalizes its reasoning steps before answering, users can inspect the thought process that led to a given response. DeepSeek also released a series of distilled versions of R1 based on smaller base models, broadening its accessibility across different hardware configurations.
Generates an explicit reasoning trace before producing a final answer, allowing multi-step problems to be broken down systematically. This CoT process is visible in the model's output.
Applies step-by-step reasoning to solve mathematical and logical problems, including proofs, equations, and structured inference tasks.
Produces and debugs code across common programming languages, using its reasoning process to work through algorithmic problems before outputting a solution.
Handles input and output sequences within a 64,000-token context window, supporting analysis of lengthy documents or extended multi-turn conversations.
Model weights are publicly released by DeepSeek, enabling local deployment and fine-tuning without relying solely on the hosted API.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Description | Score |
|---|---|---|
| AIME 2024 | American math olympiad problems | |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | |
| HLE | Questions that challenge frontier models across many domains | |
| LiveCodeBench | Real-world coding tasks from recent competitions | |
| MATH-500 | Undergraduate and competition-level math problems | |
| MMLU-Pro | Expert knowledge across 14 academic disciplines | |
| SciCode | Scientific research coding and numerical methods | |
DeepSeek-R1 discussions are most active in r/LocalLLaMA, r/selfhosted, r/singularity.
Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflows. The strongest match in this snapshot has 2136 upvotes and 684 comments.
I've recently seen some misconceptions that you can't run DeepSeek-R1 locally on your own device. Last weekend, we were busy trying to make you guys have the ability to run the actual R1 (non-distilled) model with just an RTX 4090 (24GB VRAM) which gives at least 2-3 tokens/second.
Over the weekend, we at Unsloth (currently a team of just 2 brothers) studied R1's architecture, then selectively quantized layers to 1.58-bit, 2-bit etc. which vastly outperforms basic versions with minimal compute.
1. We shrank R1, the 671B parameter model, from 720GB to just 131GB (an 80% size reduction) whilst making it still fully functional and great
2. No, the dynamic GGUFs do not work directly with Ollama, but they do work on llama.cpp, as it supports sharded GGUFs and disk mmap offloading. For Ollama, you will need to merge the GGUFs manually using llama.cpp.
3. Minimum requirements: a CPU with 20GB of RAM (but it will be very slow) and 140GB of disk space (to download the model weights)
4. Optimal requirements: sum of your VRAM+RAM = 80GB+ (this will be somewhat ok)
5. No, you do not need hundreds of GB of RAM+VRAM, but if you have it, you can get **140 tokens per second** for throughput & 14 tokens/s for single user inference with 2xH100
6. Our open-source GitHub repo: [github.com/unslothai/unsloth](http://github.com/unslothai/unsloth)
Many people have tried running the dynamic GGUFs on their potato devices and it works very well (including mine).
R1 GGUFs uploaded to Hugging Face: [huggingface.co/unsloth/DeepSeek-R1-GGUF](http://huggingface.co/unsloth/DeepSeek-R1-GGUF)
To run your own R1 locally we have instructions + details: [unsloth.ai/blog/deepseekr1-dynamic](http://unsloth.ai/blog/deepseekr1-dynamic)
Source: Moneycontrol \[[Article Link](https://www.moneycontrol.com/news/business/startup/sarvam-ai-launches-30b-and-105b-models-says-105b-outperforms-deepseek-r1-and-gemini-flash-on-key-benchmarks-13834399.html)\]
>Bengaluru-based AI startup just announced the launch of two new large language models, a 30-billion-parameter model and a 105-billion-parameter model, both trained from scratch.
>“At 105 billion parameters, on most benchmarks this model beats DeepSeek R1 released a year ago, which was a 600-billion-parameter model.”
>“It is cheaper than something like a Gemini Flash, but outperforms it in many benchmarks,” Kumar said.
>On Indian language benchmarks, Kumar said the model delivers stronger performance than several larger competitors.
>“Even with something like Gemini 2.5 Flash, which is a bigger and more expensive model, we find that the Indian language performance of this model is even better.”
Sarvam was earlier announced as the first startup selected to build India’s foundational AI model under the mission.
Hey r/LocalLLaMA! I managed to **dynamically quantize** the full DeepSeek R1 671B MoE to 1.58bits in GGUF format. The trick is **not to quantize all layers**, but quantize only the MoE layers to 1.5bit, and leave attention and other layers in 4 or 6bit.
|MoE Bits|Type|Disk Size|Accuracy|HF Link|
|:-|:-|:-|:-|:-|
|1.58bit|IQ1\_S|**131GB**|Fair|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_S)|
|1.73bit|IQ1\_M|**158GB**|Good|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_M)|
|2.22bit|IQ2\_XXS|**183GB**|Better|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ2_XXS)|
|2.51bit|Q2\_K\_XL|**212GB**|Best|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-Q2_K_XL)|
You can get **140 tokens / s** for throughput and 14 tokens /s for single user inference on 2x H100 80GB GPUs with all layers offloaded. A 24GB GPU like RTX 4090 should be able to get at least 1 to 3 tokens / s.
If we naively quantize all layers to 1.5bit (-1, 0, 1), the model will fail dramatically, since it'll produce **gibberish** and **infinite repetitions**. I selectively leave all attention layers in 4/6bit, and leave the first 3 transformer dense layers in 4/6bit. The MoE layers take up 88% of all space, so we can leave them in 1.5bit. We get in total a weighted sum of 1.58bits!
I asked the 1.58bit model to create Flappy Bird with 10 conditions (like random colors, a best score etc), and it did pretty well! Using a generic non dynamically quantized model will fail miserably - there will be no output at all!
[Flappy Bird game made by 1.58bit R1](https://i.redd.it/k8nfun2ezjfe1.gif)
There are more details in the blog here: [https://unsloth.ai/blog/deepseekr1-dynamic](https://unsloth.ai/blog/deepseekr1-dynamic) The link to the 1.58bit GGUF is here: [https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1\_S](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_S) You should be able to run it in your favorite inference tool if it supports imatrix quants. No need to re-update llama.cpp.
A reminder on DeepSeek's chat template (for distilled versions as well) - it auto adds a BOS - do not add it manually!
`<|begin▁of▁sentence|><|User|>What is 1+1?<|Assistant|>It's 2.<|end▁of▁sentence|><|User|>Explain more!<|Assistant|>`
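As a rough illustration of the template above, the following Python sketch assembles a multi-turn prompt string. The helper name `build_prompt` and its `add_bos` flag are hypothetical, not part of any DeepSeek or llama.cpp API; the special tokens are copied as written in the example line. BOS is left off by default on the assumption that the tokenizer adds it, as the reminder says.

```python
# Special tokens copied from the template example in the post.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"


def build_prompt(turns, add_bos=False):
    """Build a prompt from (user_message, assistant_reply) pairs.

    Pass None as the reply of the last turn to leave the prompt open for
    the model to complete. add_bos defaults to False because the tokenizer
    is assumed to prepend BOS on its own (hypothetical helper, for
    illustration only).
    """
    parts = [BOS] if add_bos else []
    for user_msg, reply in turns:
        parts.append(f"{USER}{user_msg}{ASSISTANT}")
        if reply is not None:
            # A finished assistant turn is closed with the end-of-sentence token.
            parts.append(f"{reply}{EOS}")
    return "".join(parts)


if __name__ == "__main__":
    conversation = [("What is 1+1?", "It's 2."), ("Explain more!", None)]
    print(build_prompt(conversation))
```

Calling it with `add_bos=True` reproduces the full string shown in the post, which is only useful when the inference tool is known not to insert BOS itself.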
To know how many layers to offload to the GPU, I approximately calculated it as below:
|Quant|File Size|24GB GPU|80GB GPU|2x80GB GPU|
|:-|:-|:-|:-|:-|
|1.58bit|131GB|7|33|All layers 61|
|1.73bit|158GB|5|26|57|
|2.22bit|183GB|4|22|49|
|2.51bit|212GB|2|19|32|
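The post does not spell out how the table was calculated. One simple heuristic, assumed here purely for illustration, is to offload layers in proportion to the share of the file that fits in VRAM and then hold back a few layers of headroom. With 61 layers and a reserve of 4, this sketch reproduces the 24GB and 80GB columns; the function name and the `reserve` parameter are hypothetical.

```python
def layers_to_offload(vram_gb, file_size_gb, n_layers=61, reserve=4):
    """Estimate how many layers fit on the GPU (illustrative heuristic).

    Assumption: layers are roughly equal in size, so the GPU can hold about
    vram_gb / file_size_gb of them; `reserve` layers are subtracted as
    headroom for the KV cache and buffers. Integer arithmetic is used to
    avoid floating-point rounding at exact boundaries.
    """
    estimate = (int(vram_gb) * n_layers) // int(file_size_gb) - reserve
    # Clamp to the valid range: never negative, never more than all layers.
    return max(0, min(n_layers, estimate))


if __name__ == "__main__":
    for size_gb in (131, 158, 183, 212):
        print(size_gb, layers_to_offload(24, size_gb), layers_to_offload(80, size_gb))
```

Treat the result as a starting point and lower it if the model fails to load or runs out of memory at longer context lengths.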
All other GGUFs for R1 are here: [https://huggingface.co/unsloth/DeepSeek-R1-GGUF](https://huggingface.co/unsloth/DeepSeek-R1-GGUF) There's also GGUFs and dynamic 4bit bitsandbytes quants and others for all other distilled versions (Qwen, Llama etc) at [https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5](https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5)
We knew R1 was good, but not that good. All the cries of CCP censorship are meaningless when it's trivial to bypass its guard rails.
DeepSeek-R1 supports a context window of 64,000 tokens, which covers both input and output combined.
DeepSeek-R1 generates a Chain of Thought (CoT) before delivering its final answer. This means the model works through reasoning steps explicitly, which is intended to improve accuracy on complex or multi-step tasks.
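When post-processing raw output, it can help to separate the reasoning trace from the final answer. The sketch below assumes the trace is wrapped in `<think>...</think>` tags, which this page does not state; hosted APIs may instead return the reasoning in a separate field. The function name `split_reasoning` is hypothetical.

```python
import re

# Assumption: the reasoning trace is delimited by <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from raw model output.

    If no <think> block is found, the reasoning is an empty string and the
    whole text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is taken as the final answer.
    answer = text[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>\n1+1 is 2.\n</think>\nThe answer is 2."
    reasoning, answer = split_reasoning(raw)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```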
Based on the available metadata, DeepSeek-R1 was trained through late 2024. It does not have knowledge of events after that period.
Yes. DeepSeek released the model weights for DeepSeek-R1 publicly on Hugging Face, allowing users to run the model locally or fine-tune it independently of the hosted API.
DeepSeek-R1 is designed for tasks that benefit from structured reasoning, including mathematics, logic, coding, and scientific problem-solving. Its CoT approach makes it particularly useful when intermediate reasoning steps matter.