DeepSeek-R1

DeepSeek-R1 is a text generation model developed by DeepSeek, a Chinese AI company. It is a reasoning-focused model that generates a Chain of Thought (CoT) before producing a final answer, a technique designed to improve accuracy on multi-step problems. The model was trained through late 2024 and supports a context window of 64,000 tokens. DeepSeek released the model weights publicly, making it available for local deployment and research use. DeepSeek-R1 is well suited for tasks that benefit from structured reasoning, such as mathematics, logic puzzles, coding challenges, and scientific problem-solving. Because the model externalizes its reasoning steps before answering, users can inspect the thought process that led to a given response. DeepSeek also released a series of distilled versions of R1 based on smaller base models, broadening its accessibility across different hardware configurations.

Jan 22, 2025 · 64,000 context · 8,000 tokens output
Chain-of-Thought Reasoning · Math & Logic · Code Generation · Long-Context Processing · Open Weights Access

Model Overview

|Field|Description|Value|
|:-|:-|:-|
|Provider|The entity that provides this model.|DeepSeek|
|Input Context Window|The number of tokens supported by the input context window.|64,000 tokens|
|Maximum Output Tokens|The number of tokens that can be generated by the model in a single request.|8,000 tokens|
|Open Source|Whether the model's code is available for public use.|No (the model weights are released publicly)|
|Release Date|When the model was first released.|Jan 22, 2025|
|Knowledge Cut-off Date|When the model's knowledge was last updated.|2024|
|API Providers|The providers that offer this model. This is not an exhaustive list.|DeepSeek API, OpenAI API, Anthropic API, Hugging Face|
|Modalities|Types of data this model can process.|Text|

Capabilities

What DeepSeek-R1 supports

Chain-of-Thought Reasoning

Generates an explicit reasoning trace before producing a final answer, allowing multi-step problems to be broken down systematically. This CoT process is visible in the model's output.
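
Because the reasoning trace is part of the output, it is often useful to separate it from the final answer. The sketch below assumes the raw output wraps the trace in `<think>...</think>` tags, which is how the open-weights model is commonly observed to format it; the tag name and the helper `split_reasoning` are assumptions for illustration, and a hosted API may return the trace in a separate field instead.

```python
import re

# Assumption: the reasoning trace is wrapped in <think>...</think> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw model output string.

    If no <think> block is present, the reasoning is empty and the
    whole output is treated as the answer.
    """
    match = THINK_PATTERN.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = output[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
    reasoning, answer = split_reasoning(raw)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Keeping the two parts separate makes it possible to log or inspect the trace while showing only the answer to end users.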

Math & Logic

Applies step-by-step reasoning to solve mathematical and logical problems, including proofs, equations, and structured inference tasks.

Code Generation

Produces and debugs code across common programming languages, using its reasoning process to work through algorithmic problems before outputting a solution.

Long-Context Processing

Handles input and output sequences within a 64,000-token context window, supporting analysis of lengthy documents or extended multi-turn conversations.

Open Weights Access

Model weights are publicly released by DeepSeek, enabling local deployment and fine-tuning without relying solely on the hosted API.

Pricing for DeepSeek-R1

Additional usage limits for this model:

Maximum temperature: 2
Maximum response size: 8,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepSeek API, OpenAI API, Anthropic API, Hugging Face

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

|Benchmark|Description|Score|
|:-|:-|:-|
|AIME 2024|American math olympiad problems|89.3%|
|GPQA Diamond|PhD-level science questions (biology, physics, chemistry)|81.3%|
|HLE|Questions that challenge frontier models across many domains|14.9%|
|LiveCodeBench|Real-world coding tasks from recent competitions|77.0%|
|MATH-500|Undergraduate and competition-level math problems|98.3%|
|MMLU-Pro|Expert knowledge across 14 academic disciplines|84.9%|
|SciCode|Scientific research coding and numerical methods|40.3%|

Community discussion

What people think about DeepSeek-R1

DeepSeek-R1 discussions are most active in r/LocalLLaMA, r/selfhosted, and r/singularity.

Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflows. The strongest match in this snapshot has 2,136 upvotes and 684 comments.

r/selfhosted · 2,136 upvotes · 684 comments · January 28, 2025
Yes, you can run DeepSeek-R1 locally on your device (20GB RAM min.)

I've recently seen some misconceptions that you can't run DeepSeek-R1 locally on your own device. Last weekend, we were busy trying to make you guys have the ability to run the actual R1 (non-distilled) model with just an RTX 4090 (24GB VRAM) which gives at least 2-3 tokens/second.

Over the weekend, we at Unsloth (currently a team of just 2 brothers) studied R1's architecture, then selectively quantized layers to 1.58-bit, 2-bit etc. which vastly outperforms basic versions with minimal compute.

1. We shrank R1, the 671B parameter model from 720GB to just 131GB (an 80% size reduction) whilst making it still fully functional and great
2. No, the dynamic GGUFs do not work directly with Ollama, but they do work on llama.cpp as it supports sharded GGUFs and disk mmap offloading. For Ollama, you will need to merge the GGUFs manually using llama.cpp.
3. Minimum requirements: a CPU with 20GB of RAM (but it will be very slow) - and 140GB of disk space (to download the model weights)
4. Optimal requirements: sum of your VRAM+RAM= 80GB+ (this will be somewhat ok)
5. No, you do not need hundreds of GB of RAM+VRAM but if you have it, you can get **140 tokens per second** for throughput & 14 tokens/s for single user inference with 2xH100
6. Our open-source GitHub repo: [github.com/unslothai/unsloth](http://github.com/unslothai/unsloth)

Many people have tried running the dynamic GGUFs on their potato devices and it works very well (including mine).

R1 GGUFs uploaded to Hugging Face: [huggingface.co/unsloth/DeepSeek-R1-GGUF](http://huggingface.co/unsloth/DeepSeek-R1-GGUF)

To run your own R1 locally we have instructions + details: [unsloth.ai/blog/deepseekr1-dynamic](http://unsloth.ai/blog/deepseekr1-dynamic)

Open Reddit thread

Source: Moneycontrol \[[Article Link](https://www.moneycontrol.com/news/business/startup/sarvam-ai-launches-30b-and-105b-models-says-105b-outperforms-deepseek-r1-and-gemini-flash-on-key-benchmarks-13834399.html)\]

>Bengaluru-based AI startup just announced the launch of two new large language models, a 30-billion-parameter model and a 105-billion-parameter model, both trained from scratch.

>“At 105 billion parameters, on most benchmarks this model beats DeepSeek R1 released a year ago, which was a 600-billion-parameter model.”

>“It is cheaper than something like a Gemini Flash, but outperforms it in many benchmarks,” Kumar said.

>On Indian language benchmarks, Kumar said the model delivers stronger performance than several larger competitors.

>“Even with something like Gemini 2.5 Flash, which is a bigger and more expensive model, we find that the Indian language performance of this model is even better.”

Sarvam was earlier announced as the first startup selected to build India’s foundational AI model under the mission.

r/LocalLLaMA · 1,691 upvotes · 598 comments · January 27, 2025
1.58bit DeepSeek R1 - 131GB Dynamic GGUF

Hey r/LocalLLaMA! I managed to **dynamically quantize** the full DeepSeek R1 671B MoE to 1.58bits in GGUF format. The trick is **not to quantize all layers**, but quantize only the MoE layers to 1.5bit, and leave attention and other layers in 4 or 6bit.

|MoE Bits|Type|Disk Size|Accuracy|HF Link|
|:-|:-|:-|:-|:-|
|1.58bit|IQ1\_S|**131GB**|Fair|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_S)|
|1.73bit|IQ1\_M|**158GB**|Good|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_M)|
|2.22bit|IQ2\_XXS|**183GB**|Better|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ2_XXS)|
|2.51bit|Q2\_K\_XL|**212GB**|Best|[Link](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-Q2_K_XL)|

You can get **140 tokens / s** for throughput and 14 tokens /s for single user inference on 2x H100 80GB GPUs with all layers offloaded. A 24GB GPU like RTX 4090 should be able to get at least 1 to 3 tokens / s.

If we naively quantize all layers to 1.5bit (-1, 0, 1), the model will fail dramatically, since it'll produce **gibberish** and **infinite repetitions**. I selectively leave all attention layers in 4/6bit, and leave the first 3 transformer dense layers in 4/6bit. The MoE layers take up 88% of all space, so we can leave them in 1.5bit. We get in total a weighted sum of 1.58bits!
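
As a rough cross-check on the sizes quoted in the post, the average number of bits per weight implied by a file size can be estimated by dividing the file size in bits by the parameter count. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and the 671B parameter count quoted above, and it ignores file metadata overhead; the function names are illustrative only.

```python
# Assumptions: decimal gigabytes and the 671B parameter count from the post.
N_PARAMS = 671e9


def avg_bits_per_weight(file_size_gb: float, n_params: float = N_PARAMS) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return file_size_gb * 1e9 * 8 / n_params


def size_reduction_percent(original_gb: float, quantized_gb: float) -> float:
    """Percentage by which the quantized file is smaller than the original."""
    return (1 - quantized_gb / original_gb) * 100


if __name__ == "__main__":
    # Disk sizes taken from the table in the post.
    for size_gb in (131, 158, 183, 212):
        print(f"{size_gb}GB -> {avg_bits_per_weight(size_gb):.2f} bits per weight")
    print(f"720GB -> 131GB: {size_reduction_percent(720, 131):.1f}% smaller")
```

Under these assumptions the 131GB file works out to roughly 1.56 bits per weight averaged over all layers, which is in line with the 1.58bit label, and the reduction from 720GB is a little under 82%.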

I asked the 1.58bit model to create Flappy Bird with 10 conditions (like random colors, a best score etc), and it did pretty well! Using a generic non dynamically quantized model will fail miserably - there will be no output at all!

[Flappy Bird game made by 1.58bit R1](https://i.redd.it/k8nfun2ezjfe1.gif)

There are more details in the blog here: [https://unsloth.ai/blog/deepseekr1-dynamic](https://unsloth.ai/blog/deepseekr1-dynamic) The link to the 1.58bit GGUF is here: [https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1\_S](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-IQ1_S) You should be able to run it in your favorite inference tool if it supports imatrix quants. No need to re-update llama.cpp.

A reminder on DeepSeek's chat template (for distilled versions as well) - it auto adds a BOS - do not add it manually!

`<|begin▁of▁sentence|><|User|>What is 1+1?<|Assistant|>It's 2.<|end▁of▁sentence|><|User|>Explain more!<|Assistant|>`
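
The template above can be assembled programmatically. The following is a minimal sketch, assuming the special tokens are spelled exactly as in the example line and that the inference tool prepends the BOS token itself, so the helper never adds it. The function `build_prompt` and its turn format are hypothetical; the exact token strings should be confirmed against the model's tokenizer.

```python
# Token strings copied from the example above; confirm them against the tokenizer.
BOS = "<|begin▁of▁sentence|>"  # added automatically by the tool, never by this helper
EOS = "<|end▁of▁sentence|>"
USER_TAG = "<|User|>"
ASSISTANT_TAG = "<|Assistant|>"


def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Format (role, text) turns into a prompt string without a BOS token.

    Completed assistant turns are closed with EOS. If the last turn is a
    user turn, a trailing assistant tag cues the model to respond.
    """
    parts = []
    for role, text in turns:
        if role == "user":
            parts.append(USER_TAG + text)
        elif role == "assistant":
            parts.append(ASSISTANT_TAG + text + EOS)
        else:
            raise ValueError(f"unknown role: {role!r}")
    if turns and turns[-1][0] == "user":
        parts.append(ASSISTANT_TAG)
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_prompt([
        ("user", "What is 1+1?"),
        ("assistant", "It's 2."),
        ("user", "Explain more!"),
    ])
    print(prompt)
```

Leaving the BOS token out of the helper mirrors the reminder above: if the tool adds it automatically and the prompt also contains it, the sequence would start with two BOS tokens.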

To know how many layers to offload to the GPU, I approximately calculated it as below:

|Quant|File Size|24GB GPU|80GB GPU|2x80GB GPU|
|:-|:-|:-|:-|:-|
|1.58bit|131GB|7|33|All layers 61|
|1.73bit|158GB|5|26|57|
|2.22bit|183GB|4|22|49|
|2.51bit|212GB|2|19|32|
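
The post does not spell out how these numbers were calculated. One heuristic that reproduces the table is to scale the 61 layers by the ratio of VRAM to file size and then hold back a few layers as headroom. The formula, the reserve of 4 layers, and the function name below are assumptions for illustration, not the author's stated method.

```python
def layers_to_offload(
    vram_gb: int,
    file_size_gb: int,
    n_layers: int = 61,
    reserve: int = 4,
) -> int:
    """Estimate how many layers fit on the GPU.

    Assumed heuristic: floor(vram / file_size * n_layers) - reserve,
    clamped to the range [0, n_layers]. Integer arithmetic avoids
    floating-point rounding at exact boundaries.
    """
    estimate = (vram_gb * n_layers) // file_size_gb - reserve
    return max(0, min(n_layers, estimate))


if __name__ == "__main__":
    # File sizes from the table above, for 24GB, 80GB, and 2x80GB of VRAM.
    for quant, size_gb in (("1.58bit", 131), ("1.73bit", 158),
                           ("2.22bit", 183), ("2.51bit", 212)):
        row = [layers_to_offload(vram, size_gb) for vram in (24, 80, 160)]
        print(quant, row)
```

This heuristic matches every cell in the table except the 2.51bit entry for 2x80GB, where it yields 42 rather than 32.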

All other GGUFs for R1 are here: [https://huggingface.co/unsloth/DeepSeek-R1-GGUF](https://huggingface.co/unsloth/DeepSeek-R1-GGUF) There's also GGUFs and dynamic 4bit bitsandbytes quants and others for all other distilled versions (Qwen, Llama etc) at [https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5](https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5)

FAQ

Common questions about DeepSeek-R1

What is the context window for DeepSeek-R1?

DeepSeek-R1 supports a context window of 64,000 tokens, which covers both input and output combined.
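
Since input and output share the same window, the room left for a prompt depends on how many output tokens are requested. A minimal sketch of that budget, using the 64,000-token window and 8,000-token output limit listed on this page (the function name is hypothetical):

```python
CONTEXT_WINDOW = 64_000    # total tokens, input and output combined
MAX_OUTPUT_TOKENS = 8_000  # maximum tokens generated per request


def max_prompt_tokens(requested_output: int) -> int:
    """Return how many prompt tokens fit alongside the requested output."""
    if not 0 < requested_output <= MAX_OUTPUT_TOKENS:
        raise ValueError(
            f"requested_output must be between 1 and {MAX_OUTPUT_TOKENS}"
        )
    return CONTEXT_WINDOW - requested_output


if __name__ == "__main__":
    # Requesting the full output limit leaves the smallest prompt budget.
    print(max_prompt_tokens(8_000))
    print(max_prompt_tokens(2_000))
```

Checking the budget before sending a request avoids truncated prompts when long documents are combined with a large output allowance.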

What makes DeepSeek-R1 different from a standard text generation model?

DeepSeek-R1 generates a Chain of Thought (CoT) before delivering its final answer. This means the model works through reasoning steps explicitly, which is intended to improve accuracy on complex or multi-step tasks.

What is the training data cutoff for DeepSeek-R1?

Based on the available metadata, DeepSeek-R1 was trained through late 2024. It does not have knowledge of events after that period.

Is DeepSeek-R1 available as open weights?

Yes. DeepSeek released the model weights for DeepSeek-R1 publicly on Hugging Face, allowing users to run the model locally or fine-tune it independently of the hosted API.

What types of tasks is DeepSeek-R1 best suited for?

DeepSeek-R1 is designed for tasks that benefit from structured reasoning, including mathematics, logic, coding, and scientific problem-solving. Its CoT approach makes it particularly useful when intermediate reasoning steps matter.
