DeepSeek

DeepSeek V3.1

DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters at any given time. It supports a 128,000-token context window and was trained through August 2025, with an enhanced base model built using a two-phase long-context extension process that included 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks. What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.

Aug 21, 2025 128,000 context 8,000 tokens output
Hybrid Thinking Mode Long Context Window Tool Use & Agents Code Generation Mathematical Reasoning Mixture-of-Experts Architecture

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

DeepSeek

Model ID

The routed model identifier exposed by upstream providers.

deepseek/deepseek-chat-v3.1

Input Context Window

The number of tokens supported by the input context window.

128,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

8,000 tokens tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

Aug 21, 2025 9 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2025-03-31

API Providers

The providers that offer this model. This is not an exhaustive list.

DeepInfra, Novita, SiliconFlow, AtlasCloud, WandB, Google, SambaNova

Modalities

Types of data this model can process.

Text Code

What is DeepSeek V3.1

A fuller summary of positioning, capabilities, and source-specific details for DeepSeek V3.1.

DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters at any given time. It supports a 128,000-token context window and was trained through August 2025, with an enhanced base model built using a two-phase long-context extension process that included 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks.

What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.

Capabilities

What DeepSeek V3.1 supports

AI

Hybrid Thinking Mode

Switches between fast conversational responses and deep step-by-step reasoning within a single model, controlled by how the model is prompted rather than by selecting a separate endpoint.

CTX

Long Context Window

Supports up to 128,000 tokens of context, enabling analysis of long documents, extended codebases, or multi-turn conversations without truncation.

AG

Tool Use & Agents

Handles multi-step agentic workflows including external API calls, web search, and code execution, with post-training improvements specifically targeting tool-calling reliability.

</>

Code Generation

Generates, explains, and debugs code across multiple programming languages, with the option to invoke thinking mode for complex algorithmic problems.

RN

Mathematical Reasoning

Solves multi-step math problems using the model's thinking mode, which produces intermediate reasoning steps before arriving at a final answer.

AI

Mixture-of-Experts Architecture

Uses a MoE design with 671 billion total parameters but only 37 billion activated per forward pass, allowing large model capacity with more efficient inference.

Pricing for DeepSeek V3.1

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.13
maxTemperature 1
maxResponseSize 8,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepInfra Novita SiliconFlow AtlasCloud WandB Google SambaNova

Provider Endpoints

Endpoint-level provider data currently available for this model.

DeepInfra

Max output: 32,768 1d uptime: 99.7% Supported params: 17 Implicit caching: No

Novita

Max output: 32,768 1d uptime: 100.0% Supported params: 15 Implicit caching: No

SiliconFlow

Max output: 163,840 1d uptime: 98.1% Supported params: 11 Implicit caching: No

AtlasCloud

Max output: 65,536 1d uptime: 100.0% Supported params: 15 Implicit caching: No

WandB

Max output: 161,000 1d uptime: 99.9% Supported params: 15 Implicit caching: No

Google

Max output: 32,768 1d uptime: 99.4% Supported params: 15 Implicit caching: No

SambaNova

Max output: 7,168 1d uptime: 99.6% Supported params: 9 Implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
73.5%
HLE
Questions that challenge frontier models across many domains
6.3%
LiveCodeBench
Real-world coding tasks from recent competitions
57.7%
MMLU-Pro
Expert knowledge across 14 academic disciplines
83.3%
SciCode
Scientific research coding and numerical methods
36.7%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about DeepSeek V3.1

DeepSeek V3.1 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/DeepSeek.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 641 upvotes and 46 comments.

r/DeepSeek 394 upvotes 64 comments August 19, 2025
DeepSeek v3.1 already does better than ChatGPT-5. Change my mind.

No unnecessary hate but ChatGPTs will oftern provide you with scraps and have some kind of limit when generating lengthy code. DeepSeek did this in one shot.

Prompt: write a p5.js program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically

Open Reddit thread
r/LocalLLaMA 36 upvotes 42 comments August 22, 2025
DeepSeek V3.1 dynamic Unsloth GGUFs + chat template fixes

Hey r/LocalLLaMA ! It took a bit longer than expected, but we made dynamic imatrix GGUFs for DeepSeek V3.1 at [https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF](https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF) There is also a TQ1\_0 (for naming only) version (**170GB**) which is 1 file for Ollama compatibility and works via `ollama run` [`hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0`](http://hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0)

All dynamic quants use higher bits (6-8bit) for very important layers, and unimportant layers are quantized down. We used over 2-3 million tokens of high quality calibration data for the imatrix phase.

* You must use `--jinja` to enable the correct chat template. You can also use `enable_thinking = True` / `thinking = True`
* You will get the following error when using other quants: `terminate called after throwing an instance of 'std::runtime_error' what(): split method must have between 1 and 1 positional arguments and between 0 and 0 keyword arguments at row 3, column 1908` We fixed it in all our quants!
* The official recommended settings are `--temp 0.6 --top_p 0.95`
* Use `-ot ".ffn_.*_exps.=CPU"` to offload MoE layers to RAM!
* Use KV Cache quantization to enable longer contexts. Try `--cache-type-k q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1` and for V quantization, you have to compile llama.cpp with Flash Attention support.

More docs on how to run it and other stuff at [https://docs.unsloth.ai/basics/deepseek-v3.1](https://docs.unsloth.ai/basics/deepseek-v3.1) I normally recommend using the Q2\_K\_XL or Q3\_K\_XL quants - they work very well!

Open Reddit thread

We evaluated Deepseek v3.1 chat using a minimal agent (no tools other than bash, common-sense prompts, main agent class implemented in some 100 lines of python) and get 53.8% on SWE-bench verified (if you want to reproduce it, you can install [https://github.com/SWE-agent/mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) and it's a one-liner to evaluate on SWE-bench).

https://preview.redd.it/d1dmlmo78gkf1.png?width=780&format=png&auto=webp&s=449eca28d86413e9259d33e66c7df67036c317a5

It currently gets on 2nd place among open source models on our leaderboard (SWE-bench bash-only, where we compare all models with this exact setup, see [https://www.swebench.com/](https://www.swebench.com/) ).

Still working on adding some more models, in particular open source ones. We haven't evaluated DeepSeek v3.1 reasoning so far (it doesn't have tool calls, so it's probably going to be less used for agents).

One of the interesting things is that Deepseek v3.1 chat maxes out later with respect to the number of steps taken by the agent, especially compared to the GPT models. To squeeze out the maximum performance you might have to run for 150 steps.

https://preview.redd.it/ok2y7rta8gkf1.png?width=2157&format=png&auto=webp&s=add6cf27c09da63de3a0169e76a577a038eaa9d2

As a result of the high step numbers, I'd say the effective cost is somewhere near that of GPT-5 mini if you use the official API (the next plot basically shows different cost to performance points depending on how high you set the step limit of the agent — agents succeed fast, but fail very slowly, so you can spend a lot of money without getting a higher resolve rate).

https://preview.redd.it/8dfgx8cc8gkf1.png?width=720&format=png&auto=webp&s=ff3667c6de5ebb0deafc5b4f7c7a031d70af833b

(sorry that the cost/step plots still mostly show proprietary models, we'll have a more complete plot soon).

(note: xpost from https://www.reddit.com/r/DeepSeek/comments/1mwp8ji/evaluating\_deepseek\_v31\_chat\_with\_a\_minimal\_agent/)

Open Reddit thread
View more discussions →
FAQ

Common questions about DeepSeek V3.1

What is the context window for DeepSeek-V3.1?

DeepSeek-V3.1 supports a context window of 128,000 tokens, suitable for long documents, large codebases, and extended multi-turn conversations.

How many parameters does DeepSeek-V3.1 have?

The model has 671 billion total parameters. Due to its Mixture-of-Experts architecture, only 37 billion parameters are activated during any single forward pass.

What is the knowledge cutoff for DeepSeek-V3.1?

Based on the provided metadata, DeepSeek-V3.1's training date is listed as August 2025, which represents the approximate knowledge cutoff for the model.

How does the hybrid thinking mode work?

DeepSeek-V3.1 can operate in a fast non-thinking conversational mode or a slower step-by-step reasoning mode. The mode is selected through prompting rather than by choosing a different model or endpoint.

Is the model available for local deployment?

The model weights for both DeepSeek-V3.1 and DeepSeek-V3.1-Base are available on Hugging Face, making local or self-hosted deployment possible for those with sufficient hardware resources.

More models from DeepSeek

Continue browsing adjacent models from the same provider.

← All AI Models