DeepSeek

DeepSeek V3.1

DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters at any given time. It supports a 128,000-token context window and was trained through August 2025, with an enhanced base model built using a two-phase long-context extension process that included 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks. What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.

Aug 21, 2025 128,000 context 8,000 tokens output

Hybrid Thinking Mode Long Context Window Tool Use & Agents Code Generation Mathematical Reasoning Mixture-of-Experts Architecture

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Benchmarks ↓ Compare ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

DeepSeek

Model ID

The routed model identifier exposed by upstream providers.

deepseek/deepseek-chat-v3.1

Input Context Window

The number of tokens supported by the input context window.

128,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

8,000 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Aug 21, 2025 10 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2025-03-31

API Providers

The providers that offer this model. This is not an exhaustive list.

DeepInfra, Novita, SiliconFlow, AtlasCloud, WandB, Google, Mara, SambaNova

Modalities

Types of data this model can process.

Text Code

What is DeepSeek V3.1

A fuller summary of positioning, capabilities, and source-specific details for DeepSeek V3.1.

DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters at any given time. It supports a 128,000-token context window and was trained through August 2025, with an enhanced base model built using a two-phase long-context extension process that included 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks.

What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.

Capabilities

What DeepSeek V3.1 supports

Hybrid Thinking Mode

Switches between fast conversational responses and deep step-by-step reasoning within a single model, controlled by how the model is prompted rather than by selecting a separate endpoint.

CTX

Long Context Window

Supports up to 128,000 tokens of context, enabling analysis of long documents, extended codebases, or multi-turn conversations without truncation.

Tool Use & Agents

Handles multi-step agentic workflows including external API calls, web search, and code execution, with post-training improvements specifically targeting tool-calling reliability.

</>

Code Generation

Generates, explains, and debugs code across multiple programming languages, with the option to invoke thinking mode for complex algorithmic problems.

Mathematical Reasoning

Solves multi-step math problems using the model's thinking mode, which produces intermediate reasoning steps before arriving at a final answer.

Mixture-of-Experts Architecture

Uses a MoE design with 671 billion total parameters but only 37 billion activated per forward pass, allowing large model capacity with more efficient inference.

Pricing for DeepSeek V3.1

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.27 Per million tokens

Output tokens $0.79 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.13

maxTemperature 1

maxResponseSize 8,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepInfra Novita SiliconFlow AtlasCloud WandB Google Mara SambaNova

Provider Endpoints

Endpoint-level provider data currently available for this model.

DeepInfra

Max output: 32,768 1d uptime: 98.4% Supported params: 17 Implicit caching: No

Novita

Max output: 32,768 1d uptime: 100.0% Supported params: 15 Implicit caching: No

SiliconFlow

Max output: 163,840 1d uptime: 93.9% Supported params: 11 Implicit caching: No

AtlasCloud

Max output: 65,536 1d uptime: 98.7% Supported params: 15 Implicit caching: No

WandB

Max output: 161,000 1d uptime: 93.7% Supported params: 17 Implicit caching: No

Google

Max output: 32,768 1d uptime: 99.5% Supported params: 17 Implicit caching: No

Mara

Max output: 7,168 1d uptime: 61.1% Supported params: 11 Implicit caching: No

SambaNova

Max output: 7,168 1d uptime: 99.4% Supported params: 9 Implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	73.5%
HLE Questions that challenge frontier models across many domains	6.3%
LiveCodeBench Real-world coding tasks from recent competitions	57.7%
MMLU-Pro Expert knowledge across 14 academic disciplines	83.3%
SciCode Scientific research coding and numerical methods	36.7%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Model Card (Hugging Face) Documentation

→

Official Chat Interface Playground

→

DeepSeek-V3.1-Base Model Open Source

→

DeepGEMM (FP8 Technical Reference) Open Source

→

DeepSeek API Documentation Documentation

→

DeepSeek GitHub Organization Open Source

→

OpenRouter Model Page OpenRouter

→

Compare DeepSeek V3.1 with related models

Jump straight into the most relevant side-by-side comparison pages for this model.

DeepSeek V3.2 vs DeepSeek V3.1

Compare pricing, benchmarks, strengths, and best use cases.

DeepSeek V4 Flash vs DeepSeek V3.1

Compare pricing, benchmarks, strengths, and best use cases.

DeepSeek V4 Pro vs DeepSeek V3.1

Compare pricing, benchmarks, strengths, and best use cases.

Related Daily Briefs

Recent daily stories tied to DeepSeek V3.1 through direct model mentions or provider-level coverage.

Frontier Models

Samsung Deploys ChatGPT Enterprise as Small Models Outperform Frontier LLMs and MiniMax M3 Challenges DeepSeek

MiniMax and OpenAI are raising the stakes for enterprise adoption.

2026-06-21 AI Models AI API

Community discussion

What people think about DeepSeek V3.1

DeepSeek V3.1 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/DeepSeek.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 641 upvotes and 46 comments.

r/DeepSeek 394 upvotes 64 comments August 19, 2025

DeepSeek v3.1 already does better than ChatGPT-5. Change my mind.

No unnecessary hate but ChatGPTs will oftern provide you with scraps and have some kind of limit when generating lengthy code. DeepSeek did this in one shot.

Prompt: write a p5.js program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically

Open Reddit thread

r/LocalLLaMA 36 upvotes 42 comments August 22, 2025

DeepSeek V3.1 dynamic Unsloth GGUFs + chat template fixes

Hey r/LocalLLaMA ! It took a bit longer than expected, but we made dynamic imatrix GGUFs for DeepSeek V3.1 at [https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF](https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF) There is also a TQ1\_0 (for naming only) version (**170GB**) which is 1 file for Ollama compatibility and works via `ollama run` [`hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0`](http://hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0)

All dynamic quants use higher bits (6-8bit) for very important layers, and unimportant layers are quantized down. We used over 2-3 million tokens of high quality calibration data for the imatrix phase.

* You must use `--jinja` to enable the correct chat template. You can also use `enable_thinking = True` / `thinking = True`
* You will get the following error when using other quants: `terminate called after throwing an instance of 'std::runtime_error' what(): split method must have between 1 and 1 positional arguments and between 0 and 0 keyword arguments at row 3, column 1908` We fixed it in all our quants!
* The official recommended settings are `--temp 0.6 --top_p 0.95`
* Use `-ot ".ffn_.*_exps.=CPU"` to offload MoE layers to RAM!
* Use KV Cache quantization to enable longer contexts. Try `--cache-type-k q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1` and for V quantization, you have to compile llama.cpp with Flash Attention support.

More docs on how to run it and other stuff at [https://docs.unsloth.ai/basics/deepseek-v3.1](https://docs.unsloth.ai/basics/deepseek-v3.1) I normally recommend using the Q2\_K\_XL or Q3\_K\_XL quants - they work very well!

Open Reddit thread

r/singularity 9 upvotes 8 comments October 20, 2025

In the current Alpha Arena AI live trading rankings, DeepSeek V3.1 Chat is #1, outperforming all major closed-source models so far.

Open Reddit thread

r/LocalLLaMA 36 upvotes 7 comments August 21, 2025

Evaluating Deepseek v3.1 chat with a minimal agent on SWE-bench verified: Still slightly behind Qwen 3 coder

We evaluated Deepseek v3.1 chat using a minimal agent (no tools other than bash, common-sense prompts, main agent class implemented in some 100 lines of python) and get 53.8% on SWE-bench verified (if you want to reproduce it, you can install [https://github.com/SWE-agent/mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent) and it's a one-liner to evaluate on SWE-bench).

https://preview.redd.it/d1dmlmo78gkf1.png?width=780&format=png&auto=webp&s=449eca28d86413e9259d33e66c7df67036c317a5

It currently gets on 2nd place among open source models on our leaderboard (SWE-bench bash-only, where we compare all models with this exact setup, see [https://www.swebench.com/](https://www.swebench.com/) ).

Still working on adding some more models, in particular open source ones. We haven't evaluated DeepSeek v3.1 reasoning so far (it doesn't have tool calls, so it's probably going to be less used for agents).

One of the interesting things is that Deepseek v3.1 chat maxes out later with respect to the number of steps taken by the agent, especially compared to the GPT models. To squeeze out the maximum performance you might have to run for 150 steps.

https://preview.redd.it/ok2y7rta8gkf1.png?width=2157&format=png&auto=webp&s=add6cf27c09da63de3a0169e76a577a038eaa9d2

As a result of the high step numbers, I'd say the effective cost is somewhere near that of GPT-5 mini if you use the official API (the next plot basically shows different cost to performance points depending on how high you set the step limit of the agent — agents succeed fast, but fail very slowly, so you can spend a lot of money without getting a higher resolve rate).

https://preview.redd.it/8dfgx8cc8gkf1.png?width=720&format=png&auto=webp&s=ff3667c6de5ebb0deafc5b4f7c7a031d70af833b

(sorry that the cost/step plots still mostly show proprietary models, we'll have a more complete plot soon).

(note: xpost from https://www.reddit.com/r/DeepSeek/comments/1mwp8ji/evaluating\_deepseek\_v31\_chat\_with\_a\_minimal\_agent/)

Open Reddit thread

r/openrouter 6 upvotes 10 comments August 21, 2025

openrouter just added deepseek/deepseek-chat-v3.1:thinking. thoughts?

It's like Christmas for me when a new model drops.

Open Reddit thread

View more discussions →

FAQ

Common questions about DeepSeek V3.1

What is the context window for DeepSeek-V3.1?

DeepSeek-V3.1 supports a context window of 128,000 tokens, suitable for long documents, large codebases, and extended multi-turn conversations.

How many parameters does DeepSeek-V3.1 have?

The model has 671 billion total parameters. Due to its Mixture-of-Experts architecture, only 37 billion parameters are activated during any single forward pass.

What is the knowledge cutoff for DeepSeek-V3.1?

Based on the provided metadata, DeepSeek-V3.1's training date is listed as August 2025, which represents the approximate knowledge cutoff for the model.

How does the hybrid thinking mode work?

DeepSeek-V3.1 can operate in a fast non-thinking conversational mode or a slower step-by-step reasoning mode. The mode is selected through prompting rather than by choosing a different model or endpoint.

Is the model available for local deployment?

The model weights for both DeepSeek-V3.1 and DeepSeek-V3.1-Base are available on Hugging Face, making local or self-hosted deployment possible for those with sufficient hardware resources.

More models from DeepSeek

Continue browsing adjacent models from the same provider.

← All AI Models