DeepSeek

DeepSeek V3.2

DeepSeek-V3.2 is an open-weight large language model developed by DeepSeek and released on December 1, 2025. It uses a Mixture-of-Experts architecture combined with a novel sparse attention mechanism called DeepSeek Sparse Attention (DSA), which reduces computational complexity to near-linear scale (O(kL)) for long-context tasks. The model supports a 160,000-token context window and is available under the MIT License on Hugging Face. DeepSeek-V3.2 introduces three notable technical advances: a scalable reinforcement learning training framework, a large-scale agentic task synthesis pipeline covering over 1,800 environments and 85,000+ complex instructions, and native support for Thinking in Tool-Use — the ability to reason while invoking external tools in both thinking and non-thinking modes. It is best suited for complex multi-step reasoning, agentic workflows involving search and code execution, long-context document processing, and developers building AI applications that require integrated reasoning and tool use.

Dec 01, 2025 160,000 context 8,000 tokens output
Long-Context Processing Advanced Reasoning Thinking in Tool Use Agentic Task Execution Code Generation Mathematical Problem Solving

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

DeepSeek

Model ID

The routed model identifier exposed by upstream providers.

deepseek/deepseek-v3.2

Input Context Window

The number of tokens supported by the input context window.

160,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

8,000 tokens tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

Dec 01, 2025 5 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

December 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

Baidu, SiliconFlow, DeepInfra, AtlasCloud, Novita, Parasail, Alibaba, Friendli, Google, SambaNova

Modalities

Types of data this model can process.

Text Code

What is DeepSeek V3.2

A fuller summary of positioning, capabilities, and source-specific details for DeepSeek V3.2.

DeepSeek-V3.2 is an open-weight large language model developed by DeepSeek and released on December 1, 2025. It uses a Mixture-of-Experts architecture combined with a novel sparse attention mechanism called DeepSeek Sparse Attention (DSA), which reduces computational complexity to near-linear scale (O(kL)) for long-context tasks. The model supports a 160,000-token context window and is available under the MIT License on Hugging Face.

DeepSeek-V3.2 introduces three notable technical advances: a scalable reinforcement learning training framework, a large-scale agentic task synthesis pipeline covering over 1,800 environments and 85,000+ complex instructions, and native support for Thinking in Tool-Use — the ability to reason while invoking external tools in both thinking and non-thinking modes. It is best suited for complex multi-step reasoning, agentic workflows involving search and code execution, long-context document processing, and developers building AI applications that require integrated reasoning and tool use.

Capabilities

What DeepSeek V3.2 supports

CTX

Long-Context Processing

Handles inputs up to 160,000 tokens, enabling analysis of lengthy documents, codebases, or multi-turn conversations in a single context window.

RN

Advanced Reasoning

Trained with a scalable reinforcement learning framework that extends post-training compute, supporting multi-step logical and mathematical reasoning tasks.

TL

Thinking in Tool Use

Supports integrated reasoning during tool invocation, allowing the model to think through problems while calling external tools in both thinking and non-thinking modes.

AG

Agentic Task Execution

Trained on a synthesis pipeline covering 1,800+ environments and 85,000+ complex instructions, enabling reliable performance on search, code, and general agent workflows.

</>

Code Generation

Generates, explains, and debugs code across multiple programming languages, with demonstrated performance at competitive programming benchmarks including IOI and ICPC.

AI

Mathematical Problem Solving

Achieves gold-medal-level results on the 2025 IMO, CMO, and ICPC World Finals benchmarks, reflecting strong symbolic and numerical reasoning capabilities.

AI

Sparse Attention Efficiency

Uses DeepSeek Sparse Attention (DSA) to reduce attention computation to near-linear complexity (O(kL)), lowering resource requirements for long-context inference.

AI

Open Weights Access

Released under the MIT License with full model weights available on Hugging Face, allowing local deployment and fine-tuning without usage restrictions.

Pricing for DeepSeek V3.2

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.03
maxTemperature 1
maxResponseSize 8,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Baidu SiliconFlow DeepInfra AtlasCloud Novita Parasail Alibaba Friendli Google SambaNova

Provider Endpoints

Endpoint-level provider data currently available for this model.

Baidu

Max output: 65,536 1d uptime: 99.7% Supported params: 10 Implicit caching: No

SiliconFlow

Max output: 163,840 1d uptime: 99.4% Supported params: 11 Implicit caching: No

DeepInfra

Max output: 16,384 1d uptime: 99.8% Supported params: 17 Implicit caching: No

AtlasCloud

Max output: 163,840 1d uptime: 98.6% Supported params: 17 Implicit caching: No

Novita

Max output: 65,536 1d uptime: 99.9% Supported params: 13 Implicit caching: No

Parasail

Max output: 65,536 1d uptime: 97.7% Supported params: 14 Implicit caching: No

Alibaba

Max prompt: 98,304 Max output: 65,536 1d uptime: 98.5% Supported params: 11 Implicit caching: No

Friendli

Max output: 163,840 1d uptime: 100.0% Supported params: 16 Implicit caching: No

Google

Max output: 65,536 1d uptime: 97.2% Supported params: 15 Implicit caching: No

SambaNova

Max output: 7,168 1d uptime: 100.0% Supported params: 7 Implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
AIME 2025
American math olympiad problems (2025)
96.0%
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
75.1%
HLE
Questions that challenge frontier models across many domains
10.5%
LiveCodeBench
Real-world coding tasks from recent competitions
59.3%
MMLU-Pro
Expert knowledge across 14 academic disciplines
83.7%
SciCode
Scientific research coding and numerical methods
38.7%
SWE-bench Verified
Real GitHub issues requiring multi-file code fixes
77.2%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about DeepSeek V3.2

DeepSeek V3.2 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/DeepSeek. Top Reddit threads cluster around benchmark and model-comparison threads, coding workflow discussions.

The strongest match in this snapshot has 1914 upvotes and 312 comments.

r/SillyTavernAI 13 upvotes 13 comments March 28, 2026
Deepseek V3.2 Open router alternatives

I’ve been using deepseek v3.2 via open router it’s been great my only gripe is it doesn’t want to introduce swears or more mature themes all that well.

I’ve tried various qwen3 models but their outputs result in writing that doesn’t make very much cohesive sense.

I am seeking a deepseek v3.2 alternative for around the same price and outputs just as well

Open Reddit thread
r/DeepSeek 132 upvotes 43 comments April 29, 2026
DeepSeek V3.2 vs DeepSeek V4

DeepSeek V3.2 is still used more than DeepSeek V4

Does anyone know why?

It looks like DeepSeek V4 more expensive, but DeepSeek V3.2 better than DeepSeek V4

Open Reddit thread
r/JanitorAI_Refuges 11 upvotes 13 comments March 21, 2026
Deepseek V3.2 (or) Deepseek V3.0324

I just switched over to V3.2 from V3.0324, and I like both models a lot, but I'm wondering if V3.2 struggles with some things compared to the latter, because I've had a bit of trouble.

(I use my models through OR, on chub) and since using V3.2 I've noticed that by default, it's answers are very short. Now, I know this can be fixed with prompting. The model seems VERY sensitive however because It will go from short, to overly long paragraphs whenever I edit the prompt, by this I mean I could say "2-3 paragraphs, 120-130 words per message" And it's still relatively short, and then I change it to: "125-130 words" And suddenly it generates extremely long replies. I don't know why it can't find an inbetween, maybe I need to tweak my prompt again.

Also, I have to put that in Assistant Prefill to even get it to listen, because sometimes it likes to ignore what I have in post/pre history so I literally have to force it. Additionally, I've been having some error replies, or it won't respond the first time and I have to resend my message. I don't know if maybe chub or OR is just down or having problems, but the message generation also seems a fair bit slower compared to DSV3.0324.

It also doesn't go into detail about a lorebook entry when I activate one, so I wonder if they're compatible, or if they are but it just ignores it. It also likes to end scenes a little too quickly, the two models are definitely both pretty different.

Personally I do prefer V3.2 overall, I just need to figure out how to tweak some of these things out of it so it works a little better.

Open Reddit thread
View more discussions →
FAQ

Common questions about DeepSeek V3.2

What is the context window size for DeepSeek-V3.2?

DeepSeek-V3.2 supports a context window of 160,000 tokens, making it suitable for long-document processing, extended conversations, and large codebase analysis.

Is DeepSeek-V3.2 open source?

Yes. DeepSeek-V3.2 is released as an open-weight model under the MIT License. The model weights are publicly available on Hugging Face at huggingface.co/deepseek-ai/DeepSeek-V3.2.

What is the training data cutoff for DeepSeek-V3.2?

Based on the metadata provided, DeepSeek-V3.2 has a training date of December 2025. Specific knowledge cutoff details are documented in the official technical report.

What makes DeepSeek-V3.2 different from earlier DeepSeek models?

DeepSeek-V3.2 introduces three new capabilities not present in earlier versions: DeepSeek Sparse Attention (DSA) for near-linear attention complexity, a scalable reinforcement learning post-training framework, and a large-scale agentic task synthesis pipeline covering 1,800+ environments. It is also the first DeepSeek model to support Thinking in Tool-Use.

Can DeepSeek-V3.2 be run locally?

Yes. Because the model weights are openly available under the MIT License on Hugging Face, developers can download and run DeepSeek-V3.2 locally. Community users have demonstrated running it on hardware configurations such as 16x AMD MI50 32GB GPUs using vLLM.

What types of tasks is DeepSeek-V3.2 best suited for?

DeepSeek-V3.2 is designed for complex reasoning tasks, agentic workflows (including search and code agents), long-context retrieval, mathematical problem solving, and applications that require the model to reason while using external tools.

More models from DeepSeek

Continue browsing adjacent models from the same provider.

← All AI Models