DeepSeek vs DeepSeek

DeepSeek V4 Pro vs DeepSeek V3.1

Compare DeepSeek V4 Pro and DeepSeek V3.1 across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus reasoning-heavy tasks.

Overview Comparison

Structured side-by-side differences for the highest-signal model metadata.

DeepSeek V4 Pro
DeepSeek V3.1

Provider

The entity that currently provides this model.

DeepSeek V4 Pro DeepSeek
DeepSeek V3.1 DeepSeek

Model ID

The routed model identifier exposed by upstream providers.

DeepSeek V4 Pro deepseek/deepseek-v4-pro
DeepSeek V3.1 deepseek/deepseek-chat-v3.1

Input Context Window

The number of tokens supported by the input context window.

DeepSeek V4 Pro 1.0M tokens
DeepSeek V3.1 128,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

DeepSeek V4 Pro 384,000 tokens tokens
DeepSeek V3.1 8,000 tokens tokens

Open Source

Whether the model's code is available for public use.

DeepSeek V4 Pro Yes
DeepSeek V3.1 No

Release Date

When the model was first released.

DeepSeek V4 Pro Apr 24, 2026
DeepSeek V3.1 Aug 21, 2025

Knowledge Cut-off Date

When the model's knowledge was last updated.

DeepSeek V4 Pro Unknown
DeepSeek V3.1 2025-03-31

API Providers

The providers that currently expose the model through an API.

DeepSeek V4 Pro
OpenRouter
DeepSeek V3.1
OpenRouter

Modalities

Types of data each model can process or return.

DeepSeek V4 Pro
Text
DeepSeek V3.1
Text Code

Pricing Comparison

Compare current token pricing before you choose the cheaper or more scalable API option.

DeepSeek V4 Pro DeepSeek
Input price $1.74 Per 1M tokens
Output price $0.87 Per 1M tokens
DeepSeek V3.1 DeepSeek
Input price $0.27 Per 1M tokens
Output price $0.79 Per 1M tokens

Capabilities Comparison

See where each model overlaps, where they differ, and which one supports more of the features you care about.

Capability
DeepSeek V4 Pro
DeepSeek V3.1
Code Generation Generates, explains, and debugs code across multiple programming languages, with the option to invoke thinking mode for complex algorithmic problems.
DeepSeek V4 Pro
DeepSeek V3.1 Supported
Hybrid Thinking Mode Switches between fast conversational responses and deep step-by-step reasoning within a single model, controlled by how the model is prompted rather than by selecting a separate endpoint.
DeepSeek V4 Pro
DeepSeek V3.1 Supported
Long Context Window Supports up to 128,000 tokens of context, enabling analysis of long documents, extended codebases, or multi-turn conversations without truncation.
DeepSeek V4 Pro
DeepSeek V3.1 Supported
Mathematical Reasoning Solves multi-step math problems using the model's thinking mode, which produces intermediate reasoning steps before arriving at a final answer.
DeepSeek V4 Pro
DeepSeek V3.1 Supported
Mixture-of-Experts Architecture Uses a MoE design with 671 billion total parameters but only 37 billion activated per forward pass, allowing large model capacity with more efficient inference.
DeepSeek V4 Pro
DeepSeek V3.1 Supported
Reasoning
DeepSeek V4 Pro Supported
DeepSeek V3.1 Supported
Structured Output
DeepSeek V4 Pro Supported
DeepSeek V3.1 Supported
Text
DeepSeek V4 Pro Supported
DeepSeek V3.1 Supported
Tool Use & Agents Handles multi-step agentic workflows including external API calls, web search, and code execution, with post-training improvements specifically targeting tool-calling reliability.
DeepSeek V4 Pro
DeepSeek V3.1 Supported
Tools
DeepSeek V4 Pro Supported
DeepSeek V3.1 Supported

Benchmark Comparison

Shared benchmark rows make it easier to compare performance where both models have published scores.

Benchmark DeepSeek V4 Pro DeepSeek V3.1
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
DeepSeek V4 Pro N/A
DeepSeek V3.1 73.5%
HLE
Questions that challenge frontier models across many domains
DeepSeek V4 Pro N/A
DeepSeek V3.1 6.3%
LiveCodeBench
Real-world coding tasks from recent competitions
DeepSeek V4 Pro N/A
DeepSeek V3.1 57.7%
MMLU-Pro
Expert knowledge across 14 academic disciplines
DeepSeek V4 Pro N/A
DeepSeek V3.1 83.3%
SciCode
Scientific research coding and numerical methods
DeepSeek V4 Pro N/A
DeepSeek V3.1 36.7%
Community discussion

What Reddit discussions say about DeepSeek V4 Pro vs DeepSeek V3.1

DeepSeek V4 Pro and DeepSeek V3.1 are both surfacing live Reddit discussions, giving this comparison a community layer beyond specs and benchmarks.

The most visible threads right now are clustered in r/DeepSeek, r/SillyTavernAI, r/LocalLLaMA.

DeepSeek V3.1 r/LocalLLM 636 upvotes 70 comments August 22, 2025
You can now run DeepSeek-V3.1 on your local device!

Hey guy - you can now run DeepSeek-V3.1 locally on 170GB RAM with our Dynamic 1-bit GGUFs.🐋
The 715GB model gets reduced to 170GB (-80% size) by smartly quantizing layers. 

It took a bit longer than expected, but we made dynamic imatrix GGUFs for DeepSeek V3.1 at [https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF](https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF) There is also a TQ1\_0 (for naming only) version (**170GB**) which is 1 file for Ollama compatibility and works via `ollama run` [`hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0`](http://hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0)

All dynamic quants use higher bits (6-8bit) for very important layers, and unimportant layers are quantized down. We used over 2-3 million tokens of high quality calibration data for the imatrix phase.

* You must use `--jinja` to enable the correct chat template. You can also use `enable_thinking = True` / `thinking = True`
* You will get the following error when using other quants: `terminate called after throwing an instance of 'std::runtime_error' what(): split method must have between 1 and 1 positional arguments and between 0 and 0 keyword arguments at row 3, column 1908` We fixed it in all our quants!
* The official recommended settings are `--temp 0.6 --top_p 0.95`
* Use `-ot ".ffn_.*_exps.=CPU"` to offload MoE layers to RAM!
* Use KV Cache quantization to enable longer contexts. Try `--cache-type-k q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1` and for V quantization, you have to compile llama.cpp with Flash Attention support.

More docs on how to run it and other stuff at [https://docs.unsloth.ai/basics/deepseek-v3.1](https://docs.unsloth.ai/basics/deepseek-v3.1) I normally recommend using the Q2\_K\_XL or Q3\_K\_XL quants - they work very well!

Open Reddit thread
DeepSeek V3.1 r/ollama 546 upvotes 44 comments January 25, 2026
Ollama Models Ranked by VRAM Requirements

1250.08 GB | cogito-2.1:latest

1250.08 GB | cogito-2.1:671b

376.71 GB | deepseek-v3.1:latest

376.71 GB | deepseek-v3.1:671b

376.65 GB | deepseek-r1:671b

376.65 GB | deepseek-v3:latest

376.65 GB | deepseek-v3:671b

376.65 GB | r1-1776:671b

270.14 GB | qwen3-coder:480b

226.38 GB | llama3.1:405b

213.14 GB | hermes3:405b

133.43 GB | qwen3-vl:235b

132.39 GB | qwen3:235b

123.78 GB | deepseek-coder-v2:236b

123.78 GB | deepseek-v2:236b

123.78 GB | deepseek-v2.5:latest

123.78 GB | deepseek-v2.5:236b

94.51 GB | falcon:180b

74.05 GB | zephyr:141b

69.75 GB | devstral-2:latest

69.75 GB | devstral-2:123b

69.1 GB | dbrx:latest

69.1 GB | dbrx:132b

68.19 GB | mistral-large:latest

68.19 GB | mistral-large:123b

63.1 GB | megadolphin:latest

63.1 GB | megadolphin:120b

62.81 GB | llama4:latest

62.52 GB | command-a:latest

62.52 GB | command-a:111b

60.88 GB | gpt-oss:120b

60.88 GB | gpt-oss-safeguard:120b

58.57 GB | qwen:110b

55.15 GB | command-r-plus:latest

55.15 GB | command-r-plus:104b

50.87 GB | llama3.2-vision:90b

46.89 GB | qwen3-next:latest

46.89 GB | qwen3-next:80b

45.36 GB | qwen2.5vl:72b

44.16 GB | athene-v2:latest

44.16 GB | athene-v2:72b

44.16 GB | qwen2.5:72b

39.6 GB | cogito:70b

39.6 GB | deepseek-r1:70b

39.6 GB | llama3.1:70b

39.6 GB | llama3.3:latest

39.6 GB | llama3.3:70b

39.6 GB | nemotron:latest

39.6 GB | nemotron:70b

39.6 GB | r1-1776:latest

39.6 GB | r1-1776:70b

39.6 GB | tulu3:70b

38.4 GB | qwen2:72b

38.4 GB | qwen2-math:72b

38.18 GB | qwen:72b

37.22 GB | dolphin-llama3:70b

37.22 GB | firefunction-v2:latest

37.22 GB | firefunction-v2:70b

37.22 GB | hermes3:70b

37.22 GB | llama3:70b

37.22 GB | llama3-chatqa:70b

37.22 GB | llama3-gradient:70b

37.22 GB | llama3-groq-tool-use:70b

37.22 GB | reflection:latest

37.22 GB | reflection:70b

36.2 GB | codellama:70b

36.2 GB | llama2:70b

36.2 GB | llama2-uncensored:70b

36.2 GB | meditron:70b

36.2 GB | orca-mini:70b

36.2 GB | stable-beluga:70b

36.2 GB | wizard-math:70b

35.53 GB | deepseek-llm:67b

24.63 GB | dolphin-mixtral:latest

24.63 GB | mixtral:latest

24.63 GB | notux:latest

24.63 GB | nous-hermes2-mixtral:latest

22.6 GB | nemotron-3-nano:latest

22.6 GB | nemotron-3-nano:30b

22.17 GB | alfred:latest

22.17 GB | alfred:40b

22.17 GB | falcon:40b

19.71 GB | qwen2.5vl:32b

19.47 GB | qwen3-vl:32b

18.84 GB | aya:35b

18.81 GB | qwen3:32b

18.78 GB | llava:34b

18.49 GB | cogito:32b

18.49 GB | deepseek-r1:32b

18.49 GB | openthinker:32b

18.49 GB | qwen2.5:32b

18.49 GB | qwen2.5-coder:32b

18.49 GB | qwq:latest

18.49 GB | qwq:32b

18.44 GB | aya-expanse:32b

18.25 GB | qwen3-vl:30b

18.14 GB | olmo-3:32b

18.14 GB | olmo-3.1:latest

18.14 GB | olmo-3.1:32b

18.13 GB | nous-hermes2:34b

18.13 GB | yi:34b

18.02 GB | exaone-deep:32b

18.02 GB | exaone3.5:32b

17.92 GB | granite-code:34b

17.74 GB | codebooga:latest

17.74 GB | codebooga:34b

17.74 GB | codellama:34b

17.74 GB | phind-codellama:latest

17.74 GB | phind-codellama:34b

17.53 GB | deepseek-coder:33b

17.53 GB | wizardcoder:33b

17.43 GB | command-r:latest

17.43 GB | command-r:35b

17.28 GB | qwen3:30b

17.28 GB | qwen3-coder:latest

17.28 GB | qwen3-coder:30b

17.23 GB | qwen:32b

17.1 GB | vicuna:33b

17.1 GB | wizard-vicuna-uncensored:30b

16.2 GB | gemma3:27b

16.17 GB | translategemma:27b

15.5 GB | shieldgemma:27b

14.56 GB | gemma2:27b

14.42 GB | mistral-small3.1:latest

14.42 GB | mistral-small3.1:24b

14.14 GB | devstral-small-2:latest

14.14 GB | devstral-small-2:24b

14.14 GB | mistral-small3.2:latest

14.14 GB | mistral-small3.2:24b

13.35 GB | devstral:latest

13.35 GB | devstral:24b

13.35 GB | magistral:latest

13.35 GB | magistral:24b

13.35 GB | mistral-small:latest

13.35 GB | mistral-small:24b

12.85 GB | gpt-oss:latest

12.85 GB | gpt-oss:20b

12.85 GB | gpt-oss-safeguard:latest

12.85 GB | gpt-oss-safeguard:20b

12.4 GB | solar-pro:latest

12.4 GB | solar-pro:22b

11.71 GB | codestral:latest

11.71 GB | codestral:22b

11.71 GB | mistral-small:22b

10.82 GB | sailor2:20b

10.76 GB | granite-code:20b

10.55 GB | internlm2:20b

10.35 GB | phi4-reasoning:latest

10.35 GB | phi4-reasoning:14b

8.64 GB | qwen3:14b

8.46 GB | ministral-3:14b

8.44 GB | dolphincoder:15b

8.44 GB | starcoder2:15b

8.43 GB | phi4:latest

8.43 GB | phi4:14b

8.37 GB | cogito:14b

8.37 GB | deepcoder:latest

8.37 GB | deepcoder:14b

8.37 GB | deepseek-r1:14b

8.37 GB | qwen2.5:14b

8.37 GB | qwen2.5-coder:14b

8.37 GB | sqlcoder:15b

8.37 GB | starcoder:15b

8.29 GB | deepseek-coder-v2:latest

8.29 GB | deepseek-coder-v2:16b

8.29 GB | deepseek-v2:latest

8.29 GB | deepseek-v2:16b

7.78 GB | olmo2:13b

7.62 GB | qwen:14b

7.59 GB | gemma3:12b

7.55 GB | translategemma:12b

7.46 GB | llava:13b

7.35 GB | phi3:14b

7.28 GB | llama3.2-vision:latest

7.28 GB | llama3.2-vision:11b

7.03 GB | gemma3n:latest

6.86 GB | codellama:13b

6.86 GB | codeup:latest

6.86 GB | codeup:13b

6.86 GB | everythinglm:latest

6.86 GB | everythinglm:13b

6.86 GB | llama2:13b

6.86 GB | llama2-chinese:13b

6.86 GB | nexusraven:latest

6.86 GB | nexusraven:13b

6.86 GB | nous-hermes:13b

6.86 GB | open-orca-platypus2:latest

6.86 GB | open-orca-platypus2:13b

6.86 GB | orca-mini:13b

6.86 GB | orca2:13b

6.86 GB | stable-beluga:13b

6.86 GB | vicuna:13b

6.86 GB | wizard-math:13b

6.86 GB | wizard-vicuna:latest

6.86 GB | wizard-vicuna:13b

6.86 GB | wizard-vicuna-uncensored:13b

6.86 GB | wizardlm-uncensored:latest

6.86 GB | wizardlm-uncensored:13b

6.86 GB | xwinlm:13b

6.86 GB | yarn-llama2:13b

6.59 GB | mistral-nemo:latest

6.59 GB | mistral-nemo:12b

6.49 GB | stablelm2:12b

6.23 GB | deepseek-ocr:latest

6.23 GB | deepseek-ocr:3b

5.94 GB | falcon2:latest

5.94 GB | falcon2:11b

5.86 GB | falcon3:10b

5.72 GB | qwen3-vl:latest

5.72 GB | qwen3-vl:8b

5.66 GB | nous-hermes2:latest

5.66 GB | nous-hermes2:10.7b

5.66 GB | solar:latest

5.66 GB | solar:10.7b

5.61 GB | ministral-3:latest

5.61 GB | ministral-3:8b

5.56 GB | qwen2.5vl:latest

5.56 GB | qwen2.5vl:7b

5.4 GB | granite3-guardian:8b

5.37 GB | shieldgemma:latest

5.37 GB | shieldgemma:9b

5.16 GB | llava-llama3:latest

5.16 GB | llava-llama3:8b

5.1 GB | minicpm-v:latest

5.1 GB | minicpm-v:8b

5.08 GB | codegeex4:latest

5.08 GB | codegeex4:9b

5.08 GB | glm4:latest

5.08 GB | glm4:9b

5.07 GB | gemma2:latest

5.07 GB | gemma2:9b

4.88 GB | sailor2:latest

4.88 GB | sailor2:8b

4.87 GB | deepseek-r1:latest

4.87 GB | deepseek-r1:8b

4.87 GB | qwen3:latest

4.87 GB | qwen3:8b

4.76 GB | rnj-1:latest

4.76 GB | rnj-1:8b

4.71 GB | aya-expanse:latest

4.71 GB | aya-expanse:8b

4.71 GB | command-r7b:latest

4.71 GB | command-r7b:7b

4.71 GB | command-r7b-arabic:latest

4.71 GB | command-r7b-arabic:7b

4.69 GB | yi:9b

4.69 GB | yi-coder:latest

4.69 GB | yi-coder:9b

4.67 GB | codegemma:latest

4.67 GB | codegemma:7b

4.67 GB | gemma:latest

4.67 GB | gemma:7b

4.65 GB | granite3.1-dense:latest

4.65 GB | granite3.1-dense:8b

4.6 GB | granite3-dense:8b

4.6 GB | granite3.2:latest

4.6 GB | granite3.2:8b

4.6 GB | granite3.3:latest

4.6 GB | granite3.3:8b

4.58 GB | cogito:latest

4.58 GB | cogito:8b

4.58 GB | dolphin3:latest

4.58 GB | dolphin3:8b

4.58 GB | llama-guard3:latest

4.58 GB | llama-guard3:8b

4.58 GB | llama3.1:latest

4.58 GB | llama3.1:8b

4.58 GB | tulu3:latest

4.58 GB | tulu3:8b

4.47 GB | aya:latest

4.47 GB | aya:8b

4.44 GB | exaone-deep:latest

4.44 GB | exaone-deep:7.8b

4.44 GB | exaone3.5:latest

4.44 GB | exaone3.5:7.8b

4.41 GB | bakllava:latest

4.41 GB | bakllava:7b

4.41 GB | llama-pro:latest

4.41 GB | llava:latest

4.41 GB | llava:7b

4.41 GB | opencoder:latest

4.41 GB | opencoder:8b

4.39 GB | bespoke-minicheck:latest

4.39 GB | bespoke-minicheck:7b

4.36 GB | deepseek-r1:7b

4.36 GB | marco-o1:latest

4.36 GB | marco-o1:7b

4.36 GB | openthinker:latest

4.36 GB | openthinker:7b

4.36 GB | qwen2.5:latest

4.36 GB | qwen2.5:7b

4.36 GB | qwen2.5-coder:latest

4.36 GB | qwen2.5-coder:7b

4.36 GB | qwen3-embedding:latest

4.36 GB | qwen3-embedding:8b

4.34 GB | dolphin-llama3:latest

4.34 GB | dolphin-llama3:8b

4.34 GB | hermes3:latest

4.34 GB | hermes3:8b

4.34 GB | llama3:latest

4.34 GB | llama3:8b

4.34 GB | llama3-chatqa:latest

4.34 GB | llama3-chatqa:8b

4.34 GB | llama3-gradient:latest

4.34 GB | llama3-gradient:8b

4.34 GB | llama3-groq-tool-use:latest

4.34 GB | llama3-groq-tool-use:8b

4.28 GB | granite-code:8b

4.26 GB | falcon3:latest

4.26 GB | falcon3:7b

4.2 GB | qwen:7b

4.16 GB | olmo-3:latest

4.16 GB | olmo-3:7b

4.16 GB | olmo2:latest

4.16 GB | olmo2:7b

4.15 GB | internlm2:latest

4.15 GB | internlm2:7b

4.13 GB | qwen2:latest

4.13 GB | qwen2:7b

4.13 GB | qwen2-math:latest

4.13 GB | qwen2-math:7b

4.07 GB | mistral:latest

4.07 GB | mistral:7b

4.0 GB | starcoder:7b

3.94 GB | dolphincoder:latest

3.94 GB | dolphincoder:7b

3.92 GB | falcon:latest

3.92 GB | falcon:7b

3.89 GB | codeqwen:latest

3.89 GB | codeqwen:7b

3.83 GB | dolphin-mistral:latest

3.83 GB | dolphin-mistral:7b

3.83 GB | mathstral:latest

3.83 GB | mathstral:7b

3.83 GB | mistral-openorca:latest

3.83 GB | mistral-openorca:7b

3.83 GB | mistrallite:latest

3.83 GB | mistrallite:7b

3.83 GB | neural-chat:latest

3.83 GB | neural-chat:7b

3.83 GB | notus:latest

3.83 GB | notus:7b

3.83 GB | openchat:latest

3.83 GB | openchat:7b

3.83 GB | openhermes:latest

3.83 GB | samantha-mistral:latest

3.83 GB | samantha-mistral:7b

3.83 GB | sqlcoder:latest

3.83 GB | sqlcoder:7b

3.83 GB | starling-lm:latest

3.83 GB | starling-lm:7b

3.83 GB | wizard-math:latest

3.83 GB | wizard-math:7b

3.83 GB | wizardlm2:latest

3.83 GB | wizardlm2:7b

3.83 GB | yarn-mistral:latest

3.83 GB | yarn-mistral:7b

3.83 GB | zephyr:latest

3.83 GB | zephyr:7b

3.77 GB | starcoder2:7b

3.73 GB | deepseek-llm:latest

3.73 GB | deepseek-llm:7b

3.56 GB | codellama:latest

3.56 GB | codellama:7b

3.56 GB | deepseek-coder:6.7b

3.56 GB | duckdb-nsql:latest

3.56 GB | duckdb-nsql:7b

3.56 GB | llama2:latest

3.56 GB | llama2:7b

3.56 GB | llama2-chinese:latest

3.56 GB | llama2-chinese:7b

3.56 GB | llama2-uncensored:latest

3.56 GB | llama2-uncensored:7b

3.56 GB | magicoder:latest

3.56 GB | magicoder:7b

3.56 GB | meditron:latest

3.56 GB | meditron:7b

3.56 GB | medllama2:latest

3.56 GB | medllama2:7b

3.56 GB | nous-hermes:latest

3.56 GB | nous-hermes:7b

3.56 GB | orca-mini:7b

3.56 GB | orca2:latest

3.56 GB | orca2:7b

3.56 GB | stable-beluga:latest

3.56 GB | stable-beluga:7b

3.56 GB | vicuna:latest

3.56 GB | vicuna:7b

3.56 GB | wizard-vicuna-uncensored:latest

3.56 GB | wizard-vicuna-uncensored:7b

3.56 GB | xwinlm:latest

3.56 GB | xwinlm:7b

3.56 GB | yarn-llama2:latest

3.56 GB | yarn-llama2:7b

3.37 GB | smallthinker:latest

3.37 GB | smallthinker:3b

3.32 GB | deepscaler:latest

3.32 GB | deepscaler:1.5b

3.24 GB | yi:latest

3.24 GB | yi:6b

3.11 GB | gemma3:latest

3.11 GB | gemma3:4b

3.07 GB | qwen3-vl:4b

3.07 GB | translategemma:latest

3.07 GB | translategemma:4b

3.04 GB | granite4:1b

2.98 GB | qwen2.5vl:3b

2.94 GB | phi4-mini-reasoning:latest

2.94 GB | phi4-mini-reasoning:3.8b

2.75 GB | ministral-3:3b

2.73 GB | llava-phi3:latest

2.73 GB | llava-phi3:3.8b

2.51 GB | granite3-guardian:latest

2.51 GB | granite3-guardian:2b

2.51 GB | nemotron-mini:latest

2.51 GB | nemotron-mini:4b

2.33 GB | qwen3:4b

2.33 GB | qwen3-embedding:4b

2.32 GB | phi4-mini:latest

2.32 GB | phi4-mini:3.8b

2.27 GB | granite3.2-vision:latest

2.27 GB | granite3.2-vision:2b

2.17 GB | qwen:latest

2.17 GB | qwen:4b

2.09 GB | cogito:3b

2.03 GB | nuextract:latest

2.03 GB | nuextract:3.8b

2.03 GB | phi3:latest

2.03 GB | phi3:3.8b

2.03 GB | phi3.5:latest

2.03 GB | phi3.5:3.8b

1.96 GB | granite4:3b

1.92 GB | granite3-moe:3b

1.9 GB | granite3.1-moe:latest

1.9 GB | granite3.1-moe:3b

1.88 GB | hermes3:3b

1.88 GB | llama3.2:latest

1.88 GB | llama3.2:3b

1.87 GB | falcon3:3b

1.86 GB | granite-code:latest

1.86 GB | granite-code:3b

1.84 GB | orca-mini:latest

1.84 GB | orca-mini:3b

1.8 GB | qwen2.5:3b

1.8 GB | qwen2.5-coder:3b

1.76 GB | qwen3-vl:2b

1.71 GB | starcoder:latest

1.71 GB | starcoder:3b

1.7 GB | smollm2:latest

1.7 GB | smollm2:1.7b

1.66 GB | falcon3:1b

1.62 GB | moondream:latest

1.62 GB | moondream:1.8b

1.59 GB | shieldgemma:2b

1.59 GB | starcoder2:latest

1.59 GB | starcoder2:3b

1.56 GB | gemma:2b

1.53 GB | exaone-deep:2.4b

1.53 GB | exaone3.5:2.4b

1.52 GB | gemma2:2b

1.5 GB | stable-code:latest

1.5 GB | stable-code:3b

1.5 GB | stablelm-zephyr:latest

1.5 GB | stablelm-zephyr:3b

1.49 GB | dolphin-phi:latest

1.49 GB | dolphin-phi:2.7b

1.49 GB | granite3-dense:latest

1.49 GB | granite3-dense:2b

1.49 GB | llama-guard3:1b

1.49 GB | phi:latest

1.49 GB | phi:2.7b

1.46 GB | granite3.1-dense:2b

1.44 GB | codegemma:2b

1.44 GB | granite3.2:2b

1.44 GB | granite3.3:2b

1.32 GB | granite3.1-moe:1b

1.32 GB | opencoder:1.5b

1.27 GB | qwen3:1.7b

1.23 GB | llama3.2:1b

1.08 GB | bge-m3:latest

1.08 GB | snowflake-arctic-embed2:latest

1.04 GB | deepcoder:1.5b

1.04 GB | deepseek-r1:1.5b

1.04 GB | internlm2:1.8b

1.04 GB | qwen:1.8b

0.98 GB | sailor2:1b

0.92 GB | qwen2.5:1.5b

0.92 GB | qwen2.5-coder:1.5b

0.92 GB | smollm:latest

0.92 GB | smollm:1.7b

0.92 GB | stablelm2:latest

0.92 GB | stablelm2:1.6b

0.87 GB | qwen2:1.5b

0.87 GB | qwen2-math:1.5b

0.87 GB | reader-lm:latest

0.87 GB | reader-lm:1.5b

0.81 GB | yi-coder:1.5b

0.77 GB | granite3-moe:latest

0.77 GB | granite3-moe:1b

0.76 GB | gemma3:1b

0.72 GB | deepseek-coder:latest

0.72 GB | deepseek-coder:1.3b

0.68 GB | lfm2.5-thinking:latest

0.68 GB | lfm2.5-thinking:1.2b

0.68 GB | starcoder:1b

0.62 GB | bge-large:latest

0.62 GB | mxbai-embed-large:latest

0.62 GB | snowflake-arctic-embed:latest

0.6 GB | qwen3-embedding:0.6b

0.59 GB | tinydolphin:latest

0.59 GB | tinydolphin:1.1b

0.59 GB | tinyllama:latest

0.59 GB | tinyllama:1.1b

0.58 GB | embeddinggemma:latest

0.52 GB | paraphrase-multilingual:latest

0.49 GB | qwen3:0.6b

0.37 GB | qwen:0.5b

0.37 GB | qwen2.5:0.5b

0.37 GB | qwen2.5-coder:0.5b

0.33 GB | qwen2:0.5b

0.33 GB | reader-lm:0.5b

0.28 GB | functiongemma:latest

0.26 GB | nomic-embed-text:latest

0.06 GB | granite-embedding:latest

0.04 GB | all-minilm:latest

Open Reddit thread
DeepSeek V3.1 r/AI_Agents 507 upvotes 43 comments August 25, 2025
A Massive Wave of AI News Just Dropped (Aug 24). Here's what you don't want to miss:

**1. Musk's xAI Finally Open-Sources Grok-2 (905B Parameters, 128k Context)** xAI has officially open-sourced the model weights and architecture for Grok-2, with Grok-3 announced for release in about six months.

* **Architecture:** Grok-2 uses a Mixture-of-Experts (MoE) architecture with a massive 905 billion total parameters, with 136 billion active during inference.
* **Specs:** It supports a 128k context length. The model is over 500GB and requires 8 GPUs (each with >40GB VRAM) for deployment, with SGLang being a recommended inference engine.
* **License:** Commercial use is restricted to companies with less than $1 million in annual revenue.

**2. "Confidence Filtering" Claims to Make Open-Source Models More Accurate Than GPT-5 on Benchmarks** Researchers from Meta AI and UC San Diego have introduced "DeepConf," a method that dynamically filters and weights inference paths by monitoring real-time confidence scores.

* **Results:** DeepConf enabled an open-source model to achieve 99.9% accuracy on the AIME 2025 benchmark while reducing token consumption by 85%, all without needing external tools.
* **Implementation:** The method works out-of-the-box on existing models with no retraining required and can be integrated into vLLM with just \~50 lines of code.

**3. Altman Hands Over ChatGPT's Reins to New App CEO Fidji Simo** OpenAI CEO Sam Altman is stepping back from the day-to-day operations of the company's application business, handing control to CEO Fidji Simo. Altman will now focus on his larger goals of raising trillions for funding and building out supercomputing infrastructure.

* **Simo's Role:** With her experience from Facebook's hyper-growth era and Instacart's IPO, Simo is seen as a "steady hand" to drive commercialization.
* **New Structure:** This creates a dual-track power structure. Simo will lead the monetization of consumer apps like ChatGPT, with potential expansions into products like a browser and affiliate links in search results as early as this fall.

**4. What is DeepSeek's UE8M0 FP8, and Why Did It Boost Chip Stocks?** The release of DeepSeek V3.1 mentioned using a "UE8M0 FP8" parameter precision, which caused Chinese AI chip stocks like Cambricon to surge nearly 14%.

* **The Tech:** UE8M0 FP8 is a micro-scaling block format where all 8 bits are allocated to the exponent, with no sign bit. This dramatically increases bandwidth efficiency and performance.
* **The Impact:** This technology is being co-optimized with next-gen Chinese domestic chips, allowing larger models to run on the same hardware and boosting the cost-effectiveness of the national chip industry.

**5. Meta May Partner with Midjourney to Integrate its Tech into Future AI Models** Meta's Chief AI Scientist, Alexandr Wang, announced a collaboration with Midjourney, licensing their AI image and video generation technology.

* **The Goal:** The partnership aims to integrate Midjourney's powerful tech into Meta's future AI models and products, helping Meta develop competitors to services like OpenAI's Sora.
* **About Midjourney:** Founded in 2022, Midjourney has never taken external funding and has an estimated annual revenue of $200 million. It just released its first AI video model, V1, in June.

**6. Tencent RTC Launches MCP: 'Summon' Real-Time Video & Chat in Your AI Editor, No RTC Expertise Needed**

* Tencent RTC (TRTC) has officially released the **Model Context Protocol (MCP)**, a new protocol designed for AI-native development that allows developers to build complex real-time features directly within AI code editors like Cursor.
* The protocol works by enabling LLMs to deeply understand and call the TRTC SDK, encapsulating complex audio/video technology into simple natural language prompts. Developers can integrate features like live chat and video calls just by prompting.
* MCP aims to free developers from tedious SDK integration, drastically lowering the barrier and time cost for adding real-time interaction to AI apps. It's especially beneficial for startups and indie devs looking to rapidly prototype ideas.

**7. Coinbase CEO Mandates AI Tools for All Employees, Threatens Firing for Non-Compliance** Coinbase CEO Brian Armstrong issued a company-wide mandate requiring all engineers to use company-provided AI tools like GitHub Copilot and Cursor by a set deadline.

* **The Ultimatum:** Armstrong held a meeting with those who hadn't complied and reportedly fired those without a valid reason, stating that using AI is "not optional, it's mandatory."
* **The Reaction:** The news sparked a heated debate in the developer community, with some supporting the move to boost productivity and others worrying that forcing AI tool usage could harm work quality.

**8. OpenAI Partners with Longevity Biotech Firm to Tackle "Cell Regeneration"** OpenAI is collaborating with Retro Biosciences to develop a GPT-4b micro model for designing new proteins. The goal is to make the Nobel-prize-winning "cellular reprogramming" technology 50 times more efficient.

* **The Breakthrough:** The technology can revert normal skin cells back into pluripotent stem cells. The AI-designed proteins (RetroSOX and RetroKLF) achieved hit rates of over 30% and 50%, respectively.
* **The Benefit:** This not only speeds up the process but also significantly reduces DNA damage, paving the way for more effective cell therapies and anti-aging technologies.

**9. How Claude Code is Built: Internal Dogfooding Drives New Features** 

Claude Code's product manager, Cat Wu, revealed their iteration process: engineers rapidly build functional prototypes using Claude Code itself. These prototypes are first rolled out internally, and only the ones that receive strong positive feedback are released publicly. This "dogfooding" approach ensures features are genuinely useful before they reach customers.

**10. a16z Report: AI App-Gen Platforms Are a "Positive-Sum Game"** A study by venture capital firm a16z suggests that AI application generation platforms are not in a winner-take-all market. Instead, they are specializing and differentiating, creating a diverse ecosystem similar to the foundation model market. The report identifies three main categories: Prototyping, Personal Software, and Production Apps, each serving different user needs.

**11. Google's AI Energy Report: One Gemini Prompt ≈ One Second of a Microwave** Google released its first detailed AI energy consumption report, revealing that a median Gemini prompt uses 0.24 Wh of electricity—equivalent to running a microwave for one second.

* **Breakdown:** The energy is consumed by TPUs (58%), host CPU/memory (25%), standby equipment (10%), and data center overhead (8%).
* **Efficiency:** Google claims Gemini's energy consumption has dropped 33x in the last year. Each prompt also uses about 0.26 ml of water for cooling. This is one of the most transparent AI energy reports from a major tech company to date.

What are your thoughts on these developments? Anything important I missed?

Open Reddit thread
DeepSeek V3.1 r/DeepSeek 394 upvotes 64 comments August 19, 2025
DeepSeek v3.1 already does better than ChatGPT-5. Change my mind.

No unnecessary hate but ChatGPTs will oftern provide you with scraps and have some kind of limit when generating lengthy code. DeepSeek did this in one shot.

Prompt: write a p5.js program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically

Open Reddit thread
View more discussions →

Which model should you choose?

Use the summary below to decide which model better fits your workflow, budget, and feature requirements.

Best fit for

DeepSeek V4 Pro

DeepSeek V4 Pro is a stronger fit for long-context workloads, reasoning-heavy tasks, tool-augmented workflows.

Best fit for

DeepSeek V3.1

DeepSeek V3.1 is a stronger fit for reasoning-heavy tasks, tool-augmented workflows, cost-efficient scale.

Verdict

Choose DeepSeek V4 Pro if you prioritize long-context workloads, reasoning-heavy tasks, tool-augmented workflows. Choose DeepSeek V3.1 if your workflow depends more on reasoning-heavy tasks, tool-augmented workflows, cost-efficient scale.

FAQ

Common questions about DeepSeek V4 Pro vs DeepSeek V3.1

What is the main difference between DeepSeek V4 Pro and DeepSeek V3.1?

DeepSeek V4 Pro leans toward long-context workloads, reasoning-heavy tasks, tool-augmented workflows, while DeepSeek V3.1 is better suited to reasoning-heavy tasks, tool-augmented workflows, cost-efficient scale.

Which model is cheaper: DeepSeek V4 Pro or DeepSeek V3.1?

DeepSeek V3.1 starts lower on input pricing at $0.2700 per 1M input tokens, compared with $1.7400 for DeepSeek V4 Pro.

Which model has the larger context window: DeepSeek V4 Pro or DeepSeek V3.1?

DeepSeek V4 Pro is listed with a context window of 1.0M, while DeepSeek V3.1 is listed with 128,000.

How should I evaluate DeepSeek V4 Pro vs DeepSeek V3.1 for my use case?

This comparison currently includes 5 shared benchmark rows, helping you compare practical performance across overlapping evaluations.