DeepSeek V4 Pro vs DeepSeek V3.1
Compare DeepSeek V4 Pro and DeepSeek V3.1 across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus reasoning-heavy tasks.
Overview Comparison
Structured side-by-side differences for the highest-signal model metadata.
Provider
The entity that currently provides this model.
Model ID
The routed model identifier exposed by upstream providers.
Input Context Window
The number of tokens supported by the input context window.
Maximum Output Tokens
The number of tokens that can be generated by the model in a single request.
Open Source
Whether the model's code is available for public use.
Release Date
When the model was first released.
Knowledge Cut-off Date
When the model's knowledge was last updated.
API Providers
The providers that currently expose the model through an API.
Modalities
Types of data each model can process or return.
Pricing Comparison
Compare current token pricing before you choose the cheaper or more scalable API option.
Capabilities Comparison
See where each model overlaps, where they differ, and which one supports more of the features you care about.
Benchmark Comparison
Shared benchmark rows make it easier to compare performance where both models have published scores.
| Benchmark | DeepSeek V4 Pro | DeepSeek V3.1 |
|---|---|---|
| GPQA Diamond: PhD-level science questions (biology, physics, chemistry) | | |
| HLE: Questions that challenge frontier models across many domains | | |
| LiveCodeBench: Real-world coding tasks from recent competitions | | |
| MMLU-Pro: Expert knowledge across 14 academic disciplines | | |
| SciCode: Scientific research coding and numerical methods | | |
What Reddit discussions say about DeepSeek V4 Pro vs DeepSeek V3.1
DeepSeek V4 Pro and DeepSeek V3.1 are both surfacing live Reddit discussions, giving this comparison a community layer beyond specs and benchmarks.
The most visible threads right now are clustered in r/DeepSeek, r/SillyTavernAI, and r/LocalLLaMA.
chat.deepseek.com
Hey guys - you can now run DeepSeek-V3.1 locally on 170GB RAM with our Dynamic 1-bit GGUFs.🐋
The 715GB model gets reduced to 170GB (-80% size) by smartly quantizing layers.
It took a bit longer than expected, but we made dynamic imatrix GGUFs for DeepSeek V3.1 at [https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF](https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF) There is also a TQ1\_0 (for naming only) version (**170GB**) which is 1 file for Ollama compatibility and works via `ollama run` [`hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0`](http://hf.co/unsloth/DeepSeek-V3.1-GGUF:TQ1_0)
All dynamic quants use higher bits (6-8bit) for very important layers, and unimportant layers are quantized down. We used over 2-3 million tokens of high quality calibration data for the imatrix phase.
* You must use `--jinja` to enable the correct chat template. You can also use `enable_thinking = True` / `thinking = True`
* You will get the following error when using other quants: `terminate called after throwing an instance of 'std::runtime_error' what(): split method must have between 1 and 1 positional arguments and between 0 and 0 keyword arguments at row 3, column 1908` We fixed it in all our quants!
* The official recommended settings are `--temp 0.6 --top_p 0.95`
* Use `-ot ".ffn_.*_exps.=CPU"` to offload MoE layers to RAM!
* Use KV Cache quantization to enable longer contexts. Try `--cache-type-k q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1` and for V quantization, you have to compile llama.cpp with Flash Attention support.
More docs on how to run it and other stuff at [https://docs.unsloth.ai/basics/deepseek-v3.1](https://docs.unsloth.ai/basics/deepseek-v3.1) I normally recommend using the Q2\_K\_XL or Q3\_K\_XL quants - they work very well!
1250.08 GB | cogito-2.1:latest
1250.08 GB | cogito-2.1:671b
376.71 GB | deepseek-v3.1:latest
376.71 GB | deepseek-v3.1:671b
376.65 GB | deepseek-r1:671b
376.65 GB | deepseek-v3:latest
376.65 GB | deepseek-v3:671b
376.65 GB | r1-1776:671b
270.14 GB | qwen3-coder:480b
226.38 GB | llama3.1:405b
213.14 GB | hermes3:405b
133.43 GB | qwen3-vl:235b
132.39 GB | qwen3:235b
123.78 GB | deepseek-coder-v2:236b
123.78 GB | deepseek-v2:236b
123.78 GB | deepseek-v2.5:latest
123.78 GB | deepseek-v2.5:236b
94.51 GB | falcon:180b
74.05 GB | zephyr:141b
69.75 GB | devstral-2:latest
69.75 GB | devstral-2:123b
69.1 GB | dbrx:latest
69.1 GB | dbrx:132b
68.19 GB | mistral-large:latest
68.19 GB | mistral-large:123b
63.1 GB | megadolphin:latest
63.1 GB | megadolphin:120b
62.81 GB | llama4:latest
62.52 GB | command-a:latest
62.52 GB | command-a:111b
60.88 GB | gpt-oss:120b
60.88 GB | gpt-oss-safeguard:120b
58.57 GB | qwen:110b
55.15 GB | command-r-plus:latest
55.15 GB | command-r-plus:104b
50.87 GB | llama3.2-vision:90b
46.89 GB | qwen3-next:latest
46.89 GB | qwen3-next:80b
45.36 GB | qwen2.5vl:72b
44.16 GB | athene-v2:latest
44.16 GB | athene-v2:72b
44.16 GB | qwen2.5:72b
39.6 GB | cogito:70b
39.6 GB | deepseek-r1:70b
39.6 GB | llama3.1:70b
39.6 GB | llama3.3:latest
39.6 GB | llama3.3:70b
39.6 GB | nemotron:latest
39.6 GB | nemotron:70b
39.6 GB | r1-1776:latest
39.6 GB | r1-1776:70b
39.6 GB | tulu3:70b
38.4 GB | qwen2:72b
38.4 GB | qwen2-math:72b
38.18 GB | qwen:72b
37.22 GB | dolphin-llama3:70b
37.22 GB | firefunction-v2:latest
37.22 GB | firefunction-v2:70b
37.22 GB | hermes3:70b
37.22 GB | llama3:70b
37.22 GB | llama3-chatqa:70b
37.22 GB | llama3-gradient:70b
37.22 GB | llama3-groq-tool-use:70b
37.22 GB | reflection:latest
37.22 GB | reflection:70b
36.2 GB | codellama:70b
36.2 GB | llama2:70b
36.2 GB | llama2-uncensored:70b
36.2 GB | meditron:70b
36.2 GB | orca-mini:70b
36.2 GB | stable-beluga:70b
36.2 GB | wizard-math:70b
35.53 GB | deepseek-llm:67b
24.63 GB | dolphin-mixtral:latest
24.63 GB | mixtral:latest
24.63 GB | notux:latest
24.63 GB | nous-hermes2-mixtral:latest
22.6 GB | nemotron-3-nano:latest
22.6 GB | nemotron-3-nano:30b
22.17 GB | alfred:latest
22.17 GB | alfred:40b
22.17 GB | falcon:40b
19.71 GB | qwen2.5vl:32b
19.47 GB | qwen3-vl:32b
18.84 GB | aya:35b
18.81 GB | qwen3:32b
18.78 GB | llava:34b
18.49 GB | cogito:32b
18.49 GB | deepseek-r1:32b
18.49 GB | openthinker:32b
18.49 GB | qwen2.5:32b
18.49 GB | qwen2.5-coder:32b
18.49 GB | qwq:latest
18.49 GB | qwq:32b
18.44 GB | aya-expanse:32b
18.25 GB | qwen3-vl:30b
18.14 GB | olmo-3:32b
18.14 GB | olmo-3.1:latest
18.14 GB | olmo-3.1:32b
18.13 GB | nous-hermes2:34b
18.13 GB | yi:34b
18.02 GB | exaone-deep:32b
18.02 GB | exaone3.5:32b
17.92 GB | granite-code:34b
17.74 GB | codebooga:latest
17.74 GB | codebooga:34b
17.74 GB | codellama:34b
17.74 GB | phind-codellama:latest
17.74 GB | phind-codellama:34b
17.53 GB | deepseek-coder:33b
17.53 GB | wizardcoder:33b
17.43 GB | command-r:latest
17.43 GB | command-r:35b
17.28 GB | qwen3:30b
17.28 GB | qwen3-coder:latest
17.28 GB | qwen3-coder:30b
17.23 GB | qwen:32b
17.1 GB | vicuna:33b
17.1 GB | wizard-vicuna-uncensored:30b
16.2 GB | gemma3:27b
16.17 GB | translategemma:27b
15.5 GB | shieldgemma:27b
14.56 GB | gemma2:27b
14.42 GB | mistral-small3.1:latest
14.42 GB | mistral-small3.1:24b
14.14 GB | devstral-small-2:latest
14.14 GB | devstral-small-2:24b
14.14 GB | mistral-small3.2:latest
14.14 GB | mistral-small3.2:24b
13.35 GB | devstral:latest
13.35 GB | devstral:24b
13.35 GB | magistral:latest
13.35 GB | magistral:24b
13.35 GB | mistral-small:latest
13.35 GB | mistral-small:24b
12.85 GB | gpt-oss:latest
12.85 GB | gpt-oss:20b
12.85 GB | gpt-oss-safeguard:latest
12.85 GB | gpt-oss-safeguard:20b
12.4 GB | solar-pro:latest
12.4 GB | solar-pro:22b
11.71 GB | codestral:latest
11.71 GB | codestral:22b
11.71 GB | mistral-small:22b
10.82 GB | sailor2:20b
10.76 GB | granite-code:20b
10.55 GB | internlm2:20b
10.35 GB | phi4-reasoning:latest
10.35 GB | phi4-reasoning:14b
8.64 GB | qwen3:14b
8.46 GB | ministral-3:14b
8.44 GB | dolphincoder:15b
8.44 GB | starcoder2:15b
8.43 GB | phi4:latest
8.43 GB | phi4:14b
8.37 GB | cogito:14b
8.37 GB | deepcoder:latest
8.37 GB | deepcoder:14b
8.37 GB | deepseek-r1:14b
8.37 GB | qwen2.5:14b
8.37 GB | qwen2.5-coder:14b
8.37 GB | sqlcoder:15b
8.37 GB | starcoder:15b
8.29 GB | deepseek-coder-v2:latest
8.29 GB | deepseek-coder-v2:16b
8.29 GB | deepseek-v2:latest
8.29 GB | deepseek-v2:16b
7.78 GB | olmo2:13b
7.62 GB | qwen:14b
7.59 GB | gemma3:12b
7.55 GB | translategemma:12b
7.46 GB | llava:13b
7.35 GB | phi3:14b
7.28 GB | llama3.2-vision:latest
7.28 GB | llama3.2-vision:11b
7.03 GB | gemma3n:latest
6.86 GB | codellama:13b
6.86 GB | codeup:latest
6.86 GB | codeup:13b
6.86 GB | everythinglm:latest
6.86 GB | everythinglm:13b
6.86 GB | llama2:13b
6.86 GB | llama2-chinese:13b
6.86 GB | nexusraven:latest
6.86 GB | nexusraven:13b
6.86 GB | nous-hermes:13b
6.86 GB | open-orca-platypus2:latest
6.86 GB | open-orca-platypus2:13b
6.86 GB | orca-mini:13b
6.86 GB | orca2:13b
6.86 GB | stable-beluga:13b
6.86 GB | vicuna:13b
6.86 GB | wizard-math:13b
6.86 GB | wizard-vicuna:latest
6.86 GB | wizard-vicuna:13b
6.86 GB | wizard-vicuna-uncensored:13b
6.86 GB | wizardlm-uncensored:latest
6.86 GB | wizardlm-uncensored:13b
6.86 GB | xwinlm:13b
6.86 GB | yarn-llama2:13b
6.59 GB | mistral-nemo:latest
6.59 GB | mistral-nemo:12b
6.49 GB | stablelm2:12b
6.23 GB | deepseek-ocr:latest
6.23 GB | deepseek-ocr:3b
5.94 GB | falcon2:latest
5.94 GB | falcon2:11b
5.86 GB | falcon3:10b
5.72 GB | qwen3-vl:latest
5.72 GB | qwen3-vl:8b
5.66 GB | nous-hermes2:latest
5.66 GB | nous-hermes2:10.7b
5.66 GB | solar:latest
5.66 GB | solar:10.7b
5.61 GB | ministral-3:latest
5.61 GB | ministral-3:8b
5.56 GB | qwen2.5vl:latest
5.56 GB | qwen2.5vl:7b
5.4 GB | granite3-guardian:8b
5.37 GB | shieldgemma:latest
5.37 GB | shieldgemma:9b
5.16 GB | llava-llama3:latest
5.16 GB | llava-llama3:8b
5.1 GB | minicpm-v:latest
5.1 GB | minicpm-v:8b
5.08 GB | codegeex4:latest
5.08 GB | codegeex4:9b
5.08 GB | glm4:latest
5.08 GB | glm4:9b
5.07 GB | gemma2:latest
5.07 GB | gemma2:9b
4.88 GB | sailor2:latest
4.88 GB | sailor2:8b
4.87 GB | deepseek-r1:latest
4.87 GB | deepseek-r1:8b
4.87 GB | qwen3:latest
4.87 GB | qwen3:8b
4.76 GB | rnj-1:latest
4.76 GB | rnj-1:8b
4.71 GB | aya-expanse:latest
4.71 GB | aya-expanse:8b
4.71 GB | command-r7b:latest
4.71 GB | command-r7b:7b
4.71 GB | command-r7b-arabic:latest
4.71 GB | command-r7b-arabic:7b
4.69 GB | yi:9b
4.69 GB | yi-coder:latest
4.69 GB | yi-coder:9b
4.67 GB | codegemma:latest
4.67 GB | codegemma:7b
4.67 GB | gemma:latest
4.67 GB | gemma:7b
4.65 GB | granite3.1-dense:latest
4.65 GB | granite3.1-dense:8b
4.6 GB | granite3-dense:8b
4.6 GB | granite3.2:latest
4.6 GB | granite3.2:8b
4.6 GB | granite3.3:latest
4.6 GB | granite3.3:8b
4.58 GB | cogito:latest
4.58 GB | cogito:8b
4.58 GB | dolphin3:latest
4.58 GB | dolphin3:8b
4.58 GB | llama-guard3:latest
4.58 GB | llama-guard3:8b
4.58 GB | llama3.1:latest
4.58 GB | llama3.1:8b
4.58 GB | tulu3:latest
4.58 GB | tulu3:8b
4.47 GB | aya:latest
4.47 GB | aya:8b
4.44 GB | exaone-deep:latest
4.44 GB | exaone-deep:7.8b
4.44 GB | exaone3.5:latest
4.44 GB | exaone3.5:7.8b
4.41 GB | bakllava:latest
4.41 GB | bakllava:7b
4.41 GB | llama-pro:latest
4.41 GB | llava:latest
4.41 GB | llava:7b
4.41 GB | opencoder:latest
4.41 GB | opencoder:8b
4.39 GB | bespoke-minicheck:latest
4.39 GB | bespoke-minicheck:7b
4.36 GB | deepseek-r1:7b
4.36 GB | marco-o1:latest
4.36 GB | marco-o1:7b
4.36 GB | openthinker:latest
4.36 GB | openthinker:7b
4.36 GB | qwen2.5:latest
4.36 GB | qwen2.5:7b
4.36 GB | qwen2.5-coder:latest
4.36 GB | qwen2.5-coder:7b
4.36 GB | qwen3-embedding:latest
4.36 GB | qwen3-embedding:8b
4.34 GB | dolphin-llama3:latest
4.34 GB | dolphin-llama3:8b
4.34 GB | hermes3:latest
4.34 GB | hermes3:8b
4.34 GB | llama3:latest
4.34 GB | llama3:8b
4.34 GB | llama3-chatqa:latest
4.34 GB | llama3-chatqa:8b
4.34 GB | llama3-gradient:latest
4.34 GB | llama3-gradient:8b
4.34 GB | llama3-groq-tool-use:latest
4.34 GB | llama3-groq-tool-use:8b
4.28 GB | granite-code:8b
4.26 GB | falcon3:latest
4.26 GB | falcon3:7b
4.2 GB | qwen:7b
4.16 GB | olmo-3:latest
4.16 GB | olmo-3:7b
4.16 GB | olmo2:latest
4.16 GB | olmo2:7b
4.15 GB | internlm2:latest
4.15 GB | internlm2:7b
4.13 GB | qwen2:latest
4.13 GB | qwen2:7b
4.13 GB | qwen2-math:latest
4.13 GB | qwen2-math:7b
4.07 GB | mistral:latest
4.07 GB | mistral:7b
4.0 GB | starcoder:7b
3.94 GB | dolphincoder:latest
3.94 GB | dolphincoder:7b
3.92 GB | falcon:latest
3.92 GB | falcon:7b
3.89 GB | codeqwen:latest
3.89 GB | codeqwen:7b
3.83 GB | dolphin-mistral:latest
3.83 GB | dolphin-mistral:7b
3.83 GB | mathstral:latest
3.83 GB | mathstral:7b
3.83 GB | mistral-openorca:latest
3.83 GB | mistral-openorca:7b
3.83 GB | mistrallite:latest
3.83 GB | mistrallite:7b
3.83 GB | neural-chat:latest
3.83 GB | neural-chat:7b
3.83 GB | notus:latest
3.83 GB | notus:7b
3.83 GB | openchat:latest
3.83 GB | openchat:7b
3.83 GB | openhermes:latest
3.83 GB | samantha-mistral:latest
3.83 GB | samantha-mistral:7b
3.83 GB | sqlcoder:latest
3.83 GB | sqlcoder:7b
3.83 GB | starling-lm:latest
3.83 GB | starling-lm:7b
3.83 GB | wizard-math:latest
3.83 GB | wizard-math:7b
3.83 GB | wizardlm2:latest
3.83 GB | wizardlm2:7b
3.83 GB | yarn-mistral:latest
3.83 GB | yarn-mistral:7b
3.83 GB | zephyr:latest
3.83 GB | zephyr:7b
3.77 GB | starcoder2:7b
3.73 GB | deepseek-llm:latest
3.73 GB | deepseek-llm:7b
3.56 GB | codellama:latest
3.56 GB | codellama:7b
3.56 GB | deepseek-coder:6.7b
3.56 GB | duckdb-nsql:latest
3.56 GB | duckdb-nsql:7b
3.56 GB | llama2:latest
3.56 GB | llama2:7b
3.56 GB | llama2-chinese:latest
3.56 GB | llama2-chinese:7b
3.56 GB | llama2-uncensored:latest
3.56 GB | llama2-uncensored:7b
3.56 GB | magicoder:latest
3.56 GB | magicoder:7b
3.56 GB | meditron:latest
3.56 GB | meditron:7b
3.56 GB | medllama2:latest
3.56 GB | medllama2:7b
3.56 GB | nous-hermes:latest
3.56 GB | nous-hermes:7b
3.56 GB | orca-mini:7b
3.56 GB | orca2:latest
3.56 GB | orca2:7b
3.56 GB | stable-beluga:latest
3.56 GB | stable-beluga:7b
3.56 GB | vicuna:latest
3.56 GB | vicuna:7b
3.56 GB | wizard-vicuna-uncensored:latest
3.56 GB | wizard-vicuna-uncensored:7b
3.56 GB | xwinlm:latest
3.56 GB | xwinlm:7b
3.56 GB | yarn-llama2:latest
3.56 GB | yarn-llama2:7b
3.37 GB | smallthinker:latest
3.37 GB | smallthinker:3b
3.32 GB | deepscaler:latest
3.32 GB | deepscaler:1.5b
3.24 GB | yi:latest
3.24 GB | yi:6b
3.11 GB | gemma3:latest
3.11 GB | gemma3:4b
3.07 GB | qwen3-vl:4b
3.07 GB | translategemma:latest
3.07 GB | translategemma:4b
3.04 GB | granite4:1b
2.98 GB | qwen2.5vl:3b
2.94 GB | phi4-mini-reasoning:latest
2.94 GB | phi4-mini-reasoning:3.8b
2.75 GB | ministral-3:3b
2.73 GB | llava-phi3:latest
2.73 GB | llava-phi3:3.8b
2.51 GB | granite3-guardian:latest
2.51 GB | granite3-guardian:2b
2.51 GB | nemotron-mini:latest
2.51 GB | nemotron-mini:4b
2.33 GB | qwen3:4b
2.33 GB | qwen3-embedding:4b
2.32 GB | phi4-mini:latest
2.32 GB | phi4-mini:3.8b
2.27 GB | granite3.2-vision:latest
2.27 GB | granite3.2-vision:2b
2.17 GB | qwen:latest
2.17 GB | qwen:4b
2.09 GB | cogito:3b
2.03 GB | nuextract:latest
2.03 GB | nuextract:3.8b
2.03 GB | phi3:latest
2.03 GB | phi3:3.8b
2.03 GB | phi3.5:latest
2.03 GB | phi3.5:3.8b
1.96 GB | granite4:3b
1.92 GB | granite3-moe:3b
1.9 GB | granite3.1-moe:latest
1.9 GB | granite3.1-moe:3b
1.88 GB | hermes3:3b
1.88 GB | llama3.2:latest
1.88 GB | llama3.2:3b
1.87 GB | falcon3:3b
1.86 GB | granite-code:latest
1.86 GB | granite-code:3b
1.84 GB | orca-mini:latest
1.84 GB | orca-mini:3b
1.8 GB | qwen2.5:3b
1.8 GB | qwen2.5-coder:3b
1.76 GB | qwen3-vl:2b
1.71 GB | starcoder:latest
1.71 GB | starcoder:3b
1.7 GB | smollm2:latest
1.7 GB | smollm2:1.7b
1.66 GB | falcon3:1b
1.62 GB | moondream:latest
1.62 GB | moondream:1.8b
1.59 GB | shieldgemma:2b
1.59 GB | starcoder2:latest
1.59 GB | starcoder2:3b
1.56 GB | gemma:2b
1.53 GB | exaone-deep:2.4b
1.53 GB | exaone3.5:2.4b
1.52 GB | gemma2:2b
1.5 GB | stable-code:latest
1.5 GB | stable-code:3b
1.5 GB | stablelm-zephyr:latest
1.5 GB | stablelm-zephyr:3b
1.49 GB | dolphin-phi:latest
1.49 GB | dolphin-phi:2.7b
1.49 GB | granite3-dense:latest
1.49 GB | granite3-dense:2b
1.49 GB | llama-guard3:1b
1.49 GB | phi:latest
1.49 GB | phi:2.7b
1.46 GB | granite3.1-dense:2b
1.44 GB | codegemma:2b
1.44 GB | granite3.2:2b
1.44 GB | granite3.3:2b
1.32 GB | granite3.1-moe:1b
1.32 GB | opencoder:1.5b
1.27 GB | qwen3:1.7b
1.23 GB | llama3.2:1b
1.08 GB | bge-m3:latest
1.08 GB | snowflake-arctic-embed2:latest
1.04 GB | deepcoder:1.5b
1.04 GB | deepseek-r1:1.5b
1.04 GB | internlm2:1.8b
1.04 GB | qwen:1.8b
0.98 GB | sailor2:1b
0.92 GB | qwen2.5:1.5b
0.92 GB | qwen2.5-coder:1.5b
0.92 GB | smollm:latest
0.92 GB | smollm:1.7b
0.92 GB | stablelm2:latest
0.92 GB | stablelm2:1.6b
0.87 GB | qwen2:1.5b
0.87 GB | qwen2-math:1.5b
0.87 GB | reader-lm:latest
0.87 GB | reader-lm:1.5b
0.81 GB | yi-coder:1.5b
0.77 GB | granite3-moe:latest
0.77 GB | granite3-moe:1b
0.76 GB | gemma3:1b
0.72 GB | deepseek-coder:latest
0.72 GB | deepseek-coder:1.3b
0.68 GB | lfm2.5-thinking:latest
0.68 GB | lfm2.5-thinking:1.2b
0.68 GB | starcoder:1b
0.62 GB | bge-large:latest
0.62 GB | mxbai-embed-large:latest
0.62 GB | snowflake-arctic-embed:latest
0.6 GB | qwen3-embedding:0.6b
0.59 GB | tinydolphin:latest
0.59 GB | tinydolphin:1.1b
0.59 GB | tinyllama:latest
0.59 GB | tinyllama:1.1b
0.58 GB | embeddinggemma:latest
0.52 GB | paraphrase-multilingual:latest
0.49 GB | qwen3:0.6b
0.37 GB | qwen:0.5b
0.37 GB | qwen2.5:0.5b
0.37 GB | qwen2.5-coder:0.5b
0.33 GB | qwen2:0.5b
0.33 GB | reader-lm:0.5b
0.28 GB | functiongemma:latest
0.26 GB | nomic-embed-text:latest
0.06 GB | granite-embedding:latest
0.04 GB | all-minilm:latest
**1. Musk's xAI Finally Open-Sources Grok-2 (905B Parameters, 128k Context)** xAI has officially open-sourced the model weights and architecture for Grok-2, with Grok-3 announced for release in about six months.
* **Architecture:** Grok-2 uses a Mixture-of-Experts (MoE) architecture with a massive 905 billion total parameters, with 136 billion active during inference.
* **Specs:** It supports a 128k context length. The model is over 500GB and requires 8 GPUs (each with >40GB VRAM) for deployment, with SGLang being a recommended inference engine.
* **License:** Commercial use is restricted to companies with less than $1 million in annual revenue.
**2. "Confidence Filtering" Claims to Make Open-Source Models More Accurate Than GPT-5 on Benchmarks** Researchers from Meta AI and UC San Diego have introduced "DeepConf," a method that dynamically filters and weights inference paths by monitoring real-time confidence scores.
* **Results:** DeepConf enabled an open-source model to achieve 99.9% accuracy on the AIME 2025 benchmark while reducing token consumption by 85%, all without needing external tools.
* **Implementation:** The method works out-of-the-box on existing models with no retraining required and can be integrated into vLLM with just \~50 lines of code.
**3. Altman Hands Over ChatGPT's Reins to New App CEO Fidji Simo** OpenAI CEO Sam Altman is stepping back from the day-to-day operations of the company's application business, handing control to CEO Fidji Simo. Altman will now focus on his larger goals of raising trillions for funding and building out supercomputing infrastructure.
* **Simo's Role:** With her experience from Facebook's hyper-growth era and Instacart's IPO, Simo is seen as a "steady hand" to drive commercialization.
* **New Structure:** This creates a dual-track power structure. Simo will lead the monetization of consumer apps like ChatGPT, with potential expansions into products like a browser and affiliate links in search results as early as this fall.
**4. What is DeepSeek's UE8M0 FP8, and Why Did It Boost Chip Stocks?** The release of DeepSeek V3.1 mentioned using a "UE8M0 FP8" parameter precision, which caused Chinese AI chip stocks like Cambricon to surge nearly 14%.
* **The Tech:** UE8M0 FP8 is a micro-scaling block format where all 8 bits are allocated to the exponent, with no sign bit. This dramatically increases bandwidth efficiency and performance.
* **The Impact:** This technology is being co-optimized with next-gen Chinese domestic chips, allowing larger models to run on the same hardware and boosting the cost-effectiveness of the national chip industry.
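To make the "all 8 bits are allocated to the exponent" description concrete, here is a minimal sketch of how such a scale byte could be decoded and encoded. It assumes the UE8M0 scale follows the E8M0 convention from the OCP Microscaling (MX) formats, with an exponent bias of 127 and `0xFF` reserved for NaN; the post above does not confirm those details for DeepSeek V3.1, and the function names are hypothetical.

```python
import math

# Assumptions (not stated in the post): bias and NaN encoding follow the
# E8M0 scale type from the OCP Microscaling (MX) formats.
E8M0_BIAS = 127
E8M0_NAN = 0xFF


def decode_ue8m0(byte: int) -> float:
    """Turn an 8-bit unsigned exponent into a power-of-two scale factor."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("UE8M0 values must fit in one byte")
    if byte == E8M0_NAN:
        return math.nan
    # No sign bit and no mantissa bits: the value is exactly 2**(byte - bias).
    return math.ldexp(1.0, byte - E8M0_BIAS)


def encode_ue8m0(scale: float) -> int:
    """Round a positive scale down to a power of two and return its byte."""
    if not (math.isfinite(scale) and scale > 0):
        raise ValueError("scale must be a positive finite number")
    # frexp returns scale = m * 2**e with 0.5 <= m < 1, so floor(log2) is e - 1.
    _, e = math.frexp(scale)
    biased = (e - 1) + E8M0_BIAS
    # Clamp to the representable range, keeping 0xFF free for NaN.
    return max(0, min(biased, E8M0_NAN - 1))


print(decode_ue8m0(127))  # 1.0
print(decode_ue8m0(130))  # 8.0
print(encode_ue8m0(3.0))  # 128
```

In a block-scaled format of this kind, one such byte is shared by a group of low-precision elements and each element is multiplied by the decoded scale, so the scale itself never needs a sign or fractional part.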
**5. Meta May Partner with Midjourney to Integrate its Tech into Future AI Models** Meta's Chief AI Scientist, Alexandr Wang, announced a collaboration with Midjourney, licensing their AI image and video generation technology.
* **The Goal:** The partnership aims to integrate Midjourney's powerful tech into Meta's future AI models and products, helping Meta develop competitors to services like OpenAI's Sora.
* **About Midjourney:** Founded in 2022, Midjourney has never taken external funding and has an estimated annual revenue of $200 million. It just released its first AI video model, V1, in June.
**6. Tencent RTC Launches MCP: 'Summon' Real-Time Video & Chat in Your AI Editor, No RTC Expertise Needed**
* Tencent RTC (TRTC) has officially released the **Model Context Protocol (MCP)**, a new protocol designed for AI-native development that allows developers to build complex real-time features directly within AI code editors like Cursor.
* The protocol works by enabling LLMs to deeply understand and call the TRTC SDK, encapsulating complex audio/video technology into simple natural language prompts. Developers can integrate features like live chat and video calls just by prompting.
* MCP aims to free developers from tedious SDK integration, drastically lowering the barrier and time cost for adding real-time interaction to AI apps. It's especially beneficial for startups and indie devs looking to rapidly prototype ideas.
**7. Coinbase CEO Mandates AI Tools for All Employees, Threatens Firing for Non-Compliance** Coinbase CEO Brian Armstrong issued a company-wide mandate requiring all engineers to use company-provided AI tools like GitHub Copilot and Cursor by a set deadline.
* **The Ultimatum:** Armstrong held a meeting with those who hadn't complied and reportedly fired those without a valid reason, stating that using AI is "not optional, it's mandatory."
* **The Reaction:** The news sparked a heated debate in the developer community, with some supporting the move to boost productivity and others worrying that forcing AI tool usage could harm work quality.
**8. OpenAI Partners with Longevity Biotech Firm to Tackle "Cell Regeneration"** OpenAI is collaborating with Retro Biosciences to develop a GPT-4b micro model for designing new proteins. The goal is to make the Nobel-prize-winning "cellular reprogramming" technology 50 times more efficient.
* **The Breakthrough:** The technology can revert normal skin cells back into pluripotent stem cells. The AI-designed proteins (RetroSOX and RetroKLF) achieved hit rates of over 30% and 50%, respectively.
* **The Benefit:** This not only speeds up the process but also significantly reduces DNA damage, paving the way for more effective cell therapies and anti-aging technologies.
**9. How Claude Code is Built: Internal Dogfooding Drives New Features**
Claude Code's product manager, Cat Wu, revealed their iteration process: engineers rapidly build functional prototypes using Claude Code itself. These prototypes are first rolled out internally, and only the ones that receive strong positive feedback are released publicly. This "dogfooding" approach ensures features are genuinely useful before they reach customers.
**10. a16z Report: AI App-Gen Platforms Are a "Positive-Sum Game"** A study by venture capital firm a16z suggests that AI application generation platforms are not in a winner-take-all market. Instead, they are specializing and differentiating, creating a diverse ecosystem similar to the foundation model market. The report identifies three main categories: Prototyping, Personal Software, and Production Apps, each serving different user needs.
**11. Google's AI Energy Report: One Gemini Prompt ≈ One Second of a Microwave** Google released its first detailed AI energy consumption report, revealing that a median Gemini prompt uses 0.24 Wh of electricity—equivalent to running a microwave for one second.
* **Breakdown:** The energy is consumed by TPUs (58%), host CPU/memory (25%), standby equipment (10%), and data center overhead (8%).
* **Efficiency:** Google claims Gemini's energy consumption has dropped 33x in the last year. Each prompt also uses about 0.26 ml of water for cooling. This is one of the most transparent AI energy reports from a major tech company to date.
What are your thoughts on these developments? Anything important I missed?
DeepSeek V4 Pro effectively reverse-engineered a recently released 100B LLM architecture entirely on its own and then adapted llama.cpp to run it (in \~10M tokens and for less than $2).
No unnecessary hate, but ChatGPT will often provide you with scraps and has some kind of limit when generating lengthy code. DeepSeek did this in one shot.
Prompt: write a p5.js program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically
Which model should you choose?
Use the summary below to decide which model better fits your workflow, budget, and feature requirements.
DeepSeek V4 Pro
DeepSeek V4 Pro is a stronger fit for long-context workloads, reasoning-heavy tasks, and tool-augmented workflows.
DeepSeek V3.1
DeepSeek V3.1 is a stronger fit for reasoning-heavy tasks, tool-augmented workflows, and cost-efficient scale.
Choose DeepSeek V4 Pro if you prioritize long-context workloads, reasoning-heavy tasks, and tool-augmented workflows. Choose DeepSeek V3.1 if your workflow depends more on reasoning-heavy tasks, tool-augmented workflows, and cost-efficient scale.
Common questions about DeepSeek V4 Pro vs DeepSeek V3.1
What is the main difference between DeepSeek V4 Pro and DeepSeek V3.1?
DeepSeek V4 Pro leans toward long-context workloads, reasoning-heavy tasks, and tool-augmented workflows, while DeepSeek V3.1 is better suited to reasoning-heavy tasks, tool-augmented workflows, and cost-efficient scale.
Which model is cheaper: DeepSeek V4 Pro or DeepSeek V3.1?
DeepSeek V3.1 starts lower on input pricing at $0.2700 per 1M input tokens, compared with $1.7400 for DeepSeek V4 Pro.
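As a rough illustration of what that price gap means for a given workload, the sketch below computes input-token cost from the two listed prices. Only input pricing is quoted here, so output tokens, caching discounts, and provider-specific rates are left out; the function and dictionary names are hypothetical.

```python
from decimal import Decimal

# Listed input prices in USD per 1M input tokens (from the answer above).
INPUT_PRICE_PER_MILLION = {
    "DeepSeek V4 Pro": Decimal("1.74"),
    "DeepSeek V3.1": Decimal("0.27"),
}


def input_cost_usd(input_tokens: int, price_per_million: Decimal) -> Decimal:
    """Cost of the input side of a workload; output tokens are not included."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return price_per_million * Decimal(input_tokens) / Decimal(1_000_000)


# Example: a batch job that sends 10 million input tokens.
for model, price in INPUT_PRICE_PER_MILLION.items():
    print(f"{model}: ${input_cost_usd(10_000_000, price):.2f}")
# DeepSeek V4 Pro: $17.40
# DeepSeek V3.1: $2.70
```

`Decimal` is used instead of floats so that small per-token prices do not pick up binary rounding noise when summed over many requests.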
Which model has the larger context window: DeepSeek V4 Pro or DeepSeek V3.1?
DeepSeek V4 Pro is listed with a context window of 1.0M tokens, while DeepSeek V3.1 is listed with 128,000 tokens.
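A quick way to apply these figures is to check whether a prompt, plus some room for the reply, fits each window. The sketch below makes two assumptions: "1.0M" is read as exactly 1,000,000 tokens, and reserved output tokens count against the same window (providers may handle this differently). The names are hypothetical.

```python
# Listed context windows in tokens.
CONTEXT_WINDOW_TOKENS = {
    "DeepSeek V4 Pro": 1_000_000,  # assumption: "1.0M" taken as exactly 1,000,000
    "DeepSeek V3.1": 128_000,
}


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> list[str]:
    """Return the models whose window can hold the prompt plus reserved output."""
    needed = prompt_tokens + reserved_output_tokens
    return [
        name
        for name, window in CONTEXT_WINDOW_TOKENS.items()
        if needed <= window
    ]


print(models_that_fit(100_000, reserved_output_tokens=8_000))
# ['DeepSeek V4 Pro', 'DeepSeek V3.1']
print(models_that_fit(125_000, reserved_output_tokens=8_000))
# ['DeepSeek V4 Pro']
```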
How should I evaluate DeepSeek V4 Pro vs DeepSeek V3.1 for my use case?
This comparison currently includes 5 shared benchmark rows, helping you compare practical performance across overlapping evaluations.