Mistral Large 3 is a 675-billion-parameter sparse mixture-of-experts (MoE) model developed by Mistral, with about 41 billion parameters active per token. It is the first MoE model Mistral has released since the Mixtral series, and was trained from scratch on 3,000 NVIDIA H200 GPUs. The model is released under the Apache 2.0 license, and the weights are publicly available for download and self-hosting.

Mistral Large 3 supports a 256,000-token context window and accepts image inputs alongside text. Mistral highlights multilingual conversation, particularly in languages other than English and Chinese, as a focus area. The model is suited to long-context reasoning, multilingual text processing, and general-purpose instruction following.
Processes up to 256,000 tokens in a single context, enabling analysis of long documents, codebases, or extended conversations.
Uses a sparse MoE design across 675 billion total parameters, activating only a subset of experts per token during inference.
Handles conversations and instructions in a wide range of languages, with Mistral specifically highlighting performance on non-English and non-Chinese languages.
Accepts image inputs alongside text, enabling tasks such as visual question answering and image-based reasoning.
Model weights are publicly available on Hugging Face under a permissive license, supporting local deployment and fine-tuning.
Post-training aligns the model to follow general-purpose instructions, with Mistral reporting parity with leading instruction-tuned open-weight models on general prompts.
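To make the sparse MoE item above concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration only: the source does not describe Mistral Large 3's actual router, so the expert count, the value of `k`, the scalar "experts", and the choice to apply softmax over the selected logits are all assumptions made for this example.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Renormalize over the selected experts only (one common variant).
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out

# Toy setup (hypothetical): 8 scalar experts, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_top_k(logits, TOP_K)
y = moe_layer(1.0, experts, logits, TOP_K)
```

Only `TOP_K` of the `NUM_EXPERTS` expert functions are evaluated per input, which is why a model's active parameter count can be much smaller than its total parameter count.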
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Description | Score |
|---|---|---|
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | |
| HLE | Questions that challenge frontier models across many domains | |
| LiveCodeBench | Real-world coding tasks from recent competitions | |
| MMLU-Pro | Expert knowledge across 14 academic disciplines | |
| SciCode | Scientific research coding and numerical methods | |
Mistral Large 3 discussions are most active in r/LocalLLaMA, r/MistralAI, and r/singularity.
Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflows. The strongest match in this snapshot has 875 upvotes and 114 comments.
It's very dull with a prompt (may be my prompt specifically, I can't say). But no prompt, it acts kinda like Kimi, but a bit more grounded. Lots of highly specific details, sometimes hallucinated, but you can cut them out. Lowish amounts of dialogue. One thing I like is it seems to make less moralizing/annoying characters.
I wouldn't use it by itself, but it seems nice to swap in occasionally to freshen up the prose.
Samples cause nobody here provides samples for some reason:
Sonnet
https://preview.redd.it/qn51abm4ou4g1.png?width=1450&format=png&auto=webp&s=fa61acfebb00c19ae892fe23e69a12cfb937dc73
Mistral
https://preview.redd.it/2ti75ch5ou4g1.png?width=1452&format=png&auto=webp&s=524fed0b805f33d6943b229f3c0b159f65fcc7f9
(I have some html stuff, ignore that)
GLM 4.6
https://preview.redd.it/2ctci6j8ou4g1.png?width=1501&format=png&auto=webp&s=ef35294a41260b7ad724393fa00db57d8598ad02
All models are Apache 2.0 and fully usable for research + commercial work.
Quick breakdown:
• Ministral 3 (3B / 8B / 14B) – compact, multimodal, and available in base, instruct, and reasoning variants. Surprisingly strong for their size.
• Mistral Large 3 (675B MoE) – their new flagship. Strong multilingual performance, high efficiency, and one of the most capable open-weight instruct models released so far.
Why it matters:
You now get a full spectrum of open models that cover everything from on-device reasoning to large enterprise-scale intelligence. The release pushes the ecosystem further toward distributed, open AI instead of closed black-box APIs.
Full announcement: https://mistral.ai/news/mistral-3
Today, we announce Mistral 3, the next generation of Mistral models. Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3 – our most capable model to date – a sparse mixture-of-experts trained with 41B active and 675B total parameters. All models are released under the Apache 2.0 license. Open-sourcing our models in a variety of compressed formats empowers the developer community and puts AI in people’s hands through distributed intelligence. The Ministral models represent the best performance-to-cost ratio in their category. At the same time, Mistral Large 3 joins the ranks of frontier instruction-fine-tuned open-source models.
Learn more [here](https://mistral.ai/news/mistral-3).
# Ministral 3
A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: **3B, 8B and 14B**. All with vision capabilities - **All Apache 2.0**.
* **Ministral 3 14B**: The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.
* **Ministral 3 8B**: A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
* **Ministral 3 3B**: The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
Weights [here](https://huggingface.co/collections/mistralai/ministral-3), with already quantized variants [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).
# Large 3
A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture, available in Base and Instruct variants. **All Apache 2.0**. Mistral Large 3 is deployable on-premises in:
* [FP8](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512) on a single node of B200s or H200s.
* [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.
# Key Features
Mistral Large 3 consists of two main architectural components:
* **A Granular MoE Language Model with 673B params and 39B active**
* **A 2.5B Vision Encoder**
Weights [here](https://huggingface.co/collections/mistralai/mistral-large-3).
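The component figures above (673B and 39B for the language model, 2.5B for the vision encoder) can be reconciled with the headline 675B total and 41B active numbers by simple addition. The sketch below assumes the headline figures count the vision encoder and that the encoder is counted as fully active; neither assumption is stated in the source, and the small differences are rounding.

```python
# Figures quoted in the post, in billions of parameters.
LM_TOTAL_B = 673.0    # granular MoE language model, total
LM_ACTIVE_B = 39.0    # granular MoE language model, active per token
VISION_B = 2.5        # vision encoder

# Assumption: headline totals include the vision encoder.
total_b = LM_TOTAL_B + VISION_B      # 675.5, quoted as 675B
active_b = LM_ACTIVE_B + VISION_B    # 41.5, quoted as 41B

# Share of parameters touched per token under these assumptions.
active_fraction = active_b / total_b

print(f"total  = {total_b:.1f}B")
print(f"active = {active_b:.1f}B")
print(f"active share = {active_fraction:.1%}")
```

Under these assumptions, roughly 6% of the parameters participate in any single token's forward pass.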
Anyone got 1.35TB of VRAM I could borrow?
https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-BF16
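The 1.35 TB figure in the comment above follows from multiplying the parameter count by the bytes per parameter. The sketch below repeats that arithmetic for the three formats mentioned on this page. It counts weights only: the KV cache, activations, and any per-block scale metadata used by quantized formats are ignored, so real memory requirements will be higher.

```python
TOTAL_PARAMS = 675e9  # headline total parameter count

# Nominal bits per stored weight; quantization scale metadata is not included.
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "NVFP4": 4}

def weight_bytes(num_params, bits_per_param):
    """Bytes needed to store the weights alone."""
    return num_params * bits_per_param / 8

def to_terabytes(num_bytes):
    """Convert bytes to decimal terabytes (1 TB = 1e12 bytes)."""
    return num_bytes / 1e12

footprints_tb = {
    name: to_terabytes(weight_bytes(TOTAL_PARAMS, bits))
    for name, bits in BITS_PER_PARAM.items()
}

for name, tb in footprints_tb.items():
    print(f"{name}: {tb:.4f} TB")
```

Halving the bits per weight halves the weight footprint, which is why the FP8 and NVFP4 checkpoints fit on a single multi-GPU node while the BF16 checkpoint does not.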
Mistral Large 3 supports a context window of 256,000 tokens.
Mistral Large 3 has 675 billion total parameters and uses a sparse mixture-of-experts architecture, so only a subset of the parameters (about 41 billion, per Mistral's announcement) is active for any given token.
Mistral Large 3 is released as an open-weight model under the Apache 2.0 license. The weights can be downloaded from Hugging Face and run locally or fine-tuned.
Mistral Large 3 supports text input and also includes image understanding capabilities, allowing it to process image inputs alongside text prompts.
According to Mistral, the model was trained from scratch on 3,000 NVIDIA H200 GPUs.
A specific training data cutoff date has not been published in the available metadata for Mistral Large 3.
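For the 256,000-token context window stated above, a simple budget check can show whether a request fits before it is sent. This is a sketch under assumptions: it treats the prompt and the generated output as sharing one window, and it expects token counts from whatever tokenizer the caller uses. The function names are hypothetical and not part of any Mistral API.

```python
CONTEXT_WINDOW = 256_000  # tokens, as stated for Mistral Large 3

def fits_in_context(prompt_tokens, max_output_tokens, window=CONTEXT_WINDOW):
    """True if the prompt plus the reserved output budget fits in the window.

    Assumption: input and output tokens share a single window.
    """
    return prompt_tokens + max_output_tokens <= window

def split_into_chunks(token_ids, chunk_size):
    """Split a token sequence into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [token_ids[i:i + chunk_size]
            for i in range(0, len(token_ids), chunk_size)]

# Usage: reserve 4,000 tokens for the answer and chunk anything that overflows.
reserved_output = 4_000
document = list(range(600_000))  # stand-in for real token ids
if not fits_in_context(len(document), reserved_output):
    chunks = split_into_chunks(document, CONTEXT_WINDOW - reserved_output)
else:
    chunks = [document]
```

Each chunk then leaves room for the reserved output budget when sent as its own request.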