Mistral

Mistral Large 3

Mistral Large 3 is a 675-billion-parameter mixture-of-experts (MoE) text generation model developed by Mistral. It is the first MoE model Mistral has released since the Mixtral series, and was trained from scratch on 3,000 NVIDIA H200 GPUs. The model is released under a permissive open-weight license, making the weights publicly available for download and self-hosting. Mistral Large 3 supports a 256,000-token context window and includes image understanding alongside text generation. It is particularly noted for multilingual conversation handling, with Mistral highlighting non-English and non-Chinese language performance as a focus area. The model is well-suited for tasks requiring long-context reasoning, multilingual text processing, and instruction following across general-purpose prompts.

Dec 02, 2025 256,000 context 16,000 tokens output
Long Context Window Mixture-of-Experts Architecture Multilingual Text Generation Image Understanding Open-Weight Access Instruction Following

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Mistral

Input Context Window

The number of tokens supported by the input context window.

256,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,000 tokens tokens

Open Source

Whether the model's code is available for public use.

Yes

Release Date

When the model was first released.

Dec 02, 2025 5 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

Mistral AI Studio, Amazon Bedrock, Azure AI Foundry, Hugging Face, IBM watsonx, Fireworks, Together AI

Modalities

Types of data this model can process.

Text Image

What is Mistral Large 3

A fuller summary of positioning, capabilities, and source-specific details for Mistral Large 3.

Mistral Large 3 is a 675-billion-parameter mixture-of-experts (MoE) text generation model developed by Mistral. It is the first MoE model Mistral has released since the Mixtral series, and was trained from scratch on 3,000 NVIDIA H200 GPUs. The model is released under a permissive open-weight license, making the weights publicly available for download and self-hosting.

Mistral Large 3 supports a 256,000-token context window and includes image understanding alongside text generation. It is particularly noted for multilingual conversation handling, with Mistral highlighting non-English and non-Chinese language performance as a focus area. The model is well-suited for tasks requiring long-context reasoning, multilingual text processing, and instruction following across general-purpose prompts.

Capabilities

What Mistral Large 3 supports

CTX

Long Context Window

Processes up to 256,000 tokens in a single context, enabling analysis of long documents, codebases, or extended conversations.

AI

Mixture-of-Experts Architecture

Uses a sparse MoE design across 675 billion total parameters, activating only a subset of experts per token during inference.

AI

Multilingual Text Generation

Handles conversations and instructions in a wide range of languages, with Mistral specifically highlighting performance on non-English and non-Chinese languages.

IMG

Image Understanding

Accepts image inputs alongside text, enabling tasks such as visual question answering and image-based reasoning.

AI

Open-Weight Access

Model weights are publicly available on Hugging Face under a permissive license, supporting local deployment and fine-tuning.

AI

Instruction Following

Post-training aligns the model to follow general-purpose instructions, with Mistral reporting parity with leading instruction-tuned open-weight models on general prompts.

Pricing for Mistral Large 3

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 1
maxResponseSize 16,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Mistral AI Studio Amazon Bedrock Azure AI Foundry Hugging Face IBM watsonx Fireworks Together AI

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
68.0%
HLE
Questions that challenge frontier models across many domains
4.1%
LiveCodeBench
Real-world coding tasks from recent competitions
46.5%
MMLU-Pro
Expert knowledge across 14 academic disciplines
80.7%
SciCode
Scientific research coding and numerical methods
36.2%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about Mistral Large 3

Mistral Large 3 discussions are most active in r/LocalLLaMA, r/MistralAI, r/singularity.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 875 upvotes and 114 comments.

r/SillyTavernAI 21 upvotes 3 comments December 2, 2025
Mistral Large 3 is decent.. without a prompt

It's very dull with a prompt (may be my prompt specifically, I can't say). But no prompt, it acts kinda like Kimi, but a bit more grounded. Lots of highly specific details, sometimes hallucinated, but you can cut them out. Lowish amounts of dialogue. One thing I like is it seems to make less moralizing/annoying characters.

I wouldn't use it by itself, but it seems nice to swap in occasionally to freshen up the prose.

Samples cause nobody here provides samples for some reason:

Sonnet

https://preview.redd.it/qn51abm4ou4g1.png?width=1450&format=png&auto=webp&s=fa61acfebb00c19ae892fe23e69a12cfb937dc73

Mistral

https://preview.redd.it/2ti75ch5ou4g1.png?width=1452&format=png&auto=webp&s=524fed0b805f33d6943b229f3c0b159f65fcc7f9

(I have some html stuff, ignore that)

GLM 4.6

https://preview.redd.it/2ctci6j8ou4g1.png?width=1501&format=png&auto=webp&s=ef35294a41260b7ad724393fa00db57d8598ad02

Open Reddit thread

All models are Apache 2.0 and fully usable for research + commercial work.

Quick breakdown:

• Ministral 3 (3B / 8B / 14B) – compact, multimodal, and available in base, instruct, and reasoning variants. Surprisingly strong for their size.

• Mistral Large 3 (675B MoE) – their new flagship. Strong multilingual performance, high efficiency, and one of the most capable open-weight instruct models released so far.

Why it matters:
You now get a full spectrum of open models that cover everything from on-device reasoning to large enterprise-scale intelligence. The release pushes the ecosystem further toward distributed, open AI instead of closed black-box APIs.

Full announcement: https://mistral.ai/news/mistral-3

Open Reddit thread
r/MistralAI 630 upvotes 42 comments December 2, 2025
Introducing Mistral 3

Today, we announce Mistral 3, the next generation of Mistral models. Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3 – our most capable model to date – a sparse mixture-of-experts trained with 41B active and 675B total parameters. All models are released under the Apache 2.0 license. Open-sourcing our models in a variety of compressed formats empowers the developer community and puts AI in people’s hands through distributed intelligence. The Ministral models represent the best performance-to-cost ratio in their category. At the same time, Mistral Large 3 joins the ranks of frontier instruction-fine-tuned open-source models.

Learn more [here](https://mistral.ai/news/mistral-3).

# Ministral 3

A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: **3B, 8B and 14B**. All with vision capabilities - **All Apache 2.0**.

* **Ministral 3 14B**: The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.
* **Ministral 3 8B**: A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
* **Ministral 3 3B**: The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Weights [here](https://huggingface.co/collections/mistralai/ministral-3), with already quantized variants [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).

# Large 3

A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture - with a Base and Instruct variants. **All Apache 2.0**. Mistral Large 3 is deployable on-premises in:

* [FP8](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512) on a single node of B200s or H200s.
* [NVFP4](https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4) on a single node of H100s or A100s.

# Key Features

Mistral Large 3 consists of two main architectural components:

* **A Granular MoE Language Model with 673B params and 39B active**
* **A 2.5B Vision Encoder**

Weights [here](https://huggingface.co/collections/mistralai/mistral-large-3).

Open Reddit thread
r/BuyFromEU 489 upvotes 32 comments March 7, 2026
Mistral Large 3 performs better that GPT5.3 for OpenClaw

There is a growing market for OpenClaw tools, and because OpenClaw is originally from Europe, many service providers are trying to establish themselves here. We are actually quite successful—for unmanaged hosting, [Hetzner.com](http://hetzner.com/) or [Hostinger.com](http://hostinger.com/) VPS are among the best. There is also a large pool of managed hosts that offer faster, one-click setups, such as [PrimeClaws.com](http://primeclaws.com/) . It is very good news that many of them are based in Europe; I hope our industry for such tools will continue to grow.

Open Reddit thread
View more discussions →
FAQ

Common questions about Mistral Large 3

What is the context window for Mistral Large 3?

Mistral Large 3 supports a context window of 256,000 tokens.

How many parameters does Mistral Large 3 have?

Mistral Large 3 has 675 billion total parameters and uses a mixture-of-experts architecture, meaning only a subset of parameters are active for any given token.

Is Mistral Large 3 open source?

Yes. Mistral Large 3 is released as an open-weight model, meaning the weights are publicly available. The model can be downloaded from Hugging Face and run locally or fine-tuned.

What input types does Mistral Large 3 support?

Mistral Large 3 supports text input and also includes image understanding capabilities, allowing it to process image inputs alongside text prompts.

What hardware was used to train Mistral Large 3?

According to Mistral, the model was trained from scratch on 3,000 NVIDIA H200 GPUs.

Is there a knowledge cutoff date for Mistral Large 3?

A specific training data cutoff date has not been published in the available metadata for Mistral Large 3.

More models from Mistral

Continue browsing adjacent models from the same provider.

← All AI Models