Mistral Medium 3 is a text generation model released on May 7, 2025 by Mistral, a French AI company. It is designed to balance performance with cost efficiency, priced at $0.40 per million input tokens and $2.00 per million output tokens. The model supports a 128,000-token context window and was trained on data through early 2025. It is available through Mistral La Plateforme and Amazon SageMaker, with additional platform support planned.

Mistral Medium 3 is built with enterprise deployment in mind, supporting self-hosted setups with a minimum of four GPUs as well as any cloud environment. It can be customized through continuous pre-training, fine-tuning, and integration with enterprise knowledge bases, making it applicable to domain-specific workflows in sectors such as financial services, energy, and healthcare. The model is noted for its strengths in coding tasks and multimodal understanding, and is suited for use cases including customer service automation, business process personalization, and complex dataset analysis.
Processes up to 128,000 tokens in a single request, enabling analysis of long documents, codebases, or extended conversations without truncation.
Generates, explains, and debugs code across common programming languages, with coding identified as one of the model's primary strengths.
Handles tasks that require multimodal understanding, so analysis can draw on inputs beyond plain text; the model's official overview lists this as a strength.
Supports continuous pre-training and comprehensive fine-tuning, allowing organizations to adapt the model to domain-specific datasets and workflows.
Can be deployed on any cloud environment or self-hosted on a minimum of four GPUs, with integration options for enterprise knowledge bases.
Priced at $0.40 per million input tokens and $2.00 per million output tokens, positioning it as an accessible option for organizations managing AI inference costs.
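The per-token pricing above translates directly into a per-request estimate. The following is a minimal sketch of that arithmetic; the function name `estimate_cost_usd` and the example token counts are illustrative assumptions, and only the two rates come from the text.

```python
from decimal import Decimal

# Rates quoted in the text above, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.40")
OUTPUT_USD_PER_MILLION = Decimal("2.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost of one request in US dollars.

    Hypothetical helper: it only multiplies token counts by the listed
    rates and ignores any discounts, taxes, or provider-specific fees.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens and 2,000 output tokens.
    print(estimate_cost_usd(100_000, 2_000))
```

For the hypothetical request in the example, the input side contributes $0.040 and the output side $0.004, for a total of $0.044. `Decimal` is used instead of floats so that small per-request amounts do not pick up rounding noise when summed over many requests.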
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Description | Score |
|---|---|---|
| AIME 2024 | American math olympiad problems | |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | |
| HLE | Questions that challenge frontier models across many domains | |
| LiveCodeBench | Real-world coding tasks from recent competitions | |
| MATH-500 | Undergraduate and competition-level math problems | |
| MMLU-Pro | Expert knowledge across 14 academic disciplines | |
| SciCode | Scientific research coding and numerical methods | |
Mistral Medium 3 discussions are most active in r/MistralAI, r/LocalLLaMA, and r/singularity.
Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflows. The strongest match in this snapshot has 906 upvotes and 336 comments.
[https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF](https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF)
# Mistral Medium 3.5 128B
Mistral Medium 3.5 is our first flagship merged model. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and coding in a single set of weights. Mistral Medium 3.5 replaces its predecessor Mistral Medium 3.1 and Magistral in Le Chat. It also replaces Devstral 2 in our coding agent Vibe. Concretely, expect better performance on instruct, reasoning, and coding tasks from a single unified model compared with our previously released models.
Reasoning effort is configurable per request, so the same model can answer a quick chat reply or work through a complex agentic run. We trained the vision encoder from scratch to handle variable image sizes and aspect ratios.
Find more information on our [blog](https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5).
# Key Features
Mistral Medium 3.5 includes the following architectural choices:
* **Dense 128B parameters**.
* **256k context length**.
* **Multimodal input**: Accepts both text and image input, with text output.
* **Instruct and Reasoning functionalities** with function calls (reasoning effort configurable per request).
Mistral Medium 3.5 offers the following capabilities:
* **Reasoning Mode**: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.
* **Vision**: Analyzes images and provides insights based on visual content, in addition to text.
* **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
* **System Prompt**: Strong adherence and support for system prompts.
* **Agentic**: Best-in-class agentic capabilities with native function calling and JSON output.
* **Large Context Window**: Supports a 256k context window.
We release this model under a [**Modified MIT License**](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/blob/main/LICENSE): Open-source license for both commercial and non-commercial use with exceptions for companies with large revenue.
# Recommended Settings
* **Reasoning Effort**:
  * `'none'` → Do not use reasoning.
  * `'high'` → Use reasoning (recommended for complex prompts and agentic usage).

  Use `reasoning_effort="high"` for complex tasks and agentic coding.
* **Temperature**: 0.7 for `reasoning_effort="high"`. Use a temperature between 0.0 and 0.7 for `reasoning_effort="none"`, depending on the task. Generally, lower values give answers that are more to the point, and higher values allow the model to be more creative. It is good practice to try different values to tune the model's behavior to your needs.
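The recommended settings above can be captured in a small validation helper. This is a sketch under stated assumptions: `build_settings` is a hypothetical function, it returns a plain dictionary rather than calling any client library, and it treats the card's recommendations as hard limits, which is a design choice of this example rather than a documented API rule. Only the names `reasoning_effort` and `temperature` and the values `'none'`, `'high'`, 0.0, and 0.7 come from the quoted card.

```python
from typing import Optional

# Values taken from the recommended settings quoted above.
HIGH_EFFORT_TEMPERATURE = 0.7
NONE_EFFORT_MIN_TEMPERATURE = 0.0
NONE_EFFORT_MAX_TEMPERATURE = 0.7


def build_settings(reasoning_effort: str, temperature: Optional[float] = None) -> dict:
    """Return a settings dict that follows the quoted recommendations.

    Hypothetical helper: whether a given API accepts these exact field
    names in this shape is an assumption, not something the card states.
    """
    if reasoning_effort == "high":
        # With reasoning enabled, the card recommends a temperature of 0.7.
        if temperature is not None and temperature != HIGH_EFFORT_TEMPERATURE:
            raise ValueError("reasoning_effort='high' is recommended with temperature 0.7")
        return {"reasoning_effort": "high", "temperature": HIGH_EFFORT_TEMPERATURE}

    if reasoning_effort == "none":
        # Without reasoning, the caller picks a task-dependent value in [0.0, 0.7].
        if temperature is None:
            raise ValueError("choose a temperature between 0.0 and 0.7 for the task")
        if not NONE_EFFORT_MIN_TEMPERATURE <= temperature <= NONE_EFFORT_MAX_TEMPERATURE:
            raise ValueError("temperature must be between 0.0 and 0.7")
        return {"reasoning_effort": "none", "temperature": temperature}

    raise ValueError("reasoning_effort must be 'none' or 'high'")


if __name__ == "__main__":
    print(build_settings("high"))
    print(build_settings("none", temperature=0.2))
```

Rejecting out-of-range values early makes a misconfigured request fail before it is sent; a looser variant could log a warning instead, since the card frames these values as recommendations.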
[https://huggingface.co/mistralai/Mistral-Medium-3.5-128B](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B)
Looks great for the parameter count
Open Weights. modified MIT -> no commercial usage without paying a license
Mistral Medium 3 supports a context window of 128,000 tokens, allowing it to process long documents, extended conversations, or large codebases in a single request.
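A quick pre-flight check can flag prompts that are likely to exceed the 128,000-token window before a request is sent. The sketch below is illustrative only: it assumes a rough average of four characters per token, which is a common heuristic and not the model's actual tokenizer, and the names `rough_token_estimate`, `fits_in_context`, and `safety_margin_tokens` are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # limit stated in the text above

# Assumption: about 4 characters per token on average. Real counts depend
# on the model's tokenizer and on the language and content of the text.
ASSUMED_CHARS_PER_TOKEN = 4


def rough_token_estimate(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    # Ceiling division, so a partial token still counts as one token.
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)


def fits_in_context(prompt: str, safety_margin_tokens: int = 0) -> bool:
    """Return True if the prompt likely fits within the context window.

    `safety_margin_tokens` is an optional buffer that absorbs the error
    of the character-based estimate.
    """
    if safety_margin_tokens < 0:
        raise ValueError("safety_margin_tokens must be non-negative")
    return rough_token_estimate(prompt) + safety_margin_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    document = "word " * 50_000  # hypothetical 250,000-character document
    print(rough_token_estimate(document), fits_in_context(document))
```

For an exact count, the estimate function would need to be replaced with the tokenizer that matches the deployed model.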
The model is priced at $0.40 per million input tokens and $2.00 per million output tokens when accessed via API.
According to the available metadata, Mistral Medium 3 was trained on data through early 2025.
Mistral Medium 3 is available through Mistral La Plateforme and Amazon SageMaker. It also supports self-hosted deployment on a minimum of four GPUs and can run on any cloud environment.
Mistral Medium 3 can be customized. The model supports continuous pre-training, comprehensive fine-tuning, and integration with enterprise knowledge bases, making it adaptable for domain-specific applications in industries such as healthcare, finance, and energy.