Long Context Window
Processes up to 256,000 tokens in a single request, enabling analysis of lengthy documents or extended conversations without truncation.
Ministral 3 3B is a 3-billion-parameter language model developed by Mistral AI as part of the Ministral 3 family. It is the smallest model in that family and is released as open-weight, meaning the model weights are publicly available for download and local use. The model supports a 256,000-token context window and includes both language and vision capabilities in a compact form factor. Ministral 3 3B is designed specifically for edge deployment, making it suitable for running on local hardware, embedded systems, and resource-constrained environments. Its small parameter count allows it to operate efficiently across a wide range of hardware configurations without requiring cloud infrastructure. It is well-suited for developers building on-device applications, offline workflows, or latency-sensitive pipelines where a smaller footprint is a requirement.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for Ministral 3 3B.
Ministral 3 3B is a 3-billion-parameter language model developed by Mistral AI as part of the Ministral 3 family. It is the smallest model in that family and is released as open-weight, meaning the model weights are publicly available for download and local use. The model supports a 256,000-token context window and includes both language and vision capabilities in a compact form factor.
Ministral 3 3B is designed specifically for edge deployment, making it suitable for running on local hardware, embedded systems, and resource-constrained environments. Its small parameter count allows it to operate efficiently across a wide range of hardware configurations without requiring cloud infrastructure. It is well-suited for developers building on-device applications, offline workflows, or latency-sensitive pipelines where a smaller footprint is a requirement.
Processes up to 256,000 tokens in a single request, enabling analysis of lengthy documents or extended conversations without truncation.
Generates coherent natural language output for tasks such as summarization, question answering, and instruction following.
Supports image input alongside text, allowing the model to interpret and respond to visual content as part of multimodal prompts.
Released with publicly available model weights under an open license, allowing local deployment and fine-tuning without API dependency.
Optimized for running on local and resource-constrained hardware, including consumer devices, without requiring cloud infrastructure.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
Ministral 3 3B discussions are most active in r/LocalLLaMA, r/ollama, r/MistralAI. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.
The strongest match in this snapshot has 869 upvotes and 114 comments.
[https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512)
[https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512)
[https://huggingface.co/mistralai/Ministral-3-14B-Base-2512](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512)
The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/Mistral-Small-3.2-Instruct-2506) counterpart. A powerful and efficient language model with vision capabilities.
[https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)
[https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
[https://huggingface.co/mistralai/Ministral-3-8B-Base-2512](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512)
A balanced model in the Ministral 3 family, **Ministral 3 8B** is a powerful, efficient tiny language model with vision capabilities.
[https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512)
[https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512)
[https://huggingface.co/mistralai/Ministral-3-3B-Base-2512](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512)
The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
https://preview.redd.it/471e4lma6t4g1.png?width=1078&format=png&auto=webp&s=c23d37e6a361041132ccec451c0a03921acc6e13
https://preview.redd.it/c2szd14b6t4g1.png?width=1210&format=png&auto=webp&s=3d97fc5e8626f25f8c13a5b159e6351976f45de5
[https://huggingface.co/unsloth/Ministral-3-14B-Reasoning-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-14B-Reasoning-2512-GGUF)
[https://huggingface.co/unsloth/Ministral-3-14B-Instruct-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-14B-Instruct-2512-GGUF)
[https://huggingface.co/unsloth/Ministral-3-8B-Reasoning-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-8B-Reasoning-2512-GGUF)
[https://huggingface.co/unsloth/Ministral-3-8B-Instruct-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-8B-Instruct-2512-GGUF)
[https://huggingface.co/unsloth/Ministral-3-3B-Reasoning-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-3B-Reasoning-2512-GGUF)
[https://huggingface.co/unsloth/Ministral-3-3B-Instruct-2512-GGUF](https://huggingface.co/unsloth/Ministral-3-3B-Instruct-2512-GGUF)
Recently I was experimenting the small models that can do tool calls effectively and can fit in 6GB Vram and I found ministral-3-3b.
Currently using it's instruct version with Q8 and it's accuracy to run tools written in skills md is generous.
I am curious about your use cases of this model
I’ve been seeing a lot of chatter around Ministral 3 3B, so I wanted to test it in a way that actually matters day to day. Can such a small local model do reliable tool calling, and can you extend it beyond local tools to work with remotely hosted MCP servers?
Here’s what I tried:
# Setup
* Ran a quantized 4-bit (Q4\_K\_M) Ministral 3 3B on Ollama
* Connected it to Open WebUI (with Docker)
* Tested tool calling in two stages:
* Local Python tools inside Open WebUI
* **Remote MCP tools** via Composio (so the model can call externally hosted tools through MCP)
The model, despite the super tiny size of just 3B parameters, is said to support tool calling with even support for structured output. So, this was really fun to see the model in action.
Most of the guides show you how to work with just the local tools, which is not ideal when you plan to use the model for bigger, better and managed tools for hundreds of different services.
In this guide, I've covered the model specs and the entire setup, including setting up a Docker container for Ollama and running Ollama WebUI.
And the nice part is that the model setup guide here works for all the other models that support tool calling.
I wrote up the full walkthrough with commands and screenshots:
You can find it here: [MCP tool calling guide with Ministral 3B, Composio, and Ollama](https://composio.dev/blog/tool-calling-with-ministral-3b)
If anyone else has tested tool calling on Ministral 3 3B (or worked with it using vLLM instead of Ollama), I’d love to hear what worked best for you, as I couldn't get vLLM to work due to CUDA errors. :(
I started out with Rei-V3-KTO (a Mistral Nemo finetune), then moved to Rei-24B-KTO (a Mistral Small 3.2 finetune) and both always made me want a Mistral model that could run on my crappy Intel N5000 laptop with 8GB RAM and no dGPU.
...and now we finally can!
It's about as good as it gets for a 3B model. It won't beat Gemma 3 4B in world knowledge, but it's a lot less censored and inference speeds are decent when using llama.cpp or koboldcpp (older cpu nocuda). It's size is also small enough that I can locally finetune it on my desktop, and the vision support is a nice bonus.
I haven't tried the reasoning variant yet, [waiting for better support to be merged first](https://github.com/ggml-org/llama.cpp/pull/17713). Neither did I test toolcalling, but frankly I'm not interested in that.
What are your thoughts so far on the 3B models?
PS: system prompt I tested with:
[You are a Game Master, simulating a world for User. The simulation follows a strict turn-based pattern. User write a reply, you advance the world further. Advance the world by the smallest possible increment. User controls an avatar named {user}. You control the world and NPCs but not User's avatar. Address User as "you". Write a single extremely short and concise paragraph in plaintext and simple english.]
Ministral 3 3B supports a context window of 256,000 tokens, allowing it to process large amounts of text in a single request.
Yes, Ministral 3 3B is released as an open-weight model, meaning the model weights are publicly available for download, local deployment, and fine-tuning.
The model is designed for edge deployment and is intended to run across diverse hardware configurations, including local consumer setups and resource-constrained environments.
Yes, according to Mistral's announcement, Ministral 3 3B includes vision capabilities alongside its language capabilities.
The training date is listed as not available in the current metadata. Refer to Mistral's official documentation for the most up-to-date information on training data cutoff.
Continue browsing adjacent models from the same provider.