Mistral

Mixtral 8x22B Instruct Deprecated

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...

Apr 17, 2024 65.5K context 64,000 tokens output

Text File Tools Structured Output

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Daily ↓ Resources ↓ Community ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Mistral

Model ID

The routed model identifier exposed by upstream providers.

mistralai/mixtral-8x22b-instruct

Input Context Window

The number of tokens supported by the input context window.

65.5K tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

64,000 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Apr 17, 2024 2 years ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2024-01-31

API Providers

The providers that offer this model. This is not an exhaustive list.

Mistral

Modalities

Types of data this model can process.

Text File

What is Mixtral 8x22B Instruct Deprecated

A fuller summary of positioning, capabilities, and source-specific details for Mixtral 8x22B Instruct Deprecated.

Capabilities

What Mixtral 8x22B Instruct Deprecated supports

JSON

Structured Outputs

Structured output settings are exposed through OpenRouter for schema-driven or format-controlled responses.

Tool Calling

Tool invocation and tool selection are supported in the routed OpenRouter interface for this model.

Multimodal I/O

This model accepts text input, file input and returns text output.

CTX

Large Context Window

OpenRouter currently lists a context window of 65.5K with up to 64,000 tokens maximum output tokens.

Pricing for Mixtral 8x22B Instruct Deprecated

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $2.00 Per million tokens

Output tokens $6.00 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.20

maxTemperature 1

maxResponseSize 64,000 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Mistral

Provider Endpoints

Endpoint-level provider data currently available for this model.

Mistral

1d uptime: 100.0% Supported params: 11 Implicit caching: No

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Official Website

→

OpenRouter Model Page OpenRouter

→

Related Daily Briefs

Recent daily stories tied to Mixtral 8x22B Instruct Deprecated through direct model mentions or provider-level coverage.

Frontier Models

Cohere, Mistral, and Google DeepMind Signal a Broader Shift Around LevelField-1

Mistral and Google move deeper into real workflows.

2026-07-21 AI Models Partnership

Frontier Models

Hugging Face, Pika, and Mistral Signal a Broader Shift Around MistralAI

Hugging Face and Pika move deeper into real workflows.

2026-07-08 AI Models AI API

Frontier Models

Mistral and OpenAI Signal a Broader Shift Around Costs Using PNGs

Claude and Mistral are becoming more practical to evaluate and deploy.

2026-07-04 AI Models AI API

Community discussion

What people think about Mixtral 8x22B Instruct Deprecated

Mixtral 8x22B Instruct Deprecated discussions are most active in r/LocalLLaMA, r/MistralAI, r/n8n.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 416 upvotes and 219 comments.

r/LocalLLaMA 416 upvotes 219 comments April 17, 2024

mistralai/Mixtral-8x22B-Instruct-v0.1 · Hugging Face

Open Reddit thread

r/LocalLLaMA 196 upvotes 78 comments May 15, 2024

The LLM Creativity benchmark: new leader 4x faster than the previous one! - 2024-05-15 update: WizardLM-2-8x22B, Mixtral-8x22B-Instruct-v0.1, BigWeave-v16-103b, Miqu-MS-70B, EstopianMaid-13B, Meta-Llama-3-70B-Instruct

The goal of this benchmark is to evaluate the ability of Large Language Models to be used as an **uncensored creative writing assistant**. Human evaluation of the results is done manually, by me, to assess the quality of writing.

# My recommendations

* **Do not use a GGUF quantisation smaller than q4**. In my testings, anything below q4 suffers from too much degradation, and it is better to use a smaller model with higher quants.
* **Importance matrix matters**. Be careful when using importance matrices. For example, if the matrix is solely based on english language, it will degrade the model multilingual and coding capabilities. However, if that is all that matters for your use case, using an imatrix will definitely improve the model performance.
* **Best** ***large*** **model**: [WizardLM-2-8x22B](https://huggingface.co/alpindale/WizardLM-2-8x22B). And fast too! On my m2 max with 38 GPU cores, I get an inference speed of **11.81 tok/s** with iq4\_xs.
* **Second best** ***large*** **model**: [CohereForAI/c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus). Very close to the above choice, but 4 times slower! On my m2 max with 38 GPU cores, I get an inference speed of **3.88 tok/s** with q5\_km. However it gives different results from WizardLM, and it can definitely be worth using.
* **Best** ***medium*** **model**: [sophosympatheia/Midnight-Miqu-70B-v1.5](https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.5)
* **Best** ***small*** **model**: [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01)
* **Best** ***tiny*** **model**: [froggeric/WestLake-10.7b-v2](https://huggingface.co/froggeric/WestLake-10.7B-v2-GGUF)

Although, instead of my medium model recommendation, it is probably better to use my small model recommendation, but at **FP16**, or with the **full 128k context**, or both if you have the vRAM! In that last case though, you probably have enough vRAM to run my large model recommendation at a decent quant, which does perform better (but slower).

https://preview.redd.it/77tcn0270l0d1.png?width=876&format=png&auto=webp&s=39f6c7afa98962f639c75068d88d49684212d7fe

# Benchmark details

There are 24 questions, some standalone, other follow-ups to previous questions for a multi-turn conversation. The questions can be split half-half in 2 possible ways:

# First split: sfw / nsfw

* **sfw**: 50% are safe questions that should not trigger any guardrail
* **nsfw**: 50% are questions covering a wide range of NSFW and illegal topics, which are testing for censorship

# Second split: story / smart

* **story**: 50% of questions are creative writing tasks, covering both the nsfw and sfw topics
* **smart**: 50% of questions are more about testing the capabilities of the model to work as an assistant, again covering both the nsfw and sfw topics

*For more details about the benchmark, test methodology, and CSV with the above data, please check the HF page:* [https://huggingface.co/datasets/froggeric/creativity](https://huggingface.co/datasets/froggeric/creativity)

# My observations about the new additions

[WizardLM-2-8x22B](https://huggingface.co/alpindale/WizardLM-2-8x22B)
I used the imatrix quantisation from [mradermacher](https://huggingface.co/mradermacher/WizardLM-2-8x22B-i1-GGUF)
Fast inference! Great quality writing, that feels a lot different from most other models. Unrushed, less repetitions. Good at following instructions. Non creative writing tasks are also better, with more details and useful additional information. This is a huge improvement over the original **Mixtral-8x22B**. My new favourite model.
Inference speed: **11.81 tok/s** (iq4\_xs on m2 max with 38 gpu cores)

[llmixer/BigWeave-v16-103b](https://huggingface.co/llmixer/BigWeave-v16-103b)
A miqu self-merge, which is the winner of the BigWeave experiments. I was hoping for an improvement over the existing *traditional* 103B and 120B self-merges, but although it comes close, it is still not as good. It is a shame, as this was done in an intelligent way, by taking into account the relevance of each layer.

[mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)
I used the imatrix quantisation from *mradermacher* which seems to have temporarily disappeared, probably due to the [imatrix PR](https://github.com/ggerganov/llama.cpp/pull/7099).
Too brief and rushed, lacking details. Many GTPisms used over and over again. Often finishes with some condescending morality.

[meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
Disappointing. Censored and difficult to bypass. Even when bypassed, the model tries to find any excuse to escape it and return to its censored state. Lots of GTPism. My feeling is that even though it was trained on a huge amount of data, I seriously doubt the quality of that data. However, I realised the performance is actually very close to miqu-1, which means that finetuning and merges should be able to bring huge improvements. I benchmarked this model before the fixes added to llama.cpp, which means I will need to do it again, which I am not looking forward to.

[Miqu-MS-70B](https://huggingface.co/Undi95/Miqu-MS-70B)
Terribly bad :-( Has lots of difficulties following instructions. Poor writing style. Switching to any of the 3 recommended prompt formats does not help.

\[froggeric\\miqu\]
Experiments in trying to get a better self-merge of miqu-1, by using u/jukofyork idea of [Downscaling the K and/or Q matrices for repeated layers in franken-merges](https://github.com/arcee-ai/mergekit/issues/198). More info about the *attenuation* is available in this [discussion](https://huggingface.co/wolfram/miqu-1-120b/discussions/4). So far no better results.

Open Reddit thread

r/LocalLLaMA 96 upvotes 20 comments April 17, 2024

Mixtral-8x22B-Instruct-0.1 MT-Bench Results

Open Reddit thread

r/LocalLLaMA 34 upvotes 12 comments April 12, 2024

Enjoy the new instruction model from Fireworks - Mixtral 8x22b Instruct OH

This model was finetuned on \~10K entries from OpenHermes dataset by NousResearch.

Huge shoutout to Teknium and the NousResearch team for this high-quality SFT dataset.

[https://huggingface.co/fireworks-ai/mixtral-8x22b-instruct-oh](https://huggingface.co/fireworks-ai/mixtral-8x22b-instruct-oh)

Open Reddit thread

r/LocalLLaMA 7 upvotes April 19, 2024

Is Anyone Working With Mixtral-8x22B-Instruct-v0.1 TOOLS?

I have the 22B model loaded and of course it works great for regular prompts. I'm very interested in using tools as shown [in their repo.](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) They've added tokens for:

* \[TOOL\_CALLS\]
* \[AVAILABLE\_TOOLS\]
* \[/AVAILABLE\_TOOLS\]
* \[TOOL\_RESULTS\]
* \[/TOOL\_RESULTS\]

I can follow along in their code example all the way through and I can successfully decode the variable `encoded` and see the tool tokens too. But that is using the mistral\_common API. I am running the model on a remote server with llama.cpp and I send prompts via [`requests.post`](https://requests.post)`(URL, json={"prompt":....})`

I build plain text prompts that are big whopping strings. What I expect and/or want to happen is to start a chat that has within that string something like "\[AVAILABLE\_TOOLS\]calculator, weather, horoscope, ...\[/AVAILABLE\_TOOLS\]". Then in the rest of my query I would include a question where the model *might* elect to ask to use a tool. If that is whats supposed to happen then my syntax is wrong but maybe that is not how to use these tokens at all. The [README.md](https://README.md) points you to [this code](https://github.com/mistralai/mistral-common/blob/main/src/mistral_common/tokens/tokenizers/sentencepiece.py#L299) which honestly is a bit opaque for me.

Bottom line: Can anyone give a code example for tool usage with Mixtral?

Open Reddit thread

View more discussions →

More models from Mistral

Continue browsing adjacent models from the same provider.

← All AI Models