Structured Outputs
Structured output settings are exposed through OpenRouter for schema-driven or format-controlled responses.
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for Mixtral 8x22B Instruct Deprecated.
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...
Structured output settings are exposed through OpenRouter for schema-driven or format-controlled responses.
Tool invocation and tool selection are supported in the routed OpenRouter interface for this model.
This model accepts text input, file input and returns text output.
OpenRouter currently lists a context window of 65.5K with up to 64,000 tokens maximum output tokens.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
Official model cards, release notes, docs, and other references synced from the source page.
Mixtral 8x22B Instruct Deprecated discussions are most active in r/LocalLLaMA, r/MistralAI, r/n8n.
Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 416 upvotes and 219 comments.
The goal of this benchmark is to evaluate the ability of Large Language Models to be used as an **uncensored creative writing assistant**. Human evaluation of the results is done manually, by me, to assess the quality of writing.
# My recommendations
* **Do not use a GGUF quantisation smaller than q4**. In my testings, anything below q4 suffers from too much degradation, and it is better to use a smaller model with higher quants.
* **Importance matrix matters**. Be careful when using importance matrices. For example, if the matrix is solely based on english language, it will degrade the model multilingual and coding capabilities. However, if that is all that matters for your use case, using an imatrix will definitely improve the model performance.
* **Best** ***large*** **model**: [WizardLM-2-8x22B](https://huggingface.co/alpindale/WizardLM-2-8x22B). And fast too! On my m2 max with 38 GPU cores, I get an inference speed of **11.81 tok/s** with iq4\_xs.
* **Second best** ***large*** **model**: [CohereForAI/c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus). Very close to the above choice, but 4 times slower! On my m2 max with 38 GPU cores, I get an inference speed of **3.88 tok/s** with q5\_km. However it gives different results from WizardLM, and it can definitely be worth using.
* **Best** ***medium*** **model**: [sophosympatheia/Midnight-Miqu-70B-v1.5](https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.5)
* **Best** ***small*** **model**: [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01)
* **Best** ***tiny*** **model**: [froggeric/WestLake-10.7b-v2](https://huggingface.co/froggeric/WestLake-10.7B-v2-GGUF)
Although, instead of my medium model recommendation, it is probably better to use my small model recommendation, but at **FP16**, or with the **full 128k context**, or both if you have the vRAM! In that last case though, you probably have enough vRAM to run my large model recommendation at a decent quant, which does perform better (but slower).
https://preview.redd.it/77tcn0270l0d1.png?width=876&format=png&auto=webp&s=39f6c7afa98962f639c75068d88d49684212d7fe
# Benchmark details
There are 24 questions, some standalone, other follow-ups to previous questions for a multi-turn conversation. The questions can be split half-half in 2 possible ways:
# First split: sfw / nsfw
* **sfw**: 50% are safe questions that should not trigger any guardrail
* **nsfw**: 50% are questions covering a wide range of NSFW and illegal topics, which are testing for censorship
# Second split: story / smart
* **story**: 50% of questions are creative writing tasks, covering both the nsfw and sfw topics
* **smart**: 50% of questions are more about testing the capabilities of the model to work as an assistant, again covering both the nsfw and sfw topics
*For more details about the benchmark, test methodology, and CSV with the above data, please check the HF page:* [https://huggingface.co/datasets/froggeric/creativity](https://huggingface.co/datasets/froggeric/creativity)
# My observations about the new additions
[WizardLM-2-8x22B](https://huggingface.co/alpindale/WizardLM-2-8x22B)
I used the imatrix quantisation from [mradermacher](https://huggingface.co/mradermacher/WizardLM-2-8x22B-i1-GGUF)
Fast inference! Great quality writing, that feels a lot different from most other models. Unrushed, less repetitions. Good at following instructions. Non creative writing tasks are also better, with more details and useful additional information. This is a huge improvement over the original **Mixtral-8x22B**. My new favourite model.
Inference speed: **11.81 tok/s** (iq4\_xs on m2 max with 38 gpu cores)
[llmixer/BigWeave-v16-103b](https://huggingface.co/llmixer/BigWeave-v16-103b)
A miqu self-merge, which is the winner of the BigWeave experiments. I was hoping for an improvement over the existing *traditional* 103B and 120B self-merges, but although it comes close, it is still not as good. It is a shame, as this was done in an intelligent way, by taking into account the relevance of each layer.
[mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)
I used the imatrix quantisation from *mradermacher* which seems to have temporarily disappeared, probably due to the [imatrix PR](https://github.com/ggerganov/llama.cpp/pull/7099).
Too brief and rushed, lacking details. Many GTPisms used over and over again. Often finishes with some condescending morality.
[meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
Disappointing. Censored and difficult to bypass. Even when bypassed, the model tries to find any excuse to escape it and return to its censored state. Lots of GTPism. My feeling is that even though it was trained on a huge amount of data, I seriously doubt the quality of that data. However, I realised the performance is actually very close to miqu-1, which means that finetuning and merges should be able to bring huge improvements. I benchmarked this model before the fixes added to llama.cpp, which means I will need to do it again, which I am not looking forward to.
[Miqu-MS-70B](https://huggingface.co/Undi95/Miqu-MS-70B)
Terribly bad :-( Has lots of difficulties following instructions. Poor writing style. Switching to any of the 3 recommended prompt formats does not help.
\[froggeric\\miqu\]
Experiments in trying to get a better self-merge of miqu-1, by using u/jukofyork idea of [Downscaling the K and/or Q matrices for repeated layers in franken-merges](https://github.com/arcee-ai/mergekit/issues/198). More info about the *attenuation* is available in this [discussion](https://huggingface.co/wolfram/miqu-1-120b/discussions/4). So far no better results.
This model was finetuned on \~10K entries from OpenHermes dataset by NousResearch.
Huge shoutout to Teknium and the NousResearch team for this high-quality SFT dataset.
[https://huggingface.co/fireworks-ai/mixtral-8x22b-instruct-oh](https://huggingface.co/fireworks-ai/mixtral-8x22b-instruct-oh)
I have the 22B model loaded and of course it works great for regular prompts. I'm very interested in using tools as shown [in their repo.](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) They've added tokens for:
* \[TOOL\_CALLS\]
* \[AVAILABLE\_TOOLS\]
* \[/AVAILABLE\_TOOLS\]
* \[TOOL\_RESULTS\]
* \[/TOOL\_RESULTS\]
I can follow along in their code example all the way through and I can successfully decode the variable `encoded` and see the tool tokens too. But that is using the mistral\_common API. I am running the model on a remote server with llama.cpp and I send prompts via [`requests.post`](https://requests.post)`(URL, json={"prompt":....})`
I build plain text prompts that are big whopping strings. What I expect and/or want to happen is to start a chat that has within that string something like "\[AVAILABLE\_TOOLS\]calculator, weather, horoscope, ...\[/AVAILABLE\_TOOLS\]". Then in the rest of my query I would include a question where the model *might* elect to ask to use a tool. If that is whats supposed to happen then my syntax is wrong but maybe that is not how to use these tokens at all. The [README.md](https://README.md) points you to [this code](https://github.com/mistralai/mistral-common/blob/main/src/mistral_common/tokens/tokenizers/sentencepiece.py#L299) which honestly is a bit opaque for me.
Bottom line: Can anyone give a code example for tool usage with Mixtral?
Continue browsing adjacent models from the same provider.