Long Context Processing
Handles up to 262,144 tokens natively in a single context window, with extended context support available via advanced attention mechanisms.
Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use. This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for Qwen3 235B.
Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use.
This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.
Handles up to 262,144 tokens natively in a single context window, with extended context support available via advanced attention mechanisms.
Optimized for direct, helpful responses as the non-thinking instruct variant, without exposing internal chain-of-thought output.
Scores 51.8% on LiveCodeBench v6, covering real-world programming tasks across multiple languages.
Achieves 70.3% on AIME25 and 41.8% on ARC-AGI, handling multi-step mathematical and logical problem solving.
Scores 77.5% on GPQA and 54.3% on SimpleQA, reflecting broad factual knowledge across science and general domains.
Supports agentic workflows and tool-use scenarios, making it suitable for multi-step task execution and API-integrated pipelines.
Generates and understands text across multiple languages, consistent with the broader Qwen3 model family's multilingual training.
Uses a Mixture-of-Experts architecture that activates only 22B of 235B parameters per forward pass, reducing per-token compute.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
AIME 2024
American math olympiad problems
|
|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MATH-500
Undergraduate and competition-level math problems
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
Qwen3 235B discussions are most active in r/LocalLLaMA, r/Qwen_AI, r/LocalLLM. Top Reddit threads cluster around benchmark and model-comparison threads, coding workflow discussions.
The strongest match in this snapshot has 1938 upvotes and 430 comments.
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet!
Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving:
✅ Improved performance in logical reasoning, math, science & coding
✅ Better general skills: instruction following, tool use, alignment
✅ 256K native context for deep, long-form understanding
🧠 Built exclusively for thinking mode, with no need to enable it manually. The model now natively supports extended reasoning chains for maximum depth and accuracy.
🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!
🔧 Powered by:
• Dual Chunk Attention (DCA) – A length extrapolation method that splits long sequences into manageable chunks while preserving global coherence.
• MInference – Sparse attention that cuts overhead by focusing on key token interactions
💡 These innovations boost both generation quality and inference speed, delivering up to 3× faster performance on near-1M token sequences.
✅ Fully compatible with vLLM and SGLang for efficient deployment.
📄 See the update model cards for how to enable this feature.
https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507
https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507
https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507
https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507
https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507
https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507
https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507
https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Thinking-2507
https://x.com/Alibaba_Qwen/status/1947344511988076547
New Qwen3-235B-A22B with thinking mode only –– no more hybrid reasoning.
Came across this benchmark PR on Aider
I did my own benchmarks with aider and had consistent results
This is just impressive...
PR: [https://github.com/Aider-AI/aider/pull/3908/commits/015384218f9c87d68660079b70c30e0b59ffacf3](https://github.com/Aider-AI/aider/pull/3908/commits/015384218f9c87d68660079b70c30e0b59ffacf3)
Comment: [https://github.com/Aider-AI/aider/pull/3908#issuecomment-2841120815](https://github.com/Aider-AI/aider/pull/3908#issuecomment-2841120815)
Qwen3 235B supports a native context window of 262,144 tokens, which is approximately 200,000 words. Extended context beyond this is possible using advanced attention mechanisms.
Although the model has 235 billion total parameters, only 22 billion are activated at a time during inference due to its Mixture-of-Experts architecture.
This is the instruct (non-thinking) variant, which produces direct responses without exposing internal chain-of-thought reasoning. The Thinking variant (Qwen3-235B-A22B-Thinking-2507) is a separate model that outputs its reasoning process before answering.
Based on the metadata, the training date is listed as July 2025, which corresponds to the 2507 version suffix in the model name.
Qwen3 235B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution subject to the license terms.
The model is designed for instruction following, agentic workflows, tool use, complex question answering, coding, multilingual tasks, and creative writing. It is not the recommended choice when visible chain-of-thought reasoning is required, as that is handled by the separate Thinking variant.
Continue browsing adjacent models from the same provider.