Qwen

Qwen3 235B

Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use. This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.

Apr 28, 2025 262,144 context 262,144 tokens output

Long Context Processing Instruction Following Code Generation Mathematical Reasoning Knowledge Retrieval Agentic Tool Use

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Benchmarks ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Qwen

Model ID

The routed model identifier exposed by upstream providers.

qwen/qwen3-235b-a22b

Input Context Window

The number of tokens supported by the input context window.

262,144 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

262,144 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Apr 28, 2025 1 year ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

July 2025

API Providers

The providers that offer this model. This is not an exhaustive list.

Alibaba

Modalities

Types of data this model can process.

Text Code

What is Qwen3 235B

A fuller summary of positioning, capabilities, and source-specific details for Qwen3 235B.

Qwen3 235B is an instruction-tuned large language model developed by Alibaba's Qwen team, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters. During inference, only 22 billion parameters are activated at a time, which reduces computational cost relative to the model's full parameter count. The model supports a native context window of 262,144 tokens and is released under the Apache 2.0 license, permitting commercial use.

This release, versioned as Qwen3-235B-A22B-Instruct-2507, is the non-thinking instruct variant, meaning it produces direct responses without exposing an internal chain-of-thought. It is designed for instruction following, agentic workflows, tool use, multilingual tasks, complex question answering, and coding. The model scores 51.8% on LiveCodeBench v6, 70.3% on AIME25, and 77.5% on GPQA, reflecting its range across coding, mathematical reasoning, and knowledge-intensive tasks.

Capabilities

What Qwen3 235B supports

CTX

Long Context Processing

Handles up to 262,144 tokens natively in a single context window, with extended context support available via advanced attention mechanisms.

Instruction Following

Optimized for direct, helpful responses as the non-thinking instruct variant, without exposing internal chain-of-thought output.

</>

Code Generation

Scores 51.8% on LiveCodeBench v6, covering real-world programming tasks across multiple languages.

Mathematical Reasoning

Achieves 70.3% on AIME25 and 41.8% on ARC-AGI, handling multi-step mathematical and logical problem solving.

Knowledge Retrieval

Scores 77.5% on GPQA and 54.3% on SimpleQA, reflecting broad factual knowledge across science and general domains.

Agentic Tool Use

Supports agentic workflows and tool-use scenarios, making it suitable for multi-step task execution and API-integrated pipelines.

Multilingual Text Generation

Generates and understands text across multiple languages, consistent with the broader Qwen3 model family's multilingual training.

MoE Efficient Inference

Uses a Mixture-of-Experts architecture that activates only 22B of 235B parameters per forward pass, reducing per-token compute.

Pricing for Qwen3 235B

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.15 Per million tokens

Output tokens $1.82 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

maxTemperature 1

maxResponseSize 262,144 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Alibaba

Provider Endpoints

Endpoint-level provider data currently available for this model.

Alibaba

Max prompt: 98,304 Max output: 8,192 1d uptime: 100.0% Supported params: 13 Implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2024 American math olympiad problems	32.7%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	61.3%
HLE Questions that challenge frontier models across many domains	4.7%
LiveCodeBench Real-world coding tasks from recent competitions	34.3%
MATH-500 Undergraduate and competition-level math problems	90.2%
MMLU-Pro Expert knowledge across 14 academic disciplines	76.2%
SciCode Scientific research coding and numerical methods	29.9%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Model Card Documentation

→

Announcement Blog Post Announcements

→

GitHub Repository Open Source

→

Official Documentation Documentation

→

Unsloth Model Card Other

→

Qwen3 Technical Report Research

→

OpenRouter Model Page OpenRouter

→

Related Daily Briefs

Recent daily stories tied to Qwen3 235B through direct model mentions or provider-level coverage.

Frontier Models

Anthropic, Alibaba, and OpenAI Signal a Broader Shift Around Economic Index

Anthropic and Qwen move deeper into real workflows.

2026-07-22 AI Models AI API

Frontier Models

Google DeepMind, Alibaba, and Hugging Face Signal a Broader Shift Around Run AI

Google and Qwen move deeper into real workflows.

2026-07-21 AI API Integration

Frontier Models

Hugging Face update lands; ChatGPT Increases Custom update lands; SenseNova-U1-Infographic-V3 Launches

Hugging Face and Qwen move deeper into real workflows.

2026-07-16 AI Models AI API

Community discussion

What people think about Qwen3 235B

Qwen3 235B discussions are most active in r/LocalLLaMA, r/Qwen_AI, r/LocalLLM. Top Reddit threads cluster around benchmark and model-comparison threads, coding workflow discussions.

The strongest match in this snapshot has 1938 upvotes and 430 comments.

r/LocalLLaMA 870 upvotes 250 comments July 21, 2025

Qwen3-235B-A22B-2507 Released!

Open Reddit thread

r/LocalLLaMA 858 upvotes 173 comments July 25, 2025

Qwen3-235B-A22B-Thinking-2507 released!

🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet!

Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving:
✅ Improved performance in logical reasoning, math, science & coding
✅ Better general skills: instruction following, tool use, alignment
✅ 256K native context for deep, long-form understanding

🧠 Built exclusively for thinking mode, with no need to enable it manually. The model now natively supports extended reasoning chains for maximum depth and accuracy.

Open Reddit thread

r/LocalLLaMA 932 upvotes 72 comments August 8, 2025

🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context-up to 1 million tokens!

🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!

🔧 Powered by:

• Dual Chunk Attention (DCA) – A length extrapolation method that splits long sequences into manageable chunks while preserving global coherence.

• MInference – Sparse attention that cuts overhead by focusing on key token interactions

💡 These innovations boost both generation quality and inference speed, delivering up to 3× faster performance on near-1M token sequences.

✅ Fully compatible with vLLM and SGLang for efficient deployment.

📄 See the update model cards for how to enable this feature.

https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507

https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507

https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507

https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507

https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507

https://modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507

https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507

https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Thinking-2507

Open Reddit thread

r/LocalLLaMA 530 upvotes 94 comments July 21, 2025

Qwen3-235B-A22B-2507

https://x.com/Alibaba_Qwen/status/1947344511988076547

New Qwen3-235B-A22B with thinking mode only –– no more hybrid reasoning.

Open Reddit thread

r/LocalLLaMA 430 upvotes 115 comments May 3, 2025

Qwen3-235B-A22B (no thinking) Seemingly Outperforms Claude 3.7 with 32k Thinking Tokens in Coding (Aider)

Came across this benchmark PR on Aider
I did my own benchmarks with aider and had consistent results
This is just impressive...

PR: [https://github.com/Aider-AI/aider/pull/3908/commits/015384218f9c87d68660079b70c30e0b59ffacf3](https://github.com/Aider-AI/aider/pull/3908/commits/015384218f9c87d68660079b70c30e0b59ffacf3)
Comment: [https://github.com/Aider-AI/aider/pull/3908#issuecomment-2841120815](https://github.com/Aider-AI/aider/pull/3908#issuecomment-2841120815)

Open Reddit thread

View more discussions →

FAQ

Common questions about Qwen3 235B

What is the context window for Qwen3 235B?

Qwen3 235B supports a native context window of 262,144 tokens, which is approximately 200,000 words. Extended context beyond this is possible using advanced attention mechanisms.

How many parameters are actually used during inference?

Although the model has 235 billion total parameters, only 22 billion are activated at a time during inference due to its Mixture-of-Experts architecture.

What is the difference between this model and the Thinking variant?

This is the instruct (non-thinking) variant, which produces direct responses without exposing internal chain-of-thought reasoning. The Thinking variant (Qwen3-235B-A22B-Thinking-2507) is a separate model that outputs its reasoning process before answering.

What is the training data cutoff for this model?

Based on the metadata, the training date is listed as July 2025, which corresponds to the 2507 version suffix in the model name.

What license does Qwen3 235B use?

Qwen3 235B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution subject to the license terms.

What tasks is this model best suited for?

The model is designed for instruction following, agentic workflows, tool use, complex question answering, coding, multilingual tasks, and creative writing. It is not the recommended choice when visible chain-of-thought reasoning is required, as that is handled by the separate Thinking variant.

More models from Qwen

Continue browsing adjacent models from the same provider.

← All AI Models