Extended Context Window
Processes up to 200,000 tokens in a single request, equivalent to roughly 150,000 words, enabling analysis of long documents and large codebases without losing earlier context.
GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for commercial and personal use without restrictions. The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series. GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for GLM 4.6.
GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for commercial and personal use without restrictions. The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series.
GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.
Processes up to 200,000 tokens in a single request, equivalent to roughly 150,000 words, enabling analysis of long documents and large codebases without losing earlier context.
Supports tool calling during the reasoning process itself, allowing the model to query APIs or search for information while thinking through a problem rather than only after.
Handles real-world programming tasks including front-end web page generation and integrates with coding tools such as Claude Code, Cline, Roo Code, and Kilo Code.
Built for multi-step agent pipelines, performing well on tool-use benchmarks and integrating into agent frameworks for automated task execution.
Natively supports both English and Chinese, making it suitable for bilingual applications and cross-language document processing.
Produces extended written content and handles role-playing scenarios, with outputs tuned toward human-preferred writing style and coherence.
Uses a Mixture-of-Experts design with approximately 357 billion total parameters, allowing selective activation of model capacity per token during inference.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
The configurable options currently documented for this model.
Parameters currently listed by OpenRouter or the local catalog for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
Official model cards, release notes, docs, and other references synced from the source page.
GLM 4.6 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/NovelAi. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.
The strongest match in this snapshot has 1188 upvotes and 178 comments.
I wonder if anyone else is like me, and still using GLM 4.6?
After hearing how both explicitly and surreptitiously GLM 4.7 and 5.0 are with censorship, I don't find myself wanting to stray from 4.6 with how uncensored and jack of all trades it is.
Anyone else notcing GLM beeing really bad the last week?
It keeps using \[Location: X\]
It repeats itself even more, even if i set the repeat penalties really high.
It forgets relationships between characters alot.
It forgets characters alltogether if they havent spoken or been mentioned for a while.
this is just my personal opinion and experience. but ever since the release of 4.7, i've been having a lot of issues. especially with those annoying ai patterns and very unnatural dialogue, things i simply didn't have when using 4.6
even though my prompt includes instructions for natural, human-like writing and dialogue, glm 4.7 feels like a chaotic machine that can't interpret things properly. i changed the prompt and the parameters multiple times, and it still felt very strange. there were small improvements here and there, but overall it didn't feel immersive at all and just ended up frustrating me
maybe it needs more specific instructions, different settings… idk. all i know is that i'm tired of spending days trying to make 4.7 work for my roleplay style without any success
Hi there, I’ve read some great things about GLM 4.6. I’ve decided to give it a go last night and man, am I frustrated.
The constant “devilish smirk, dangerous grin, predatory laugh”. Constantly repeating my phrases. Responding to each sentence of my response, piece by piece. Giant, long essays of text. I do have prompts to try and counter these things, but none work.
It’s also weird in how it’ll randomly drop Chinese letters in responses, sometimes just not generate past the think, and doesn’t work well with a prefill. What’s the secret sauce? Am I just too slop-annoyed? I am using a direct API and regular settings.
Long story short, I only want to run local models. I hear many good things of 4.6, but is far too large to run locally. 4.6V-flash would fit on my GPU. How do the models compare in roleplaying?
GLM-4.6 supports a context window of 200,000 tokens, which is approximately 150,000 words. This allows it to process long documents, large codebases, or extended conversation histories in a single request.
GLM-4.6 is released under the MIT license, which permits free use for both commercial and personal projects without royalty obligations.
According to the model metadata, GLM-4.6 has a training data cutoff of September 2025.
GLM-4.6 is built on a Mixture-of-Experts architecture with approximately 357 billion total parameters. MoE models activate only a subset of parameters per token during inference.
GLM-4.6 natively supports both English and Chinese, making it suitable for bilingual use cases and applications targeting users in either language.
GLM-4.6 is designed for complex coding tasks, long-document analysis, agentic AI workflows that require tool use during reasoning, and bilingual English/Chinese applications.
Continue browsing adjacent models from the same provider.