GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance.
GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.
Handles inputs up to 200,000 tokens in a single context window, enabling analysis of large codebases, documents, or multi-turn conversation histories.
Applies multi-step reasoning across math, science, and logic tasks, scoring 92.7% on AIME 2026 I and 86.0% on GPQA-Diamond benchmarks.
Executes software engineering tasks end-to-end, achieving 77.8% on SWE-bench Verified and 73.3% on SWE-bench Multilingual.
Supports long-horizon agentic workflows including tool use, web research, and multi-step planning across extended task sequences.
Uses a sparse MoE design with 744B total parameters but only 40B active per token, reducing compute cost per inference call.
Post-trained using the asynchronous slime RL infrastructure, which improves training throughput and enables more fine-grained alignment after pre-training.
Generates structured and unstructured text outputs for tasks including summarization, drafting, and question answering across multiple languages.
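The sparse MoE figures in the list above can be turned into a quick piece of arithmetic. The following is a minimal sketch that only uses the two parameter counts quoted there; the function name `active_fraction` is a hypothetical helper, not part of any GLM-5 tooling.

```python
# Figures taken from the capability list above.
TOTAL_PARAMS = 744e9   # total parameters across all experts
ACTIVE_PARAMS = 40e9   # parameters activated per token


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters used for each token (hypothetical helper)."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active / total


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active share per token: {fraction:.1%}")                      # 5.4%
print(f"Total-to-active ratio: {TOTAL_PARAMS / ACTIVE_PARAMS:.1f}x")  # 18.6x
```

Roughly 5.4% of the weights take part in each token's forward pass. As a general property of MoE models, this lowers per-token compute, but the full set of weights typically still has to be stored to serve the model.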
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Description | Score |
|---|---|---|
| BrowseComp | Complex web browsing and information retrieval | |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | |
| HLE | Questions that challenge frontier models across many domains | |
| SciCode | Scientific research coding and numerical methods | |
| SWE-bench Verified | Real GitHub issues requiring multi-file code fixes | |
GLM 5 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, and r/opencodeCLI. Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflows.
The strongest match in this snapshot has 4664 upvotes and 361 comments.
Hello, I’ve been using Glm 5.1 for a good hour and I used the freaky frankenstien preset and the dialogues are amazing. Pure realistic and human-like dialogue.
I did tried it with claude opus 4.6/4.7 but I didn’t really enjoy the dialogue, the details are good but overall? I enjoy glm 5.1 very much.
All you need is a few nudges and its like opus. Its amazing.
Do you agree?
asking because maybe Xi jinping may have given me an alternative to Claude
Which one are you currently using more? And why? I’m kinda torn between both of them, I have kinda grown to like DS v4 more than GLM 5.1, what is your opinion?
I got a question, so everytime I use Kimi 2.6, it thinks for so long even if I give it like 5k tokens. Glm 5.1 On the other hand has some issues for some reason. It either gives a coherent response or it just gives a nonsensical response and never stops. Does anyone else have these issues?
Actions: Task our engineers with creating a new steel alloy that can support 5% times more load compared to traditional steel.
GLM 5: The research fails so utterly the entire engineering team spontaneously combusts, and all the steel mills simultaneously shit themselves, reducing out put by 99.7%.
Also the Germans attack.
GLM-5 supports a 200,000-token context window, allowing it to process large documents, long codebases, or extended multi-turn conversations in a single pass.
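To make the 200,000-token limit concrete, here is a minimal sketch of a pre-flight check that estimates whether a set of inputs fits in one request. It assumes a rough heuristic of four characters per token, which is not GLM-5's actual tokenizer, and the `reserved_for_output` value is a hypothetical budget for the reply, not a documented limit.

```python
CONTEXT_WINDOW = 200_000  # tokens, from the description above
CHARS_PER_TOKEN = 4       # assumption: rough heuristic, not GLM-5's tokenizer


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Roughly estimate token count from character length (ceiling division)."""
    return -(-len(text) // chars_per_token)


def fits_in_context(texts: list[str], reserved_for_output: int = 8_000) -> bool:
    """Check whether the combined inputs leave room for the reply.

    reserved_for_output is a hypothetical allowance for generated tokens.
    """
    budget = CONTEXT_WINDOW - reserved_for_output
    return sum(estimate_tokens(t) for t in texts) <= budget


# Example: two small in-memory "files" comfortably fit in the window.
files = ["def main():\n    pass\n" * 1000, "README text " * 500]
print(fits_in_context(files))
```

For real workloads the estimate should come from the model's own tokenizer, since character-based heuristics can be far off for code or non-English text.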
GLM-5 is a Mixture-of-Experts model with 744 billion total parameters. It activates 40 billion parameters per token during inference, which reduces the compute cost relative to a dense model of the same total size.
The available metadata gives February 2026 as GLM-5's training date, which coincides with its release. It does not specify a precise knowledge cutoff date.
GLM-5 is released under the MIT license, which permits both research and commercial use without royalty obligations.
GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework. It has no dependency on NVIDIA hardware, making it notable as a large-scale model trained on China's domestic AI compute infrastructure.
GLM-5 is designed for agentic workflows, autonomous software engineering, tool use, web research, and long-horizon planning tasks. It also performs well on advanced mathematics and graduate-level science reasoning based on its benchmark results.