Z.ai

GLM 5

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance. GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.

Feb 11, 2026 202.8K context 16,384 tokens output
Long-Context Processing Complex Reasoning Autonomous Coding Agentic Task Execution Mixture-of-Experts Architecture Reinforcement Learning Alignment

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Z.ai

Model ID

The routed model identifier exposed by upstream providers.

z-ai/glm-5

Input Context Window

The number of tokens supported by the input context window.

202.8K tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

Feb 11, 2026 3 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

February 2026

API Providers

The providers that offer this model. This is not an exhaustive list.

GMICloud, DeepInfra, StreamLake, Baidu, SiliconFlow, Chutes, AtlasCloud, Amazon Bedrock, Friendli, Novita, Z.AI, Parasail, Together, Venice, Phala

Modalities

Types of data this model can process.

Text

What is GLM 5

A fuller summary of positioning, capabilities, and source-specific details for GLM 5.

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance.

GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.

Capabilities

What GLM 5 supports

CTX

Long-Context Processing

Handles inputs up to 200,000 tokens in a single context window, enabling analysis of large codebases, documents, or multi-turn conversation histories.

RN

Complex Reasoning

Applies multi-step reasoning across math, science, and logic tasks, scoring 92.7% on AIME 2026 I and 86.0% on GPQA-Diamond benchmarks.

</>

Autonomous Coding

Executes software engineering tasks end-to-end, achieving 77.8% on SWE-bench Verified and 73.3% on SWE-bench Multilingual.

AG

Agentic Task Execution

Supports long-horizon agentic workflows including tool use, web research, and multi-step planning across extended task sequences.

AI

Mixture-of-Experts Architecture

Uses a sparse MoE design with 744B total parameters but only 40B active per token, reducing compute cost per inference call.

AI

Reinforcement Learning Alignment

Post-trained using the asynchronous slime RL infrastructure, which improves training throughput and fine-grained alignment beyond standard pre-training.

AI

Text Generation

Generates structured and unstructured text outputs for tasks including summarization, drafting, and question answering across multiple languages.

Pricing for GLM 5

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.12
maxTemperature 1
maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

GMICloud DeepInfra StreamLake Baidu SiliconFlow Chutes AtlasCloud Amazon Bedrock Friendli Novita Z.AI Parasail Together Venice Phala

Provider Endpoints

Endpoint-level provider data currently available for this model.

GMICloud

1d uptime: 90.9% Supported params: 10 Implicit caching: No

DeepInfra

Max output: 16,384 1d uptime: 100.0% Supported params: 17 Implicit caching: No

StreamLake

Max output: 128,000 1d uptime: 99.6% Supported params: 9 Implicit caching: No

Baidu

Max output: 131,072 1d uptime: 99.6% Supported params: 14 Implicit caching: Yes

SiliconFlow

Max output: 131,072 1d uptime: 97.7% Supported params: 9 Implicit caching: No

Chutes

Max output: 65,535 1d uptime: 89.8% Supported params: 15 Implicit caching: No

AtlasCloud

Max output: 202,752 1d uptime: 99.6% Supported params: 17 Implicit caching: No

Amazon Bedrock

Max output: 131,072 1d uptime: 94.8% Supported params: 9 Implicit caching: No

Friendli

Max output: 202,752 1d uptime: 100.0% Supported params: 16 Implicit caching: No

Novita

Max output: 131,072 1d uptime: 100.0% Supported params: 13 Implicit caching: No

Z.AI

Max output: 131,072 1d uptime: 99.8% Supported params: 8 Implicit caching: No

Parasail

Max output: 131,072 1d uptime: 99.6% Supported params: 16 Implicit caching: No

Together

1d uptime: 99.1% Supported params: 16 Implicit caching: No

Venice

Max output: 32,000 1d uptime: 94.8% Supported params: 13 Implicit caching: No

Phala

Max output: 202,752 1d uptime: 82.3% Supported params: 16 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Toggle Group
Default: medium

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
BrowseComp
Complex web browsing and information retrieval
75.9%
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
82.0%
HLE
Questions that challenge frontier models across many domains
27.2%
SciCode
Scientific research coding and numerical methods
46.2%
SWE-bench Verified
Real GitHub issues requiring multi-file code fixes
77.8%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about GLM 5

GLM 5 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/opencodeCLI. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.

The strongest match in this snapshot has 4664 upvotes and 361 comments.

r/SillyTavernAI 69 upvotes 60 comments May 7, 2026
Glm 5.1 is really good. Like insanely better than opus 4.6

Hello, I’ve been using Glm 5.1 for a good hour and I used the freaky frankenstien preset and the dialogues are amazing. Pure realistic and human-like dialogue.

I did tried it with claude opus 4.6/4.7 but I didn’t really enjoy the dialogue, the details are good but overall? I enjoy glm 5.1 very much.

All you need is a few nudges and its like opus. Its amazing.

Do you agree?

Open Reddit thread
r/SillyTavernAI 26 upvotes 26 comments March 28, 2026
How is GLM 5?

asking because maybe Xi jinping may have given me an alternative to Claude

Open Reddit thread
r/SillyTavernAI 62 upvotes 57 comments May 3, 2026
Deepseek v4 or GLM 5.1?

Which one are you currently using more? And why? I’m kinda torn between both of them, I have kinda grown to like DS v4 more than GLM 5.1, what is your opinion?

Open Reddit thread
r/SillyTavernAI 14 upvotes 25 comments May 1, 2026
Kimi 2.6 and GLM 5.1 are problematic.

I got a question, so everytime I use Kimi 2.6, it thinks for so long even if I give it like 5k tokens. Glm 5.1 On the other hand has some issues for some reason. It either gives a coherent response or it just gives a nonsensical response and never stops. Does anyone else have these issues?

Open Reddit thread
r/PaxHistoria 29 upvotes 20 comments April 23, 2026
I hate GLM 5 so much 😭😭

Actions: Task our engineers with creating a new steel alloy that can support 5% times more load compared to traditional steel.

GLM 5: The research fails so utterly the entire engineering team spontaneously combusts, and all the steel mills simultaneously shit themselves, reducing out put by 99.7%.

Also the Germans attack.

Open Reddit thread
View more discussions →
FAQ

Common questions about GLM 5

What is the context window for GLM-5?

GLM-5 supports a 200,000-token context window, allowing it to process large documents, long codebases, or extended multi-turn conversations in a single pass.

How many parameters does GLM-5 have?

GLM-5 is a Mixture-of-Experts model with 744 billion total parameters. It activates 40 billion parameters per token during inference, which reduces the compute cost relative to a dense model of the same total size.

What is the training data cutoff for GLM-5?

Based on the available metadata, GLM-5 has a training date of February 2026. A precise knowledge cutoff date is not specified in the provided metadata.

What license does GLM-5 use?

GLM-5 is released under the MIT license, which permits both research and commercial use without royalty obligations.

What hardware was GLM-5 trained on?

GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework. It has no dependency on NVIDIA hardware, making it notable as a large-scale model trained on China's domestic AI compute infrastructure.

What tasks is GLM-5 best suited for?

GLM-5 is designed for agentic workflows, autonomous software engineering, tool use, web research, and long-horizon planning tasks. It also performs well on advanced mathematics and graduate-level science reasoning based on its benchmark results.

More models from Z.ai

Continue browsing adjacent models from the same provider.

← All AI Models