Z.ai

GLM 5

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance. GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.

Feb 11, 2026 202.8K context 16,384 tokens output

Long-Context Processing Complex Reasoning Autonomous Coding Agentic Task Execution Mixture-of-Experts Architecture Reinforcement Learning Alignment

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Parameters ↓ Benchmarks ↓ Tools ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Z.ai

Model ID

The routed model identifier exposed by upstream providers.

z-ai/glm-5

Input Context Window

The number of tokens supported by the input context window.

202.8K tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Feb 11, 2026 4 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

February 2026

API Providers

The providers that offer this model. This is not an exhaustive list.

GMICloud, DeepInfra, StreamLake, Baidu, DigitalOcean, SiliconFlow, Chutes, AtlasCloud, Amazon Bedrock, Novita, Z.AI, Parasail, Venice, Phala

Modalities

Types of data this model can process.

Text

What is GLM 5

A fuller summary of positioning, capabilities, and source-specific details for GLM 5.

GLM-5 is a 744-billion-parameter Mixture-of-Experts language model developed by Z.ai (formerly Zhipu AI), released in February 2026 under the MIT license. It activates 40 billion parameters per token and supports a 200,000-token context window, making it suited for tasks that require processing large volumes of text in a single pass. The model was pre-trained on 28.5 trillion tokens and incorporates DeepSeek Sparse Attention to reduce inference costs while maintaining long-context performance.

GLM-5 is designed primarily for agentic workflows, autonomous software engineering, tool use, and long-horizon planning tasks. A notable aspect of its development is that it was trained entirely on Huawei Ascend chips using the MindSpore framework, with no dependency on NVIDIA hardware. It also introduces an asynchronous reinforcement learning training system called slime, which improves training throughput and enables more fine-grained post-training alignment. The model is freely available for both research and commercial use under its MIT license.

Capabilities

What GLM 5 supports

CTX

Long-Context Processing

Handles inputs up to 200,000 tokens in a single context window, enabling analysis of large codebases, documents, or multi-turn conversation histories.

Complex Reasoning

Applies multi-step reasoning across math, science, and logic tasks, scoring 92.7% on AIME 2026 I and 86.0% on GPQA-Diamond benchmarks.

</>

Autonomous Coding

Executes software engineering tasks end-to-end, achieving 77.8% on SWE-bench Verified and 73.3% on SWE-bench Multilingual.

Agentic Task Execution

Supports long-horizon agentic workflows including tool use, web research, and multi-step planning across extended task sequences.

Mixture-of-Experts Architecture

Uses a sparse MoE design with 744B total parameters but only 40B active per token, reducing compute cost per inference call.

Reinforcement Learning Alignment

Post-trained using the asynchronous slime RL infrastructure, which improves training throughput and fine-grained alignment beyond standard pre-training.

Text Generation

Generates structured and unstructured text outputs for tasks including summarization, drafting, and question answering across multiple languages.

Pricing for GLM 5

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.80 Per million tokens

Output tokens $1.92 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.12

maxTemperature 1

maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

GMICloud DeepInfra StreamLake Baidu DigitalOcean SiliconFlow Chutes AtlasCloud Amazon Bedrock Novita Z.AI Parasail Venice Phala

Provider Endpoints

Endpoint-level provider data currently available for this model.

GMICloud

1d uptime: 98.5% Supported params: 10 Implicit caching: No

DeepInfra

Max output: 16,384 1d uptime: 99.9% Supported params: 17 Implicit caching: No

StreamLake

Max output: 128,000 1d uptime: 99.9% Supported params: 13 Implicit caching: No

Baidu

Max output: 131,072 1d uptime: 99.8% Supported params: 14 Implicit caching: Yes

DigitalOcean

1d uptime: 82.2% Supported params: 11 Implicit caching: No

SiliconFlow

Max output: 131,072 1d uptime: 100.0% Supported params: 9 Implicit caching: No

Chutes

Max output: 65,535 1d uptime: 79.8% Supported params: 15 Implicit caching: No

AtlasCloud

Max output: 202,752 1d uptime: 99.8% Supported params: 17 Implicit caching: No

Amazon Bedrock

Max output: 131,072 1d uptime: 94.8% Supported params: 9 Implicit caching: No

Novita

Max output: 131,072 1d uptime: 100.0% Supported params: 13 Implicit caching: No

Z.AI

Max output: 131,072 1d uptime: 93.9% Supported params: 8 Implicit caching: No

Parasail

Max output: 131,072 1d uptime: 96.4% Supported params: 18 Implicit caching: No

Venice

Max output: 32,000 1d uptime: 72.6% Supported params: 13 Implicit caching: No

Phala

Max output: 202,752 1d uptime: 84.2% Supported params: 16 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Toggle Group

Default: medium

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
BrowseComp Complex web browsing and information retrieval	75.9%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	82.0%
HLE Questions that challenge frontier models across many domains	27.2%
SciCode Scientific research coding and numerical methods	46.2%
SWE-bench Verified Real GitHub issues requiring multi-file code fixes	77.8%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Model Card (Hugging Face) Other

→

GitHub Repository Open Source

→

Official Announcement Blog Post Announcements

→

API Documentation Documentation

→

Technical Paper Research

→

OpenRouter Model Page OpenRouter

→

AI tools related to GLM 5

These tools are strongly connected to GLM 5 through direct product references, provider mentions, or explicit model mappings.

AI Chatbot

智谱清言

智谱清言 is a Chinese-language conversational AI developed by Zhipu AI, powered by the GLM large language model. It features capabilities including AI-driven search, image generation, document reading, and automated video and presentation creation. Additionally, it provides tools for data analysis, coding assistance, and a library of intelligent agents, including support for building custom agents.

Free 4 visits 3 saves

AI Assistant

Shmooz AI

Shmooz AI is an accessible AI assistant available on both WhatsApp and the web. It provides features such as image generation, real-time Google search integration, article summarization, and file interaction, aiming to deliver high-quality AI model capabilities across these platforms.

Free 13 visits 2 saves

AI Chatbot

Polybuzz AI

Polybuzz AI is a platform designed for creating and interacting with AI-powered characters for role-playing, storytelling, and creative dialogue. The service hosts over 20 million characters across genres such as anime, fantasy, and horror. Users can build custom AI characters, participate in secure chats, and utilize creative tools including free image generation. The platform provides immersive roleplay scenarios and includes customizable content filters to maintain a safe user environment.

Free 0 visits 1 saves

AI Assistant

Snoooz AI

Snoooz AI is an automated Out-of-Office (OOO) assistant designed to streamline email management. It handles personalized OOO replies, creates backups for urgent communications, and manages email categorization and routing. The tool is designed to help professionals and businesses enhance prospect engagement, customer success, and employee experience.

Free 9 visits

Related Daily Briefs

Recent daily stories tied to GLM 5 through direct model mentions or provider-level coverage.

Frontier Models

GLM 5.2 Challenges Frontier Models as Open Weights Rival Top AI and Subscription Costs Face Scrutiny

Hugging Face and OpenAI are raising the stakes for enterprise adoption.

2026-06-20 AI Models AI Chatbot

Community discussion

What people think about GLM 5

GLM 5 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/opencodeCLI. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.

The strongest match in this snapshot has 4664 upvotes and 361 comments.

r/SillyTavernAI 69 upvotes 60 comments May 7, 2026

Glm 5.1 is really good. Like insanely better than opus 4.6

Hello, I’ve been using Glm 5.1 for a good hour and I used the freaky frankenstien preset and the dialogues are amazing. Pure realistic and human-like dialogue.

I did tried it with claude opus 4.6/4.7 but I didn’t really enjoy the dialogue, the details are good but overall? I enjoy glm 5.1 very much.

All you need is a few nudges and its like opus. Its amazing.

Do you agree?

Open Reddit thread

r/SillyTavernAI 26 upvotes 26 comments March 28, 2026

How is GLM 5?

asking because maybe Xi jinping may have given me an alternative to Claude

Open Reddit thread

r/SillyTavernAI 62 upvotes 57 comments May 3, 2026

Deepseek v4 or GLM 5.1?

Which one are you currently using more? And why? I’m kinda torn between both of them, I have kinda grown to like DS v4 more than GLM 5.1, what is your opinion?

Open Reddit thread

r/SillyTavernAI 14 upvotes 25 comments May 1, 2026

Kimi 2.6 and GLM 5.1 are problematic.

I got a question, so everytime I use Kimi 2.6, it thinks for so long even if I give it like 5k tokens. Glm 5.1 On the other hand has some issues for some reason. It either gives a coherent response or it just gives a nonsensical response and never stops. Does anyone else have these issues?

Open Reddit thread

r/PaxHistoria 29 upvotes 20 comments April 23, 2026

I hate GLM 5 so much 😭😭

Actions: Task our engineers with creating a new steel alloy that can support 5% times more load compared to traditional steel.

GLM 5: The research fails so utterly the entire engineering team spontaneously combusts, and all the steel mills simultaneously shit themselves, reducing out put by 99.7%.

Also the Germans attack.

Open Reddit thread

View more discussions →

FAQ

Common questions about GLM 5

What is the context window for GLM-5?

GLM-5 supports a 200,000-token context window, allowing it to process large documents, long codebases, or extended multi-turn conversations in a single pass.

How many parameters does GLM-5 have?

GLM-5 is a Mixture-of-Experts model with 744 billion total parameters. It activates 40 billion parameters per token during inference, which reduces the compute cost relative to a dense model of the same total size.

What is the training data cutoff for GLM-5?

Based on the available metadata, GLM-5 has a training date of February 2026. A precise knowledge cutoff date is not specified in the provided metadata.

What license does GLM-5 use?

GLM-5 is released under the MIT license, which permits both research and commercial use without royalty obligations.

What hardware was GLM-5 trained on?

GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework. It has no dependency on NVIDIA hardware, making it notable as a large-scale model trained on China's domestic AI compute infrastructure.

What tasks is GLM-5 best suited for?

GLM-5 is designed for agentic workflows, autonomous software engineering, tool use, web research, and long-horizon planning tasks. It also performs well on advanced mathematics and graduate-level science reasoning based on its benchmark results.

More models from Z.ai

Continue browsing adjacent models from the same provider.

← All AI Models