Z.ai

GLM 4.6

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for commercial and personal use without restrictions. The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series. GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

Sep 30, 2025 200,000 context 16,384 tokens output
Extended Context Window Tool-Use Reasoning Code Generation Agentic Workflows Bilingual Language Support Long-Form Text Generation

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Z.ai

Model ID

The routed model identifier exposed by upstream providers.

z-ai/glm-4.6

Input Context Window

The number of tokens supported by the input context window.

200,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

Sep 30, 2025 8 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2025-03-31

API Providers

The providers that offer this model. This is not an exhaustive list.

DeepInfra, Novita, Z.AI, AtlasCloud, Venice

Modalities

Types of data this model can process.

Text

What is GLM 4.6

A fuller summary of positioning, capabilities, and source-specific details for GLM 4.6.

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for commercial and personal use without restrictions. The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series.

GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

Capabilities

What GLM 4.6 supports

CTX

Extended Context Window

Processes up to 200,000 tokens in a single request, equivalent to roughly 150,000 words, enabling analysis of long documents and large codebases without losing earlier context.

TL

Tool-Use Reasoning

Supports tool calling during the reasoning process itself, allowing the model to query APIs or search for information while thinking through a problem rather than only after.

</>

Code Generation

Handles real-world programming tasks including front-end web page generation and integrates with coding tools such as Claude Code, Cline, Roo Code, and Kilo Code.

AG

Agentic Workflows

Built for multi-step agent pipelines, performing well on tool-use benchmarks and integrating into agent frameworks for automated task execution.

AI

Bilingual Language Support

Natively supports both English and Chinese, making it suitable for bilingual applications and cross-language document processing.

AI

Long-Form Text Generation

Produces extended written content and handles role-playing scenarios, with outputs tuned toward human-preferred writing style and coherence.

AI

MoE Architecture

Uses a Mixture-of-Experts design with approximately 357 billion total parameters, allowing selective activation of model capacity per token during inference.

Pricing for GLM 4.6

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.08
maxTemperature 1
maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepInfra Novita Z.AI AtlasCloud Venice

Provider Endpoints

Endpoint-level provider data currently available for this model.

DeepInfra

Max output: 131,072 1d uptime: 99.9% Supported params: 16 Implicit caching: No

Novita

Max output: 131,072 1d uptime: 97.9% Supported params: 14 Implicit caching: No

Z.AI

Max output: 131,072 1d uptime: 90.9% Supported params: 8 Implicit caching: No

AtlasCloud

Max output: 202,752 1d uptime: 99.9% Supported params: 17 Implicit caching: No

Venice

Max output: 16,384 1d uptime: 99.8% Supported params: 13 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Toggle Group
Default: medium

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
63.2%
HLE
Questions that challenge frontier models across many domains
5.2%
LiveCodeBench
Real-world coding tasks from recent competitions
56.1%
MMLU-Pro
Expert knowledge across 14 academic disciplines
78.4%
SciCode
Scientific research coding and numerical methods
33.1%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about GLM 4.6

GLM 4.6 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/NovelAi. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.

The strongest match in this snapshot has 1188 upvotes and 178 comments.

r/SillyTavernAI 14 upvotes 27 comments February 20, 2026
Still using GLM 4.6?

I wonder if anyone else is like me, and still using GLM 4.6?

After hearing how both explicitly and surreptitiously GLM 4.7 and 5.0 are with censorship, I don't find myself wanting to stray from 4.6 with how uncensored and jack of all trades it is.

Open Reddit thread
r/NovelAi 19 upvotes 23 comments March 22, 2026
GLM 4.6 really bad now?

Anyone else notcing GLM beeing really bad the last week?
It keeps using \[Location: X\]
It repeats itself even more, even if i set the repeat penalties really high.
It forgets relationships between characters alot.
It forgets characters alltogether if they havent spoken or been mentioned for a while.

Open Reddit thread
r/SillyTavernAI 18 upvotes 24 comments January 7, 2026
glm 4.6 is still incredibly better than glm 4.7

this is just my personal opinion and experience. but ever since the release of 4.7, i've been having a lot of issues. especially with those annoying ai patterns and very unnatural dialogue, things i simply didn't have when using 4.6

even though my prompt includes instructions for natural, human-like writing and dialogue, glm 4.7 feels like a chaotic machine that can't interpret things properly. i changed the prompt and the parameters multiple times, and it still felt very strange. there were small improvements here and there, but overall it didn't feel immersive at all and just ended up frustrating me

maybe it needs more specific instructions, different settings… idk. all i know is that i'm tired of spending days trying to make 4.7 work for my roleplay style without any success

Open Reddit thread
r/SillyTavernAI 61 upvotes 26 comments October 29, 2025
Please help me de-slop GLM 4.6

Hi there, I’ve read some great things about GLM 4.6. I’ve decided to give it a go last night and man, am I frustrated.

The constant “devilish smirk, dangerous grin, predatory laugh”. Constantly repeating my phrases. Responding to each sentence of my response, piece by piece. Giant, long essays of text. I do have prompts to try and counter these things, but none work.

It’s also weird in how it’ll randomly drop Chinese letters in responses, sometimes just not generate past the think, and doesn’t work well with a prefill. What’s the secret sauce? Am I just too slop-annoyed? I am using a direct API and regular settings.

Open Reddit thread
r/SillyTavernAI 8 upvotes 18 comments December 26, 2025
How does GLM 4.6V Flash compare to 4.6?

Long story short, I only want to run local models. I hear many good things of 4.6, but is far too large to run locally. 4.6V-flash would fit on my GPU. How do the models compare in roleplaying?

Open Reddit thread
View more discussions →
FAQ

Common questions about GLM 4.6

What is the context window size for GLM-4.6?

GLM-4.6 supports a context window of 200,000 tokens, which is approximately 150,000 words. This allows it to process long documents, large codebases, or extended conversation histories in a single request.

What license does GLM-4.6 use?

GLM-4.6 is released under the MIT license, which permits free use for both commercial and personal projects without royalty obligations.

What is the knowledge cutoff date for GLM-4.6?

According to the model metadata, GLM-4.6 has a training data cutoff of September 2025.

How many parameters does GLM-4.6 have?

GLM-4.6 is built on a Mixture-of-Experts architecture with approximately 357 billion total parameters. MoE models activate only a subset of parameters per token during inference.

What languages does GLM-4.6 support?

GLM-4.6 natively supports both English and Chinese, making it suitable for bilingual use cases and applications targeting users in either language.

What kinds of tasks is GLM-4.6 best suited for?

GLM-4.6 is designed for complex coding tasks, long-document analysis, agentic AI workflows that require tool use during reasoning, and bilingual English/Chinese applications.

More models from Z.ai

Continue browsing adjacent models from the same provider.

← All AI Models