Z.ai

GLM 4.6

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for commercial and personal use without restrictions. The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series. GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

Sep 30, 2025 200,000 context 16,384 tokens output

Extended Context Window Tool-Use Reasoning Code Generation Agentic Workflows Bilingual Language Support Long-Form Text Generation

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Parameters ↓ Benchmarks ↓ Tools ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Z.ai

Model ID

The routed model identifier exposed by upstream providers.

z-ai/glm-4.6

Input Context Window

The number of tokens supported by the input context window.

200,000 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Sep 30, 2025 9 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

2025-03-31

API Providers

The providers that offer this model. This is not an exhaustive list.

Venice, DeepInfra, Novita, Z.AI, AtlasCloud

Modalities

Types of data this model can process.

Text

What is GLM 4.6

A fuller summary of positioning, capabilities, and source-specific details for GLM 4.6.

GLM-4.6 is a large language model developed by Zhipu AI (Z.ai), built on a Mixture-of-Experts architecture with approximately 357 billion parameters. It supports both English and Chinese, carries a 200,000-token context window, and is released under the MIT license, making it available for commercial and personal use without restrictions. The model was released in late 2025 and represents Zhipu AI's flagship offering in the GLM series.

GLM-4.6 is designed for tasks that require extended context handling, multi-step reasoning, and agentic workflows. A notable characteristic is its ability to invoke tools during the reasoning process itself — not only after completing a chain of thought — which enables more dynamic problem-solving in agent-based applications. It is well suited for developers and researchers working on complex coding tasks, long-document analysis, bilingual applications, and automated multi-step pipelines.

Capabilities

What GLM 4.6 supports

CTX

Extended Context Window

Processes up to 200,000 tokens in a single request, equivalent to roughly 150,000 words, enabling analysis of long documents and large codebases without losing earlier context.

Tool-Use Reasoning

Supports tool calling during the reasoning process itself, allowing the model to query APIs or search for information while thinking through a problem rather than only after.

</>

Code Generation

Handles real-world programming tasks including front-end web page generation and integrates with coding tools such as Claude Code, Cline, Roo Code, and Kilo Code.

Agentic Workflows

Built for multi-step agent pipelines, performing well on tool-use benchmarks and integrating into agent frameworks for automated task execution.

Bilingual Language Support

Natively supports both English and Chinese, making it suitable for bilingual applications and cross-language document processing.

Long-Form Text Generation

Produces extended written content and handles role-playing scenarios, with outputs tuned toward human-preferred writing style and coherence.

MoE Architecture

Uses a Mixture-of-Experts design with approximately 357 billion total parameters, allowing selective activation of model capacity per token during inference.

Pricing for GLM 4.6

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.43 Per million tokens

Output tokens $1.74 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.08

maxTemperature 1

maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

Venice DeepInfra Novita Z.AI AtlasCloud

Provider Endpoints

Endpoint-level provider data currently available for this model.

Venice

Max output: 16,384 1d uptime: 98.7% Supported params: 13 Implicit caching: No

DeepInfra

Max output: 131,072 1d uptime: 99.9% Supported params: 16 Implicit caching: No

Novita

Max output: 131,072 1d uptime: 97.1% Supported params: 14 Implicit caching: No

Z.AI

Max output: 131,072 1d uptime: 95.8% Supported params: 8 Implicit caching: No

AtlasCloud

Max output: 202,752 1d uptime: 99.5% Supported params: 17 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Toggle Group

Default: medium

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	63.2%
HLE Questions that challenge frontier models across many domains	5.2%
LiveCodeBench Real-world coding tasks from recent competitions	56.1%
MMLU-Pro Expert knowledge across 14 academic disciplines	78.4%
SciCode Scientific research coding and numerical methods	33.1%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Model Card (Hugging Face) Documentation

→

Announcement Blog Post Announcements

→

Technical Report Research

→

API Reference Documentation

→

GitHub Repository Open Source

→

OpenRouter Model Page OpenRouter

→

AI tools related to GLM 4.6

These tools are strongly connected to GLM 4.6 through direct product references, provider mentions, or explicit model mappings.

AI Chatbot

智谱清言

智谱清言 is a Chinese-language conversational AI developed by Zhipu AI, powered by the GLM large language model. It features capabilities including AI-driven search, image generation, document reading, and automated video and presentation creation. Additionally, it provides tools for data analysis, coding assistance, and a library of intelligent agents, including support for building custom agents.

Free 4 visits 3 saves

AI Assistant

Shmooz AI

Shmooz AI is an accessible AI assistant available on both WhatsApp and the web. It provides features such as image generation, real-time Google search integration, article summarization, and file interaction, aiming to deliver high-quality AI model capabilities across these platforms.

Free 13 visits 2 saves

AI Chatbot

Polybuzz AI

Polybuzz AI is a platform designed for creating and interacting with AI-powered characters for role-playing, storytelling, and creative dialogue. The service hosts over 20 million characters across genres such as anime, fantasy, and horror. Users can build custom AI characters, participate in secure chats, and utilize creative tools including free image generation. The platform provides immersive roleplay scenarios and includes customizable content filters to maintain a safe user environment.

Free 0 visits 1 saves

AI Assistant

Snoooz AI

Snoooz AI is an automated Out-of-Office (OOO) assistant designed to streamline email management. It handles personalized OOO replies, creates backups for urgent communications, and manages email categorization and routing. The tool is designed to help professionals and businesses enhance prospect engagement, customer success, and employee experience.

Free 9 visits

Community discussion

What people think about GLM 4.6

GLM 4.6 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/NovelAi. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.

The strongest match in this snapshot has 1188 upvotes and 178 comments.

r/SillyTavernAI 14 upvotes 27 comments February 20, 2026

Still using GLM 4.6?

I wonder if anyone else is like me, and still using GLM 4.6?

After hearing how both explicitly and surreptitiously GLM 4.7 and 5.0 are with censorship, I don't find myself wanting to stray from 4.6 with how uncensored and jack of all trades it is.

Open Reddit thread

r/NovelAi 19 upvotes 23 comments March 22, 2026

GLM 4.6 really bad now?

Anyone else notcing GLM beeing really bad the last week?
It keeps using \[Location: X\]
It repeats itself even more, even if i set the repeat penalties really high.
It forgets relationships between characters alot.
It forgets characters alltogether if they havent spoken or been mentioned for a while.

Open Reddit thread

r/SillyTavernAI 18 upvotes 24 comments January 7, 2026

glm 4.6 is still incredibly better than glm 4.7

this is just my personal opinion and experience. but ever since the release of 4.7, i've been having a lot of issues. especially with those annoying ai patterns and very unnatural dialogue, things i simply didn't have when using 4.6

even though my prompt includes instructions for natural, human-like writing and dialogue, glm 4.7 feels like a chaotic machine that can't interpret things properly. i changed the prompt and the parameters multiple times, and it still felt very strange. there were small improvements here and there, but overall it didn't feel immersive at all and just ended up frustrating me

maybe it needs more specific instructions, different settings… idk. all i know is that i'm tired of spending days trying to make 4.7 work for my roleplay style without any success

Open Reddit thread

r/SillyTavernAI 61 upvotes 26 comments October 29, 2025

Please help me de-slop GLM 4.6

Hi there, I’ve read some great things about GLM 4.6. I’ve decided to give it a go last night and man, am I frustrated.

The constant “devilish smirk, dangerous grin, predatory laugh”. Constantly repeating my phrases. Responding to each sentence of my response, piece by piece. Giant, long essays of text. I do have prompts to try and counter these things, but none work.

It’s also weird in how it’ll randomly drop Chinese letters in responses, sometimes just not generate past the think, and doesn’t work well with a prefill. What’s the secret sauce? Am I just too slop-annoyed? I am using a direct API and regular settings.

Open Reddit thread

r/SillyTavernAI 8 upvotes 18 comments December 26, 2025

How does GLM 4.6V Flash compare to 4.6?

Long story short, I only want to run local models. I hear many good things of 4.6, but is far too large to run locally. 4.6V-flash would fit on my GPU. How do the models compare in roleplaying?

Open Reddit thread

View more discussions →

FAQ

Common questions about GLM 4.6

What is the context window size for GLM-4.6?

GLM-4.6 supports a context window of 200,000 tokens, which is approximately 150,000 words. This allows it to process long documents, large codebases, or extended conversation histories in a single request.

What license does GLM-4.6 use?

GLM-4.6 is released under the MIT license, which permits free use for both commercial and personal projects without royalty obligations.

What is the knowledge cutoff date for GLM-4.6?

According to the model metadata, GLM-4.6 has a training data cutoff of September 2025.

How many parameters does GLM-4.6 have?

GLM-4.6 is built on a Mixture-of-Experts architecture with approximately 357 billion total parameters. MoE models activate only a subset of parameters per token during inference.

What languages does GLM-4.6 support?

GLM-4.6 natively supports both English and Chinese, making it suitable for bilingual use cases and applications targeting users in either language.

What kinds of tasks is GLM-4.6 best suited for?

GLM-4.6 is designed for complex coding tasks, long-document analysis, agentic AI workflows that require tool use during reasoning, and bilingual English/Chinese applications.

More models from Z.ai

Continue browsing adjacent models from the same provider.

← All AI Models