Z.ai

GLM 5.1

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

Apr 07, 2026 202.8K context 16,384 tokens output
Text Tools Structured Output Reasoning

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Z.ai

Model ID

The routed model identifier exposed by upstream providers.

z-ai/glm-5.1

Input Context Window

The number of tokens supported by the input context window.

202.8K tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

Yes

Release Date

When the model was first released.

Apr 07, 2026 2 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

GMICloud, Baidu, DeepInfra, StreamLake, Chutes, Phala, AtlasCloud, BaseTen, Novita, Together, Parasail, Fireworks, Z.AI, SiliconFlow, Ambient, Friendli, Inceptron, Venice

Modalities

Types of data this model can process.

Text

What is GLM 5.1

A fuller summary of positioning, capabilities, and source-specific details for GLM 5.1.

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

Capabilities

What GLM 5.1 supports

RN

Reasoning Controls

OpenRouter lists GPT-5.5 with reasoning support and explicit reasoning-related request parameters.

JSON

Structured Outputs

Structured output settings are exposed through OpenRouter for schema-driven or format-controlled responses.

TL

Tool Calling

Tool invocation and tool selection are supported in the routed OpenRouter interface for this model.

MM

Multimodal I/O

This model accepts text input and returns text output.

CTX

Large Context Window

OpenRouter currently lists a context window of 202.8K with up to 16,384 tokens maximum output tokens.

Pricing for GLM 5.1

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.18
maxTemperature 1
maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

GMICloud Baidu DeepInfra StreamLake Chutes Phala AtlasCloud BaseTen Novita Together Parasail Fireworks Z.AI SiliconFlow Ambient Friendli Inceptron Venice

Provider Endpoints

Endpoint-level provider data currently available for this model.

GMICloud

1d uptime: 87.5% Supported params: 6 Implicit caching: No

Baidu

Max output: 131,072 1d uptime: 99.6% Supported params: 9 Implicit caching: No

DeepInfra

Max output: 32,768 1d uptime: 99.9% Supported params: 17 Implicit caching: No

StreamLake

Max output: 128,000 1d uptime: 99.8% Supported params: 9 Implicit caching: No

Chutes

Max output: 65,535 1d uptime: 86.6% Supported params: 15 Implicit caching: No

Phala

Max output: 202,752 1d uptime: 94.1% Supported params: 16 Implicit caching: No

AtlasCloud

Max output: 202,752 1d uptime: 99.4% Supported params: 17 Implicit caching: No

BaseTen

Max output: 202,800 1d uptime: 91.0% Supported params: 11 Implicit caching: No

Novita

Max output: 131,072 1d uptime: 99.8% Supported params: 13 Implicit caching: No

Together

1d uptime: 84.7% Supported params: 14 Implicit caching: No

Parasail

Max output: 131,072 1d uptime: 99.3% Supported params: 16 Implicit caching: No

Fireworks

1d uptime: 99.2% Supported params: 15 Implicit caching: No

Z.AI

Max output: 131,072 1d uptime: 99.0% Supported params: 8 Implicit caching: No

SiliconFlow

Max output: 131,072 1d uptime: 99.9% Supported params: 9 Implicit caching: No

Ambient

Max output: 131,072 1d uptime: 99.1% Supported params: 10 Implicit caching: No

Friendli

Max output: 202,752 1d uptime: 99.9% Supported params: 16 Implicit caching: No

Inceptron

Max output: 202,752 1d uptime: 94.3% Supported params: 17 Implicit caching: No

Venice

Max output: 24,000 1d uptime: 99.0% Supported params: 13 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Toggle Group
Default: medium

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Compare GLM 5.1 with related models

Jump straight into the most relevant side-by-side comparison pages for this model.

GLM 5.1 vs Mistral Small 3.1 (25.03)

Compare GLM 5.1 and Mistral Small 3.1 (25.03) across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for reasoning-heavy tasks versus cost-efficient scale.

GLM 5.1 vs Mistral Medium 3

Compare GLM 5.1 and Mistral Medium 3 across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for reasoning-heavy tasks versus tool-augmented workflows.

GLM 5.1 vs Kimi K2.6

Compare GLM 5.1 and Kimi K2.6 across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for reasoning-heavy tasks versus reasoning-heavy tasks.

DeepSeek V4 Flash vs GLM 5.1

Compare DeepSeek V4 Flash and GLM 5.1 across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus reasoning-heavy tasks.

GLM 5.1 vs Claude 4.6 Opus

Compare GLM 5.1 and Claude 4.6 Opus across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for reasoning-heavy tasks versus long-context workloads.

Grok 4.3 vs GLM 5.1

Compare Grok 4.3 and GLM 5.1 across pricing, context window, capabilities, benchmarks, and API access to choose the better fit for long-context workloads versus reasoning-heavy tasks.

Community discussion

What people think about GLM 5.1

GLM 5.1 discussions are most active in r/SillyTavernAI, r/ZaiGLM, r/LocalLLaMA. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.

The strongest match in this snapshot has 4673 upvotes and 361 comments.

r/SillyTavernAI 69 upvotes 60 comments May 7, 2026
Glm 5.1 is really good. Like insanely better than opus 4.6

Hello, I’ve been using Glm 5.1 for a good hour and I used the freaky frankenstien preset and the dialogues are amazing. Pure realistic and human-like dialogue.

I did tried it with claude opus 4.6/4.7 but I didn’t really enjoy the dialogue, the details are good but overall? I enjoy glm 5.1 very much.

All you need is a few nudges and its like opus. Its amazing.

Do you agree?

Open Reddit thread
r/SillyTavernAI 62 upvotes 57 comments May 3, 2026
Deepseek v4 or GLM 5.1?

Which one are you currently using more? And why? I’m kinda torn between both of them, I have kinda grown to like DS v4 more than GLM 5.1, what is your opinion?

Open Reddit thread
r/SillyTavernAI 109 upvotes 27 comments April 2, 2026
Recommended GLM 5.1 Settings

**Glm 5.1 Direct API/Coding Plan, Chat Completion, Silly Tavern**

I don't use any extensions, so not sure how much that would factor into these.

These might become irrelevant in a week, but otherwise: follow what your preset creator recommends, they know the quirks of their preset best. If you're making your own prompts and not sure, continue on...

\---

**PROMPT POST-PROCESSING**

* **Merge/None** = garbage, but may depend on your setup. There's always someone saying this work best for them somehow.
* **Single User** = more creative; *sometimes* better prose (with a bit of slop) & coherence (sometimes worse), but less prompt adherence. More prone to rescue the user without aggressive prompting. ***May not work great for larger (3k+) / complicated presets.***
* **Semi Strict/Strict** = follows prompts better. Use if the preset is on the larger size / you're peculiar about things. (As GLM fluctuates during this period, occasionally this may actually be less coherent or too stiff.)

**SAMPLERS**

* **Temp:** .60 to .80; above .80 might get Chinese characters / become incoherent.
* Feels too stiff? Go higher. Dumb? Go lower.
* I feel like the higher end is usually fine if you play with contemporary/colloquial language.
* **Top P:** .95 most coherent, stable sweet spot.
* .99 - 1.0 too dumb
* .96 - .98: lively, but can have coherency issues, deictic misalignment, more prone to omniscience.
* Note on .97+: not that GLM is reserved in cussing, but it cusses more freely when this is higher if you have a cussing prompt.
* **Everything else:** default / zero.

**REASONING**

Auto felt like roulette. I go with high for consistency.

\---

**"CENSORSHIP"**

With a simple jailbreak (or overwhelming it with a large preset), it will do anything.

You *may* have difficulty getting questions about Taiwan's legitimacy and Tiananmen Square through, but that's about it.

For the masochists...

* Single User: needs aggressive prompting / regens.
* Semi Strict: easier time getting it to hurt user / occasional regen.
* Strict: more proactive about hurting user.

\---

**DEPTH 1 PROMPTS**

Depends on your setup, but if it seems to have trouble remembering the last message and it's not a peak hour, try changing the depth of the prompt if it's set at 1.

**DO\_SAMPLE**

This doesn't do anything. Get rid of it.

\---

**EVEN IF YOU'RE IMPRESSED BY 5.1, DO NOT BUY A SUBSCRIPTION FROM THEM.**

Once it's fully released, you can probably find better providers for it elsewhere. I'm on a max legacy year plan and even I get hit with it shitting the bed. Don't get too attached; a lot of models, not just Zai, are great when they first come out.

Open Reddit thread
r/vibecoding 1 upvotes 11 comments May 1, 2026
GLM 5.1 is crazy good opus or openai not even close to this thing!

its been running for 5 hours in total and still going strong. Opus 4.7 and ChatGPT 5.5 is trashed because i have tested them they build basic UI thats this thing build end to end full working just fucking crazy.

https://preview.redd.it/jhfgxum02gyg1.png?width=2514&format=png&auto=webp&s=c1f233b2ebe3c3794f06fcebcc8ba4ffa057f7eb

https://preview.redd.it/16qc8ym02gyg1.png?width=2485&format=png&auto=webp&s=2a5b32d66968a4767d43a0d293f9acbb979854c1

https://preview.redd.it/l9vkxum02gyg1.png?width=2481&format=png&auto=webp&s=631ab371a6c4570faa0190244bd56a13549899fe

https://preview.redd.it/1gj2vwm02gyg1.png?width=2487&format=png&auto=webp&s=238d0177984e22c4b7ad79174207d546e1000423

https://preview.redd.it/29ay5vm02gyg1.png?width=2463&format=png&auto=webp&s=34d18eb8456ccaa520cbb1a78571a8b0c61e8eb0

https://preview.redd.it/b2lphum02gyg1.png?width=2454&format=png&auto=webp&s=bb89dde14acb09b2f1e41ba8089eb132224bcf2b

https://preview.redd.it/hj227vm02gyg1.png?width=2476&format=png&auto=webp&s=e8f9e88832fc3c8ee8a11ff64e1e3cd60fc6c1b2

https://preview.redd.it/5jotqum02gyg1.png?width=2498&format=png&auto=webp&s=7cfbe6faaabbb4b74037b00d2940e8e951073f71

Open Reddit thread
r/opencodeCLI 43 upvotes 27 comments May 13, 2026
GLM 5.1 is underrated?

A lot of people I talk to end up badmouthing GLM 5.1. I use it quite a bit for planning and have always had good experiences with it.

For implementation, I use DS Flash (max) or Kimi 2.6. I've also read about people having issues when using tools, but I've never had any problems with my stack...

Have any of you had a bad experience with it?

Open Reddit thread
View more discussions →

More models from Z.ai

Continue browsing adjacent models from the same provider.

← All AI Models