Z.ai

GLM 4.7

GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions. What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.

Dec 22, 2025 131,072 context 16,384 tokens output
Agentic Coding Terminal Automation Mathematical Reasoning Multi-Step Planning Tool Use Long Context Processing

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Z.ai

Model ID

The routed model identifier exposed by upstream providers.

z-ai/glm-4.7

Input Context Window

The number of tokens supported by the input context window.

131,072 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

No

Release Date

When the model was first released.

Dec 22, 2025 4 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

DeepInfra, Parasail, SiliconFlow, AtlasCloud, Novita, Venice, Z.AI, Google, Phala, Cerebras

Modalities

Types of data this model can process.

Text

What is GLM 4.7

A fuller summary of positioning, capabilities, and source-specific details for GLM 4.7.

GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions.

What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.

Capabilities

What GLM 4.7 supports

</>

Agentic Coding

Handles multi-step coding agent loops while retaining context across turns, scoring 73.8% on SWE-bench Verified and 66.7% on SWE-bench Multilingual.

AI

Terminal Automation

Executes command sequencing, error recovery, and multi-step shell automation, achieving 41.0% on Terminal Bench.

RN

Mathematical Reasoning

Solves advanced math and science problems, scoring 95.7% on AIME 2025 and 97.1% on HMMT Feb. 2025.

AI

Multi-Step Planning

Sequences actions across complex tasks using structured reasoning, scoring 87.4% on τ²-Bench.

TL

Tool Use

Calls external tools reliably within agent loops using Interleaved Thinking, which applies reasoning before each tool invocation.

CTX

Long Context Processing

Processes inputs up to 131,072 tokens, supporting extended documents, codebases, and multi-turn conversations.

AI

Multilingual Support

Handles coding and reasoning tasks across multiple languages, as reflected by its dedicated SWE-bench Multilingual score of 66.7%.

RN

Configurable Reasoning

Turn-level Thinking lets developers enable or disable deep reasoning per conversation turn, trading response depth for speed as needed.

RN

Science Reasoning

Addresses graduate-level science questions, achieving 85.7% on GPQA-Diamond.

Pricing for GLM 4.7

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.08
maxTemperature 1
maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepInfra Parasail SiliconFlow AtlasCloud Novita Venice Z.AI Google Phala Cerebras

Provider Endpoints

Endpoint-level provider data currently available for this model.

DeepInfra

Max output: 131,072 1d uptime: 98.1% Supported params: 17 Implicit caching: No

Parasail

Max output: 202,752 1d uptime: 99.5% Supported params: 16 Implicit caching: No

SiliconFlow

Max output: 204,800 1d uptime: 97.7% Supported params: 9 Implicit caching: No

AtlasCloud

Max output: 202,752 1d uptime: 99.9% Supported params: 17 Implicit caching: No

Novita

Max output: 131,072 1d uptime: 96.8% Supported params: 14 Implicit caching: No

Venice

Max output: 16,384 1d uptime: 99.1% Supported params: 13 Implicit caching: No

Z.AI

Max output: 131,072 1d uptime: 88.0% Supported params: 8 Implicit caching: No

Google

Max output: 128,000 1d uptime: 100.0% Supported params: 15 Implicit caching: No

Phala

Max output: 131,072 1d uptime: 83.1% Supported params: 16 Implicit caching: No

Cerebras

Max output: 40,960 1d uptime: 99.8% Supported params: 16 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Toggle Group
Default: medium

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark Score
AIME 2025
American math olympiad problems (2025)
95.7%
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
85.9%
HLE
Questions that challenge frontier models across many domains
25.1%
LiveCodeBench
Real-world coding tasks from recent competitions
89.4%
MMLU-Pro
Expert knowledge across 14 academic disciplines
85.6%
SciCode
Scientific research coding and numerical methods
45.1%
SWE-bench Verified
Real GitHub issues requiring multi-file code fixes
73.8%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Community discussion

What people think about GLM 4.7

GLM 4.7 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/unsloth. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.

The strongest match in this snapshot has 756 upvotes and 230 comments.

r/SillyTavernAI 18 upvotes 18 comments May 9, 2026
Sooo nvidia nim's glm 4.7 is getting deprecated soon..

glm 4.7 is gonna be gone soon in nvidia nim, what other free models/providers are there that are good? cause im a broke student and cant afford to pay for models especially in my countries economy. I seriously DO NOT want to go back to gemini 2.5 flash

Open Reddit thread
r/SillyTavernAI 4 upvotes 14 comments May 2, 2026
People who use GLM 4.7 i need help

Ive been trying out GLM 4.7 nvidia nim for a bit, and honestly its pretty amazing, im running it with 4.0 fatman preset currently but one thing i cant really understand is why are the messages so short? I dont know if its something with my preset. its not bad thing really but ive been gemini pilled with long ass responses. so ive been wondering if anyone knows how to make GLM 4.7's messages longer? thanks

Open Reddit thread
r/SillyTavernAI 18 upvotes 24 comments January 7, 2026
glm 4.6 is still incredibly better than glm 4.7

this is just my personal opinion and experience. but ever since the release of 4.7, i've been having a lot of issues. especially with those annoying ai patterns and very unnatural dialogue, things i simply didn't have when using 4.6

even though my prompt includes instructions for natural, human-like writing and dialogue, glm 4.7 feels like a chaotic machine that can't interpret things properly. i changed the prompt and the parameters multiple times, and it still felt very strange. there were small improvements here and there, but overall it didn't feel immersive at all and just ended up frustrating me

maybe it needs more specific instructions, different settings… idk. all i know is that i'm tired of spending days trying to make 4.7 work for my roleplay style without any success

Open Reddit thread
View more discussions →
FAQ

Common questions about GLM 4.7

What is the context window size for GLM-4.7?

GLM-4.7 supports a context window of 131,072 tokens, which allows it to process long documents, extended codebases, and lengthy multi-turn conversations in a single session.

What license does GLM-4.7 use?

GLM-4.7 is released under the MIT license, which permits both commercial and non-commercial use without royalty restrictions.

What is the knowledge cutoff for GLM-4.7?

Based on the available metadata, GLM-4.7 has a training date of December 2025. A specific knowledge cutoff date beyond this has not been published in the provided metadata.

How many parameters does GLM-4.7 have?

GLM-4.7 has 358 billion parameters, making it a large-scale model intended for demanding tasks such as agentic coding, complex reasoning, and terminal automation.

What are the three thinking mechanisms introduced in GLM-4.7?

GLM-4.7 introduces Interleaved Thinking (reasoning before every response and tool call), Preserved Thinking (retaining reasoning context across conversation turns), and Turn-level Thinking (allowing developers to toggle reasoning on or off per turn).

Who developed GLM-4.7 and where can I access it?

GLM-4.7 was developed by Z.ai, formerly known as Zhipu AI/THUDM. It is available on Hugging Face, via NVIDIA NIM, and through the Z.ai API. The model weights and related code are also accessible on GitHub.

More models from Z.ai

Continue browsing adjacent models from the same provider.

← All AI Models