Z.ai

GLM 4.7

GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions. What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.

Dec 22, 2025 131,072 context 16,384 tokens output

Agentic Coding Terminal Automation Mathematical Reasoning Multi-Step Planning Tool Use Long Context Processing

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Parameters ↓ Benchmarks ↓ Tools ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

Z.ai

Model ID

The routed model identifier exposed by upstream providers.

z-ai/glm-4.7

Input Context Window

The number of tokens supported by the input context window.

131,072 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Dec 22, 2025 6 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

DeepInfra, StreamLake, AtlasCloud, Novita, Venice, Z.AI, Google, Phala, Cerebras

Modalities

Types of data this model can process.

Text

What is GLM 4.7

A fuller summary of positioning, capabilities, and source-specific details for GLM 4.7.

GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions.

What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.

Capabilities

What GLM 4.7 supports

</>

Agentic Coding

Handles multi-step coding agent loops while retaining context across turns, scoring 73.8% on SWE-bench Verified and 66.7% on SWE-bench Multilingual.

Terminal Automation

Executes command sequencing, error recovery, and multi-step shell automation, achieving 41.0% on Terminal Bench.

Mathematical Reasoning

Solves advanced math and science problems, scoring 95.7% on AIME 2025 and 97.1% on HMMT Feb. 2025.

Multi-Step Planning

Sequences actions across complex tasks using structured reasoning, scoring 87.4% on τ²-Bench.

Tool Use

Calls external tools reliably within agent loops using Interleaved Thinking, which applies reasoning before each tool invocation.

CTX

Long Context Processing

Processes inputs up to 131,072 tokens, supporting extended documents, codebases, and multi-turn conversations.

Multilingual Support

Handles coding and reasoning tasks across multiple languages, as reflected by its dedicated SWE-bench Multilingual score of 66.7%.

Configurable Reasoning

Turn-level Thinking lets developers enable or disable deep reasoning per conversation turn, trading response depth for speed as needed.

Science Reasoning

Addresses graduate-level science questions, achieving 85.7% on GPQA-Diamond.

Pricing for GLM 4.7

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.40 Per million tokens

Output tokens $1.75 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.08

maxTemperature 1

maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DeepInfra StreamLake AtlasCloud Novita Venice Z.AI Google Phala Cerebras

Provider Endpoints

Endpoint-level provider data currently available for this model.

DeepInfra

Max output: 131,072 1d uptime: 99.8% Supported params: 17 Implicit caching: No

StreamLake

Max output: 128,000 1d uptime: 93.9% Supported params: 13 Implicit caching: No

AtlasCloud

Max output: 202,752 1d uptime: 98.3% Supported params: 17 Implicit caching: No

Novita

Max output: 131,072 1d uptime: 97.5% Supported params: 14 Implicit caching: No

Venice

Max output: 16,384 1d uptime: 99.0% Supported params: 13 Implicit caching: No

Z.AI

Max output: 131,072 1d uptime: 93.2% Supported params: 9 Implicit caching: No

Google

Max output: 128,000 1d uptime: 99.9% Supported params: 15 Implicit caching: No

Phala

Max output: 131,072 1d uptime: 96.1% Supported params: 18 Implicit caching: No

Cerebras

Max output: 40,960 1d uptime: 100.0% Supported params: 16 Implicit caching: No

Configuration & Parameters

The configurable options currently documented for this model.

Reasoning Effort

Toggle Group

Default: medium

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Reasoning Effort

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2025 American math olympiad problems (2025)	95.7%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	85.9%
HLE Questions that challenge frontier models across many domains	25.1%
LiveCodeBench Real-world coding tasks from recent competitions	89.4%
MMLU-Pro Expert knowledge across 14 academic disciplines	85.6%
SciCode Scientific research coding and numerical methods	45.1%
SWE-bench Verified Real GitHub issues requiring multi-file code fixes	73.8%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Model Card (Hugging Face) Other

→

NVIDIA NIM Model Card Other

→

Official Announcement Blog Post Announcements

→

GitHub Repository Open Source

→

API Documentation Documentation

→

OpenRouter Model Page OpenRouter

→

AI tools related to GLM 4.7

These tools are strongly connected to GLM 4.7 through direct product references, provider mentions, or explicit model mappings.

AI Chatbot

智谱清言

智谱清言 is a Chinese-language conversational AI developed by Zhipu AI, powered by the GLM large language model. It features capabilities including AI-driven search, image generation, document reading, and automated video and presentation creation. Additionally, it provides tools for data analysis, coding assistance, and a library of intelligent agents, including support for building custom agents.

Free 4 visits 3 saves

AI Assistant

Shmooz AI

Shmooz AI is an accessible AI assistant available on both WhatsApp and the web. It provides features such as image generation, real-time Google search integration, article summarization, and file interaction, aiming to deliver high-quality AI model capabilities across these platforms.

Free 13 visits 2 saves

AI Chatbot

Polybuzz AI

Polybuzz AI is a platform designed for creating and interacting with AI-powered characters for role-playing, storytelling, and creative dialogue. The service hosts over 20 million characters across genres such as anime, fantasy, and horror. Users can build custom AI characters, participate in secure chats, and utilize creative tools including free image generation. The platform provides immersive roleplay scenarios and includes customizable content filters to maintain a safe user environment.

Free 0 visits 1 saves

AI Assistant

Snoooz AI

Snoooz AI is an automated Out-of-Office (OOO) assistant designed to streamline email management. It handles personalized OOO replies, creates backups for urgent communications, and manages email categorization and routing. The tool is designed to help professionals and businesses enhance prospect engagement, customer success, and employee experience.

Free 9 visits

Community discussion

What people think about GLM 4.7

GLM 4.7 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/unsloth. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.

The strongest match in this snapshot has 756 upvotes and 230 comments.

r/SillyTavernAI 52 upvotes 25 comments May 14, 2026

Glm 4.7 is gone

Open Reddit thread

r/SillyTavernAI 18 upvotes 18 comments May 9, 2026

Sooo nvidia nim's glm 4.7 is getting deprecated soon..

glm 4.7 is gonna be gone soon in nvidia nim, what other free models/providers are there that are good? cause im a broke student and cant afford to pay for models especially in my countries economy. I seriously DO NOT want to go back to gemini 2.5 flash

Open Reddit thread

r/SillyTavernAI 4 upvotes 14 comments May 2, 2026

People who use GLM 4.7 i need help

Ive been trying out GLM 4.7 nvidia nim for a bit, and honestly its pretty amazing, im running it with 4.0 fatman preset currently but one thing i cant really understand is why are the messages so short? I dont know if its something with my preset. its not bad thing really but ive been gemini pilled with long ass responses. so ive been wondering if anyone knows how to make GLM 4.7's messages longer? thanks

Open Reddit thread

r/SillyTavernAI 18 upvotes 24 comments February 28, 2026

Between Kimi K2.5, GLM 4.7, Deepseek V3.2, what should i pick?

These are all the models that i am interested in using, and they are all that i can afford at the moment. Would be great if you can also suggest other models as well!

I aim for a more emotional, less descriptive and flowery type of dialogues.

Open Reddit thread

r/SillyTavernAI 18 upvotes 24 comments January 7, 2026

glm 4.6 is still incredibly better than glm 4.7

this is just my personal opinion and experience. but ever since the release of 4.7, i've been having a lot of issues. especially with those annoying ai patterns and very unnatural dialogue, things i simply didn't have when using 4.6

even though my prompt includes instructions for natural, human-like writing and dialogue, glm 4.7 feels like a chaotic machine that can't interpret things properly. i changed the prompt and the parameters multiple times, and it still felt very strange. there were small improvements here and there, but overall it didn't feel immersive at all and just ended up frustrating me

maybe it needs more specific instructions, different settings… idk. all i know is that i'm tired of spending days trying to make 4.7 work for my roleplay style without any success

Open Reddit thread

View more discussions →

FAQ

Common questions about GLM 4.7

What is the context window size for GLM-4.7?

GLM-4.7 supports a context window of 131,072 tokens, which allows it to process long documents, extended codebases, and lengthy multi-turn conversations in a single session.

What license does GLM-4.7 use?

GLM-4.7 is released under the MIT license, which permits both commercial and non-commercial use without royalty restrictions.

What is the knowledge cutoff for GLM-4.7?

Based on the available metadata, GLM-4.7 has a training date of December 2025. A specific knowledge cutoff date beyond this has not been published in the provided metadata.

How many parameters does GLM-4.7 have?

GLM-4.7 has 358 billion parameters, making it a large-scale model intended for demanding tasks such as agentic coding, complex reasoning, and terminal automation.

What are the three thinking mechanisms introduced in GLM-4.7?

GLM-4.7 introduces Interleaved Thinking (reasoning before every response and tool call), Preserved Thinking (retaining reasoning context across conversation turns), and Turn-level Thinking (allowing developers to toggle reasoning on or off per turn).

Who developed GLM-4.7 and where can I access it?

GLM-4.7 was developed by Z.ai, formerly known as Zhipu AI/THUDM. It is available on Hugging Face, via NVIDIA NIM, and through the Z.ai API. The model weights and related code are also accessible on GitHub.

More models from Z.ai

Continue browsing adjacent models from the same provider.

← All AI Models