Agentic Coding
Handles multi-step coding agent loops while retaining context across turns, scoring 73.8% on SWE-bench Verified and 66.7% on SWE-bench Multilingual.
GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions. What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.
High-signal model metadata in a structured two-column overview table.
The entity that provides this model.
The routed model identifier exposed by upstream providers.
The number of tokens supported by the input context window.
The number of tokens that can be generated by the model in a single request.
Whether the model's code is available for public use.
When the model was first released.
When the model's knowledge was last updated.
The providers that offer this model. This is not an exhaustive list.
Types of data this model can process.
A fuller summary of positioning, capabilities, and source-specific details for GLM 4.7.
GLM-4.7 is a 358-billion-parameter large language model developed by Z.ai (formerly Zhipu AI/THUDM) and released in December 2025. It is designed specifically for agentic workflows, multi-step coding tasks, terminal automation, and complex mathematical and scientific reasoning. The model is available under an MIT license, making it usable for both commercial and non-commercial applications. It supports a 131,072-token context window, allowing it to handle long documents and extended coding sessions.
What distinguishes GLM-4.7 from earlier GLM releases is a set of three reasoning mechanisms: Interleaved Thinking, which applies reasoning before every response and tool call; Preserved Thinking, which retains reasoning context across conversation turns to maintain consistency; and Turn-level Thinking, which lets developers toggle reasoning depth on or off per turn. On benchmarks, the model scores 73.8% on SWE-bench Verified, 95.7% on AIME 2025, and 87.4% on τ²-Bench. It is best suited for developers and researchers building agent pipelines, automated coding tools, or applications requiring reliable multi-step planning.
Handles multi-step coding agent loops while retaining context across turns, scoring 73.8% on SWE-bench Verified and 66.7% on SWE-bench Multilingual.
Executes command sequencing, error recovery, and multi-step shell automation, achieving 41.0% on Terminal Bench.
Solves advanced math and science problems, scoring 95.7% on AIME 2025 and 97.1% on HMMT Feb. 2025.
Sequences actions across complex tasks using structured reasoning, scoring 87.4% on τ²-Bench.
Calls external tools reliably within agent loops using Interleaved Thinking, which applies reasoning before each tool invocation.
Processes inputs up to 131,072 tokens, supporting extended documents, codebases, and multi-turn conversations.
Handles coding and reasoning tasks across multiple languages, as reflected by its dedicated SWE-bench Multilingual score of 66.7%.
Turn-level Thinking lets developers enable or disable deep reasoning per conversation turn, trading response depth for speed as needed.
Addresses graduate-level science questions, achieving 85.7% on GPQA-Diamond.
Primary API pricing shown in the same “quick compare” spirit as the reference page.
Additional usage-cost dimensions synced into the project for this model.
Places where this model is available, based on the synced detail-page metadata.
Endpoint-level provider data currently available for this model.
The configurable options currently documented for this model.
Parameters currently listed by OpenRouter or the local catalog for this model.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Score |
|---|---|
|
AIME 2025
American math olympiad problems (2025)
|
|
|
GPQA Diamond
PhD-level science questions (biology, physics, chemistry)
|
|
|
HLE
Questions that challenge frontier models across many domains
|
|
|
LiveCodeBench
Real-world coding tasks from recent competitions
|
|
|
MMLU-Pro
Expert knowledge across 14 academic disciplines
|
|
|
SciCode
Scientific research coding and numerical methods
|
|
|
SWE-bench Verified
Real GitHub issues requiring multi-file code fixes
|
Official model cards, release notes, docs, and other references synced from the source page.
GLM 4.7 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/unsloth. Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions.
The strongest match in this snapshot has 756 upvotes and 230 comments.
glm 4.7 is gonna be gone soon in nvidia nim, what other free models/providers are there that are good? cause im a broke student and cant afford to pay for models especially in my countries economy. I seriously DO NOT want to go back to gemini 2.5 flash
Ive been trying out GLM 4.7 nvidia nim for a bit, and honestly its pretty amazing, im running it with 4.0 fatman preset currently but one thing i cant really understand is why are the messages so short? I dont know if its something with my preset. its not bad thing really but ive been gemini pilled with long ass responses. so ive been wondering if anyone knows how to make GLM 4.7's messages longer? thanks
These are all the models that i am interested in using, and they are all that i can afford at the moment. Would be great if you can also suggest other models as well!
I aim for a more emotional, less descriptive and flowery type of dialogues.
this is just my personal opinion and experience. but ever since the release of 4.7, i've been having a lot of issues. especially with those annoying ai patterns and very unnatural dialogue, things i simply didn't have when using 4.6
even though my prompt includes instructions for natural, human-like writing and dialogue, glm 4.7 feels like a chaotic machine that can't interpret things properly. i changed the prompt and the parameters multiple times, and it still felt very strange. there were small improvements here and there, but overall it didn't feel immersive at all and just ended up frustrating me
maybe it needs more specific instructions, different settings… idk. all i know is that i'm tired of spending days trying to make 4.7 work for my roleplay style without any success
GLM-4.7 supports a context window of 131,072 tokens, which allows it to process long documents, extended codebases, and lengthy multi-turn conversations in a single session.
GLM-4.7 is released under the MIT license, which permits both commercial and non-commercial use without royalty restrictions.
Based on the available metadata, GLM-4.7 has a training date of December 2025. A specific knowledge cutoff date beyond this has not been published in the provided metadata.
GLM-4.7 has 358 billion parameters, making it a large-scale model intended for demanding tasks such as agentic coding, complex reasoning, and terminal automation.
GLM-4.7 introduces Interleaved Thinking (reasoning before every response and tool call), Preserved Thinking (retaining reasoning context across conversation turns), and Turn-level Thinking (allowing developers to toggle reasoning on or off per turn).
GLM-4.7 was developed by Z.ai, formerly known as Zhipu AI/THUDM. It is available on Hugging Face, via NVIDIA NIM, and through the Z.ai API. The model weights and related code are also accessible on GitHub.
Continue browsing adjacent models from the same provider.