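Key model metadata at a glance.

| Field | Description | Value |
|---|---|---|
| Provider | The entity that provides this model. | Moonshot AI |
| Model identifier | The routed model identifier exposed by upstream providers. | |
| Context window | The number of tokens supported by the input context window. | 262,144 tokens (256K) |
| Max output tokens | The number of tokens that can be generated by the model in a single request. | |
| Open source | Whether the model's code is available for public use. | Yes (modified MIT license) |
| Release date | When the model was first released. | January 2026 |
| Knowledge cutoff | When the model's knowledge was last updated. | Not stated |
| Providers | The providers that offer this model. This is not an exhaustive list. | |
| Input modalities | Types of data this model can process. | Text, images, video |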
A fuller summary of positioning, capabilities, and source-specific details for Kimi K2.5.
Kimi K2.5 is an open-source multimodal model developed by Moonshot AI and released in January 2026. It uses a Mixture-of-Experts architecture with 1 trillion total parameters and approximately 32 billion active at inference time, trained on roughly 15 trillion mixed visual and text tokens. Unlike models that add vision as a secondary capability, Kimi K2.5 was trained natively on both image and text data, enabling integrated understanding of charts, documents, video, and code.
The model supports two operating modes (Instant Mode for direct responses and Thinking Mode for step-by-step reasoning on complex problems) within a 256K-token (262,144-token) context window. It introduces an Agent Swarm paradigm that can coordinate up to 100 parallel sub-agents, cutting execution time by approximately 4.5x on parallelizable tasks. Kimi K2.5 is released under a modified MIT license, which makes it available for local deployment, fine-tuning, and commercial use. It is particularly suited to visual programming, document analysis, automated research, and multi-step agentic workflows.
- Visual understanding: Processes images, charts, documents, and video natively, scoring 90.1 on MathVista, 92.3 on OCRBench, and 87.4 on VideoMME.
- Coding: Handles real-world software engineering tasks, scoring 76.8% on SWE-bench Verified and 85.0% on LiveCodeBench v6.
- Reasoning: Applies step-by-step reasoning to math and science problems, scoring 96.1% on AIME 2025 and 87.6% on GPQA Diamond.
- Agent Swarm: Coordinates up to 100 parallel sub-agents on complex workflows, cutting execution time by approximately 4.5x on parallelizable tasks, and scores 78.4% on BrowseComp.
- Long context: Supports a 256K-token (262,144-token) context window, so long documents, large codebases, and lengthy video content can be analyzed in a single pass.
- Operating modes: Offers Instant Mode for fast, direct responses and Thinking Mode for deep, iterative reasoning on complex problems.
- Architecture: Uses a 1-trillion-parameter Mixture-of-Experts design with about 32 billion parameters active per forward pass, balancing capacity with inference efficiency.
Benchmark scores synced from the current model source and normalized into the local catalog.
| Benchmark | Description | Score |
|---|---|---|
| AIME 2025 | American Invitational Mathematics Examination problems (2025) | 96.1% |
| BrowseComp | Complex web browsing and information retrieval | 78.4% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 87.6% |
| HLE (Humanity's Last Exam) | Questions that challenge frontier models across many domains | |
| LiveCodeBench | Coding problems from recent programming competitions | 85.0% (v6) |
| MMLU-Pro | Expert knowledge across 14 academic disciplines | |
| OSWorld-Verified | Autonomous computer use and desktop tasks | |
| SciCode | Scientific research coding and numerical methods | |
| SWE-bench Pro | Challenging real-world software engineering tasks | |
| SWE-bench Verified | Human-validated real GitHub issues requiring code fixes | 76.8% |
| Terminal-Bench 2.0 | Agentic coding and terminal command tasks | |
Official model cards, release notes, docs, and other references synced from the source page.
Kimi K2.5 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, and r/opencodeCLI.
Top Reddit threads cluster around benchmarks and model comparisons, safety and censorship questions, and coding workflows. The strongest match in this snapshot has 4,668 upvotes and 361 comments.
These are all the models that I am interested in using, and they are all that I can afford at the moment. Would be great if you can also suggest other models as well!
I aim for a more emotional, less descriptive and flowery type of dialogues.
Hey everyone, I've read all the threads about the Kimi K2.5, but I haven't found any temperature recommendations anywhere. What settings do you use?
For me, Kimi K2.6 struggles more than K2.5 with multi-character bots, and its prose is much more idealized. It also often struggles to stay in character, and we cannot forget the "wait" and "actually" in its reasoning, which can push a response up to 60k tokens. Kimi K2.5 is much better where K2.6 struggles, and it costs half as much.
they really cooked
New SOTA in Agentic Tasks!!!!
Blog: [https://www.kimi.com/blog/kimi-k2-5.html](https://www.kimi.com/blog/kimi-k2-5.html)
Kimi K2.5 supports a context window of 262,144 tokens (256K), allowing it to process long documents, extended codebases, and lengthy video content in a single session.
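The 256K figure is a binary count: 256 × 1,024 = 262,144 tokens. Below is a minimal budgeting sketch, assuming (as is common for decoder-only models, but not confirmed here for every provider) that the prompt and the generated output share one window. `fits_in_context` and `remaining_budget` are hypothetical helpers, and real token counts must come from the model's own tokenizer.

```python
# Hypothetical budgeting helpers for a 256K context window.
# Assumption: prompt tokens and output tokens share the same window;
# individual providers may account for output separately.

CONTEXT_WINDOW = 256 * 1024  # 262,144 tokens, i.e. "256K"


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the requested output fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= window


def remaining_budget(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation after the prompt (never negative)."""
    return max(window - prompt_tokens, 0)


if __name__ == "__main__":
    print(CONTEXT_WINDOW)                    # 262144
    print(fits_in_context(200_000, 32_000))  # True
    print(fits_in_context(250_000, 16_000))  # False
    print(remaining_budget(200_000))         # 62144
```

Checking the budget before sending a request avoids silently truncated inputs when a long document, codebase, or video transcript sits close to the limit.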
Kimi K2.5 is open source: it is released under a modified MIT license, which permits local deployment, fine-tuning, and integration into commercial applications.
Based on the available metadata, Kimi K2.5 was released in January 2026. A specific training data cutoff date is not stated in the provided metadata.
Kimi K2.5 introduces an Agent Swarm paradigm that can coordinate up to 100 parallel sub-agents to execute complex, multi-step tasks. On parallelizable workloads, this reduces execution time by approximately 4.5x compared to sequential execution.
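A minimal sketch of the fan-out pattern behind parallel sub-agents, using Python's `asyncio`. This is not Moonshot AI's implementation: `run_subagent` is a hypothetical stand-in that sleeps instead of calling the model, and the only detail taken from the text is the cap of 100 concurrent sub-agents. `amdahl_speedup` applies Amdahl's law to show why overall speedup stays far below the sub-agent count when part of a task must still run sequentially; it is illustrative only and is not how the reported 4.5x figure was measured.

```python
from __future__ import annotations

import asyncio
import time

MAX_PARALLEL_SUBAGENTS = 100  # upper bound stated for Agent Swarm


async def run_subagent(task_id: int, duration: float) -> str:
    """Hypothetical sub-agent: sleeps to stand in for real work (tool calls, browsing)."""
    await asyncio.sleep(duration)
    return f"result-{task_id}"


async def run_swarm(durations: list[float],
                    max_parallel: int = MAX_PARALLEL_SUBAGENTS) -> list[str]:
    """Run one sub-agent per task, with at most `max_parallel` in flight at once."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(task_id: int, duration: float) -> str:
        async with semaphore:  # blocks when max_parallel sub-agents are already running
            return await run_subagent(task_id, duration)

    # asyncio.gather returns results in the same order as the input tasks.
    return list(await asyncio.gather(*(bounded(i, d) for i, d in enumerate(durations))))


async def run_sequential(durations: list[float]) -> list[str]:
    """Baseline: the same tasks executed one after another."""
    return [await run_subagent(i, d) for i, d in enumerate(durations)]


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: upper bound on speedup when only part of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


async def main() -> None:
    durations = [0.05] * 10  # ten simulated 50 ms sub-tasks

    start = time.perf_counter()
    await run_sequential(durations)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    results = await run_swarm(durations)
    parallel = time.perf_counter() - start

    print(results[:3])
    print(f"sequential {sequential:.2f}s, parallel {parallel:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With ten simulated 50 ms tasks, the sequential baseline takes about 0.5 s, while the swarm finishes in roughly the time of a single task. The semaphore keeps concurrency bounded, which matters when each sub-agent consumes API quota or memory.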
Kimi K2.5 supports Instant Mode, which provides fast and direct responses suited for everyday tasks, and Thinking Mode, which performs deep step-by-step reasoning for complex problems such as advanced math or multi-stage coding challenges.
Kimi K2.5 has 1 trillion total parameters in a Mixture-of-Experts architecture, with approximately 32 billion parameters active at any given inference step.
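The 32B-of-1T ratio means only about 3.2% of the weights participate in each forward pass. The toy sketch below illustrates the top-k routing idea that makes this possible: a router scores every expert, only the k best are evaluated, and their outputs are mixed with renormalized gate weights. The expert count, the value of k, the scalar inputs, and the linear router are all hypothetical simplifications; Kimi K2.5's actual routing details (expert count, shared experts, load balancing) are not described on this page.

```python
from __future__ import annotations

import math
import random
from typing import Callable, Sequence

TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion (stated)
ACTIVE_PARAMS = 32_000_000_000    # ~32 billion active per step (stated, approximate)


def active_fraction(total: int = TOTAL_PARAMS, active: int = ACTIVE_PARAMS) -> float:
    """Share of parameters that participate in a single forward pass."""
    return active / total


def softmax(xs: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_layer(x: float, experts: Sequence[Callable[[float], float]],
              router_weights: Sequence[float], k: int) -> float:
    """Toy MoE layer on a scalar input: only the k selected experts are evaluated."""
    logits = [w * x for w in router_weights]  # hypothetical linear router
    routes = top_k_route(logits, k)
    return sum(gate * experts[i](x) for i, gate in routes)


if __name__ == "__main__":
    print(f"active share: {active_fraction():.1%}")  # 3.2%
    rng = random.Random(0)
    experts = [lambda x, a=a: a * x for a in range(1, 9)]  # 8 toy experts
    router = [rng.uniform(-1.0, 1.0) for _ in experts]
    print(top_k_route([w * 0.5 for w in router], k=2))
    print(moe_layer(0.5, experts, router, k=2))
```

Because unselected experts are never called, compute per token scales with k rather than with the total number of experts, which is the capacity-versus-efficiency trade-off the architecture description refers to.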