DeepSeek

Kimi K2.5

Kimi K2.5 is an open-source multimodal model developed by Moonshot AI and released in January 2026. It uses a Mixture-of-Experts architecture with 1 trillion total parameters and approximately 32 billion active at inference time, trained on roughly 15 trillion mixed visual and text tokens. Unlike models that add vision as a secondary capability, Kimi K2.5 was trained natively on both image and text data, enabling integrated understanding of charts, documents, video, and code. The model supports two operating modes — Instant Mode for direct responses and Thinking Mode for step-by-step reasoning on complex problems — within a 256,000-token context window. It introduces an Agent Swarm paradigm that can coordinate up to 100 parallel sub-agents, reducing execution time by 4.5x on parallelizable tasks. Kimi K2.5 is released under a modified MIT license, making it available for local deployment, fine-tuning, and commercial use, and is particularly suited for visual programming, document analysis, automated research, and multi-step agentic workflows.

Jan 27, 2026 262,144 context 16,384 tokens output

Visual Understanding Advanced Coding Mathematical Reasoning Agent Swarm Execution Long Context Processing Dual Inference Modes

Overview ↓ About ↓ Capabilities ↓ Pricing ↓ Price Comparison ↓ Providers ↓ Benchmarks ↓ Tools ↓ Daily ↓ Resources ↓ Community ↓ FAQ ↓

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

DeepSeek

Model ID

The routed model identifier exposed by upstream providers.

moonshotai/kimi-k2.5

Input Context Window

The number of tokens supported by the input context window.

262,144 tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

16,384 tokens tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Jan 27, 2026 5 months ago

Knowledge Cut-off Date

When the model's knowledge was last updated.

January 2026

API Providers

The providers that offer this model. This is not an exhaustive list.

DigitalOcean, Chutes, DeepInfra, SiliconFlow, AtlasCloud, StreamLake, Venice, Novita, Moonshot AI, Phala

Modalities

Types of data this model can process.

Text Image Video Code

What is Kimi K2.5

A fuller summary of positioning, capabilities, and source-specific details for Kimi K2.5.

Kimi K2.5 is an open-source multimodal model developed by Moonshot AI and released in January 2026. It uses a Mixture-of-Experts architecture with 1 trillion total parameters and approximately 32 billion active at inference time, trained on roughly 15 trillion mixed visual and text tokens. Unlike models that add vision as a secondary capability, Kimi K2.5 was trained natively on both image and text data, enabling integrated understanding of charts, documents, video, and code.

The model supports two operating modes — Instant Mode for direct responses and Thinking Mode for step-by-step reasoning on complex problems — within a 256,000-token context window. It introduces an Agent Swarm paradigm that can coordinate up to 100 parallel sub-agents, reducing execution time by 4.5x on parallelizable tasks. Kimi K2.5 is released under a modified MIT license, making it available for local deployment, fine-tuning, and commercial use, and is particularly suited for visual programming, document analysis, automated research, and multi-step agentic workflows.

Capabilities

What Kimi K2.5 supports

Visual Understanding

Processes images, charts, documents, and video natively, achieving scores of 90.1 on MathVista, 92.3 on OCRBench, and 87.4 on VideoMME.

</>

Advanced Coding

Handles real-world software engineering tasks, scoring 76.8% on SWE-Bench Verified and 85.0% on LiveCodeBench v6.

Mathematical Reasoning

Applies step-by-step reasoning to math and science problems, scoring 96.1% on AIME 2025 and 87.6% on GPQA-Diamond.

Agent Swarm Execution

Coordinates up to 100 parallel sub-agents for complex workflows, achieving a 4.5x reduction in execution time on parallelizable tasks and 78.4% on BrowseComp.

CTX

Long Context Processing

Supports a 256,000-token context window, enabling analysis of long documents, extended codebases, and lengthy video content in a single pass.

Dual Inference Modes

Offers Instant Mode for fast, direct responses and Thinking Mode for deep, iterative reasoning on complex problems.

MoE Architecture

Uses a 1 trillion parameter Mixture-of-Experts design with ~32 billion parameters active per forward pass, balancing capacity with inference efficiency.

Pricing for Kimi K2.5

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input tokens $0.45 Per million tokens

Output tokens $1.90 Per million tokens

Price Comparison

Additional usage-cost dimensions synced into the project for this model.

Cache read $0.09

maxTemperature 1

maxResponseSize 16,384 tokens

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

DigitalOcean Chutes DeepInfra SiliconFlow AtlasCloud StreamLake Venice Novita Moonshot AI Phala

Provider Endpoints

Endpoint-level provider data currently available for this model.

DigitalOcean

1d uptime: 98.5% Supported params: 11 Implicit caching: No

Chutes

Max output: 65,535 1d uptime: 91.5% Supported params: 15 Implicit caching: No

DeepInfra

Max output: 64,000 1d uptime: 99.8% Supported params: 16 Implicit caching: No

SiliconFlow

Max output: 262,144 1d uptime: 99.9% Supported params: 11 Implicit caching: No

AtlasCloud

Max output: 262,144 1d uptime: 99.8% Supported params: 16 Implicit caching: No

StreamLake

Max output: 256,000 1d uptime: 99.8% Supported params: 14 Implicit caching: No

Venice

Max output: 65,536 1d uptime: 95.9% Supported params: 15 Implicit caching: No

Novita

Max output: 262,144 1d uptime: 100.0% Supported params: 17 Implicit caching: No

Moonshot AI

1d uptime: 100.0% Supported params: 10 Implicit caching: No

Phala

Max output: 262,144 1d uptime: 92.6% Supported params: 16 Implicit caching: No

Model Performance

Benchmark scores synced from the current model source and normalized into the local catalog.

Benchmark	Score
AIME 2025 American math olympiad problems (2025)	96.1%
BrowseComp Complex web browsing and information retrieval	60.6%
GPQA Diamond PhD-level science questions (biology, physics, chemistry)	87.9%
HLE Questions that challenge frontier models across many domains	29.4%
LiveCodeBench Real-world coding tasks from recent competitions	85.0%
MMLU-Pro Expert knowledge across 14 academic disciplines	87.1%
OSWorld-Verified Autonomous computer use and desktop tasks	63.3%
SciCode Scientific research coding and numerical methods	49.0%
SWE-bench Pro Challenging real-world software engineering tasks	50.7%
SWE-bench Verified Real GitHub issues requiring multi-file code fixes	76.8%
Terminal-Bench 2.0 Agentic coding and terminal command tasks	50.8%

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

GitHub Repository Open Source

→

Model Card (Hugging Face) Documentation

→

Official Technical Report Research

→

API Reference (OpenRouter) Playground

→

NVIDIA Build Platform Playground

→

OpenRouter Model Page OpenRouter

→

AI tools related to Kimi K2.5

These tools are strongly connected to Kimi K2.5 through direct product references, provider mentions, or explicit model mappings.

AI Assistant

DeepSeek

DeepSeek is an AI research company established in 2023 that specializes in developing advanced general artificial intelligence foundation models. The company has released and open-sourced several large-scale models, such as DeepSeek-LLM, DeepSeek-Coder, and DeepSeek-MoE. Additionally, DeepSeek offers API access to these models, enabling developers to integrate their AI capabilities into various applications.

Free 411 visits 44 saves

AI Chatbot

DeepSeek R1 Online

DeepSeek R1 Online provides direct access to the DeepSeek R1 AI model, an open-source solution built for advanced reasoning. The platform offers free, no-login access to the model, which is engineered for complex problem-solving, multilingual tasks, and production-grade code generation. By leveraging a Mixture of Experts (MoE) architecture and advanced reinforcement learning, the model delivers high performance across mathematics, coding, and general reasoning. The platform also hosts distilled versions of the model for various specialized use cases.

Free 36 visits 2 saves

AI Image Generator

SEO Writing AI

SEO Writing AI is an AI-powered writing platform designed to create SEO-optimized articles, blog posts, and affiliate content with a single click. It enables users to generate content in bulk and auto-publish directly to WordPress. By analyzing top-ranking search results and extracting relevant calls-to-action, the platform produces ready-to-publish pages. Key features include long-form content generation, product listing creation, SEO optimization tools, and specialized models for affiliate marketing content.

Free 120 visits 11 saves

AI Assistant

ChatGOT

ChatGOT is a platform that consolidates multiple AI chat assistants into a single interface. By integrating models such as DeepSeek, GPT-4, Claude 3.5, and Gemini 2.0, it supports tasks like writing, coding, and summarizing. Key features include chat functionality, PDF parsing, PowerPoint generation, image creation, and writing assistance.

Free 148 visits 6 saves

Related Daily Briefs

Recent daily stories tied to Kimi K2.5 through direct model mentions or provider-level coverage.

Frontier Models

Samsung Deploys ChatGPT Enterprise as Small Models Outperform Frontier LLMs and MiniMax M3 Challenges DeepSeek

MiniMax and OpenAI are raising the stakes for enterprise adoption.

2026-06-21 AI Models AI API

Community discussion

What people think about Kimi K2.5

Kimi K2.5 discussions are most active in r/LocalLLaMA, r/SillyTavernAI, r/opencodeCLI.

Top Reddit threads cluster around benchmark and model-comparison threads, safety and censorship questions, coding workflow discussions. The strongest match in this snapshot has 4668 upvotes and 361 comments.

r/SillyTavernAI 20 upvotes 24 comments February 28, 2026

Between Kimi K2.5, GLM 4.7, Deepseek V3.2, what should i pick?

These are all the models that i am interested in using, and they are all that i can afford at the moment. Would be great if you can also suggest other models as well!

I aim for a more emotional, less descriptive and flowery type of dialogues.

Open Reddit thread

r/SillyTavernAI 11 upvotes 3 comments January 27, 2026

Kimi k2.5 temperature?

Hey everyone, I've read all the threads about the Kimi K2.5, but I haven't found any temperature recommendations anywhere. What settings do you use?

Open Reddit thread

r/SillyTavernAI 14 upvotes 7 comments April 28, 2026

Hot take: Kimi 2.5> Kimi 2.6

For me kimi k2.6 compared to k2.5 more struggles with multiple characters bots and it's prose is much more idealized, it also often struggles to stay in character, and we can not forget the "wait" "actually" in it's reasoning making a response up to 60k tokens, while kimi k2.5 is much better where K2.6 struggles and costs twice less

Open Reddit thread

r/LocalLLaMA 889 upvotes 266 comments January 28, 2026

Kimi K2.5 is the best open model for coding

they really cooked

Open Reddit thread

r/singularity 841 upvotes 203 comments January 27, 2026

Kimi K2.5 Released!!!

New SOTA in Agentic Tasks!!!!

Blog: [https://www.kimi.com/blog/kimi-k2-5.html](https://www.kimi.com/blog/kimi-k2-5.html)

Open Reddit thread

View more discussions →

FAQ

Common questions about Kimi K2.5

What is the context window for Kimi K2.5?

Kimi K2.5 supports a context window of 262,144 tokens (256K), allowing it to process long documents, extended codebases, and lengthy video content in a single session.

Is Kimi K2.5 open-source and can it be used commercially?

Yes. Kimi K2.5 is released under a modified MIT license, which permits local deployment, fine-tuning, and integration into commercial applications.

What is the training data cutoff for Kimi K2.5?

Based on the available metadata, Kimi K2.5 was released in January 2026. A specific training data cutoff date is not stated in the provided metadata.

How does the Agent Swarm feature work?

Kimi K2.5 introduces an Agent Swarm paradigm that can coordinate up to 100 parallel sub-agents to execute complex, multi-step tasks. On parallelizable workloads, this reduces execution time by approximately 4.5x compared to sequential execution.

What are the two inference modes available in Kimi K2.5?

Kimi K2.5 supports Instant Mode, which provides fast and direct responses suited for everyday tasks, and Thinking Mode, which performs deep step-by-step reasoning for complex problems such as advanced math or multi-stage coding challenges.

How many parameters does Kimi K2.5 have, and how many are active at inference?

Kimi K2.5 has 1 trillion total parameters in a Mixture-of-Experts architecture, with approximately 32 billion parameters active at any given inference step.

More models from DeepSeek

Continue browsing adjacent models from the same provider.

← All AI Models