Kimi K2.6 is Moonshot AI's open-weight multimodal model, released on April 20, 2026. It is the third K2-class model in nine months (following K2 and K2.5). Built on a 1-trillion parameter Mixture-of-Experts architecture with 32 billion active parameters per token, K2.6 features native multimodal input (text, image, video), advanced agent swarm orchestration supporting up to 300 concurrent sub-agents, and ties GPT-5.5 on SWE-Bench Pro at 58.6%.
- Native Multimodal Architecture: Supports text, image, and video input through its custom MoonViT vision encoder — video input is new in K2.6, supporting mp4, mov, avi, and webm formats.
- Agent Swarm Orchestration: Supports up to 300 concurrent sub-agents per task and 4,000 coordinated steps, with 96.6% tool-invocation success rate (up from 91% on K2.5).
- Coding Performance: SWE-Bench Pro 58.6% (ties GPT-5.5, as of April 2026), SWE-bench Verified 80.2%, LiveCodeBench v6 89.6%, Terminal-Bench 2.0 66.7%.
- Modified MIT License: Open weights on Hugging Face; free for commercial use below 100M MAU or $20M monthly revenue.
- End-to-End Coding & UI Generation: Excels at transforming prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows across Python, Rust, and Go.
- Multi-Agent Systems: The 300-agent swarm capacity with 4,000-step coordination makes it ideal for complex autonomous workflows requiring long-context stability.
- Cost-Effective Multimodal Processing: At $0.95/$4.00 per million input/output tokens with automatic context caching ($0.16/M cached), it is significantly cheaper than proprietary multimodal alternatives.
| Capability | Description |
|---|
| Reasoning | AIME 2026: 96.4%, GPQA-Diamond: 90.5%, HLE with tools: 54.0% (leading) |
| Coding | SWE-Bench Pro 58.6%, SWE-bench Verified 80.2%, LiveCodeBench v6 89.6%, Terminal-Bench 2.0 66.7% |
| Multimodal | Text, image (png, jpeg, webp, gif), and video (mp4, mov, avi, webm) input via MoonViT vision encoder |
| Response Speed | Optimized for throughput in agentic workflows; specific t/s metrics vary by deployment |
| Context Window | 262K tokens |
| Max Output | 16K tokens (up to 98K in extended mode) |
| Tool Use | 96.6% tool-invocation success, supports 4,000+ tool calls per session, multi-agent handoffs |
| Multilingual | 160K vocabulary optimized for code and non-English text; SWE-bench Multilingual 76.7% |
- Multimodal benchmark performance is mediocre (ranked #26 of 115), lagging top proprietary models by 3-6 points on vision tasks (MMMU-Pro, MathVision).
- No URL-based image input via API; only base64-encoded content or file upload is supported.
- Image resolution capped at 4K, video at 2K; request body must stay under 100MB.
- Pure math reasoning trails GPT-5.4 (AIME 2026: 96.4 vs 99.2, GPQA-Diamond: 90.5 vs 92.8).
- Context window (262K) is smaller than some proprietary alternatives offering 1M+ tokens.
- Independent reviews note only marginal improvement over K2.5 on day-to-day tasks and struggles with domain-specific workloads.
| Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token) |
|---|
| Kimi K2.6 | 0.95 | 0.95 | 0.16 | 4.00 |