Qwen3.6-27B FP8 is a dense 27-billion-parameter multimodal model developed by Alibaba's Qwen Team, released on April 22, 2026, under the Apache 2.0 license. It is optimized for agentic coding and reasoning, outperforming the previous-generation 397B MoE flagship on all major coding benchmarks while being significantly more practical to deploy.
- Hybrid Gated DeltaNet Architecture: Uses a novel structure of 64 layers organized as 16 repeating blocks — each with 3 Gated DeltaNet (linear attention) sublayers followed by 1 Gated Attention (full self-attention) sublayer — balancing efficiency with strong long-context performance.
- Natively Multimodal: Supports text, image, and video inputs in a single unified checkpoint, combining a causal language model with a vision encoder.
- Hybrid Thinking Mode: Offers both a "thinking" mode (step-by-step reasoning with a configurable budget of 512–8,192 tokens) and a "non-thinking" mode (instant responses), selectable at inference time.
- Thinking Preservation: Retains chain-of-thought reasoning traces across multi-turn conversations, improving coherence in agentic coding workflows.
- Multi-Token Prediction (MTP): Trained with multi-token prediction for improved throughput during inference.
- Agentic Coding: Scores 77.2% on SWE-bench Verified and 59.3% on Terminal-Bench 2.0, making it well-suited for autonomous code generation, debugging, and multi-step software engineering workflows.
- Complex Reasoning Tasks: Achieves 87.8% on GPQA Diamond and 94.1% on AIME 2026, indicating strong scientific and mathematical problem-solving.
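The layer arithmetic in the Hybrid Gated DeltaNet Architecture bullet (16 blocks of 3 linear-attention sublayers plus 1 full-attention sublayer, 64 layers in total) can be checked with a short sketch. The labels `gated_deltanet` and `gated_attention` and the function name are hypothetical, chosen for illustration only; they are not identifiers from the model's code or config.

```python
# Illustrative sketch of the layer layout described above.
# The string labels are hypothetical, not names from the actual checkpoint.
NUM_BLOCKS = 16
BLOCK_PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]


def build_layer_layout(num_blocks=NUM_BLOCKS, pattern=BLOCK_PATTERN):
    """Return the flat list of sublayer kinds, block after block."""
    return [kind for _ in range(num_blocks) for kind in pattern]


layout = build_layer_layout()

# Zero-based positions of the full self-attention sublayers.
full_attention_indices = [
    i for i, kind in enumerate(layout) if kind == "gated_attention"
]

print(len(layout))                      # total number of layers
print(layout.count("gated_deltanet"))   # linear-attention sublayers
print(layout.count("gated_attention"))  # full-attention sublayers
```

Under this layout, one layer in four uses full self-attention, and the remaining three use linear attention.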
| Capability | Description |
|---|---|
| Reasoning | 87.8% GPQA Diamond, 94.1% AIME 2026, 86.2% MMLU-Pro |
| Coding | 77.2% SWE-bench Verified, 53.5% SWE-bench Pro, 59.3% Terminal-Bench 2.0, 48.2% SkillsBench |
| Creative Writing | General-purpose text generation; primarily optimized for code and reasoning rather than creative output |
| Multimodal | Text, image, and video input; text output only |
| Context Window | 128k tokens |
| Max Output | 32,768 tokens |
| Tool Use | Native function calling and tool use support |
- Improvements on knowledge-recall benchmarks (MMLU-Pro, C-Eval) are modest compared to the coding gains; the model is better characterized as an agent and reasoning model than as a knowledge-recall upgrade.
- Independent third-party benchmark verification remains limited as of the initial release; published coding scores use Qwen's internal agent scaffold.
| Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token) |
|---|---|---|---|---|
| Qwen3.6-27B | 0.19 | 0.19 | 0.019 | 2.99 |
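- Cache writes are billed at the same rate as input tokens (0.19 credits per token); cache reads are billed at one-tenth of that rate (0.019 credits per token).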
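As a rough illustration of how the per-token rates in the pricing table combine, the sketch below estimates the credits for a single request. It assumes the four token categories are billed independently and summed, which the table suggests but does not state outright; the function name and parameters are hypothetical and not part of any billing API.

```python
from decimal import Decimal

# Rates copied from the pricing table above (credits per token).
RATES = {
    "input": Decimal("0.19"),
    "cache_write": Decimal("0.19"),
    "cache_read": Decimal("0.019"),
    "output": Decimal("2.99"),
}


def estimate_credits(input_tokens=0, cache_write_tokens=0,
                     cache_read_tokens=0, output_tokens=0):
    """Estimate credits for one request.

    Assumption: each category is billed separately and the totals add up.
    """
    counts = {
        "input": input_tokens,
        "cache_write": cache_write_tokens,
        "cache_read": cache_read_tokens,
        "output": output_tokens,
    }
    if any(n < 0 for n in counts.values()):
        raise ValueError("token counts must be non-negative")
    # Decimal avoids binary floating-point rounding on the rates.
    return sum((RATES[k] * n for k, n in counts.items()), Decimal("0"))


# Example: 1,000 uncached input tokens and 100 output tokens.
example_cost = estimate_credits(input_tokens=1000, output_tokens=100)
```

In this example the input contributes 190 credits and the output 299, for 489 credits in total. Output tokens dominate the cost at these rates, so long thinking traces are the main thing to budget for.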