Qwen3.6-27B

Overview

Qwen3.6-27B FP8 is a dense 27-billion-parameter multimodal model developed by Alibaba's Qwen Team, released on April 22, 2026, under the Apache 2.0 license. It is optimized for agentic coding and reasoning. It outperforms the previous-generation 397B MoE flagship on all major coding benchmarks, and its much smaller dense parameter count makes it more practical to deploy.

Key Features

Hybrid Gated DeltaNet Architecture: Uses a novel structure of 64 layers organized as 16 repeating blocks — each with 3 Gated DeltaNet (linear attention) sublayers followed by 1 Gated Attention (full self-attention) sublayer — balancing efficiency with strong long-context performance.
Natively Multimodal: Supports text, image, and video inputs in a single unified checkpoint, combining a causal language model with a vision encoder.
Hybrid Thinking Mode: Offers both a "thinking" mode (step-by-step reasoning with configurable budget of 512–8,192 tokens) and a "non-thinking" mode (instant responses) selectable at inference time.
Thinking Preservation: Retains chain-of-thought reasoning traces across multi-turn conversations, improving coherence in agentic coding workflows.
Multi-Token Prediction (MTP): Trained with multi-token prediction for improved throughput during inference.
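
The layer arithmetic in the architecture description (16 blocks of 3 + 1 sublayers giving 64 layers) can be checked with a short sketch. This is an illustration only: the names layer_layout, "gated_deltanet", and "gated_attention" are hypothetical labels chosen for this example, not identifiers from the model's actual implementation.

```python
# Hypothetical sketch of the layer ordering described above.
# The string labels are illustrative, not names from the real checkpoint.
BLOCKS = 16
DELTANET_PER_BLOCK = 3   # linear-attention sublayers per block
ATTENTION_PER_BLOCK = 1  # full self-attention sublayers per block


def layer_layout() -> list[str]:
    """Return the sublayer type for each layer, in order."""
    layout: list[str] = []
    for _ in range(BLOCKS):
        layout.extend(["gated_deltanet"] * DELTANET_PER_BLOCK)
        layout.extend(["gated_attention"] * ATTENTION_PER_BLOCK)
    return layout


layout = layer_layout()
print(len(layout))                      # total number of layers
print(layout.count("gated_attention"))  # number of full-attention layers
print(layout[:4])                       # the first repeating block
```

Under this reading, one layer in four uses full self-attention and the remaining three quarters use linear attention.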

Best Use Cases

Agentic Coding: Scores 77.2% on SWE-bench Verified and 59.3% on Terminal-Bench 2.0, making it well-suited for autonomous code generation, debugging, and multi-step software engineering workflows.
Complex Reasoning Tasks: Achieves 87.8% on GPQA Diamond and 94.1% on AIME 2026, indicating strength in scientific and mathematical problem-solving.

Capabilities and Limitations

Capability	Description
Reasoning	87.8% GPQA Diamond, 94.1% AIME 2026, 86.2% MMLU-Pro
Coding	77.2% SWE-bench Verified, 53.5% SWE-bench Pro, 59.3% Terminal-Bench 2.0, 48.2% SkillsBench
Creative Writing	General-purpose text generation; primarily optimized for code and reasoning rather than creative output
Multimodal	Text, image, and video input; text output only
Context Window	128k tokens
Max Output	32,768 tokens
Tool Use	Native function calling and tool use support
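
The context window and max output limits in the table interact when sizing a request. The sketch below shows one way to pick a safe output limit. It rests on two assumptions that the table does not confirm: that "128k" means 128,000 tokens (it could also mean 131,072), and that prompt and generated tokens share the same context window. The function name max_new_tokens is hypothetical.

```python
# Limits taken from the capability table above.
CONTEXT_WINDOW = 128_000  # assumption: "128k" read as 128,000 tokens
MAX_OUTPUT = 32_768


def max_new_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Largest output length that respects both limits.

    Assumes prompt and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    return min(requested, MAX_OUTPUT, remaining)


print(max_new_tokens(1_000))    # short prompt: capped by the max output limit
print(max_new_tokens(120_000))  # long prompt: capped by the remaining window
```

If thinking-mode tokens count toward the output limit, the thinking budget would need to be subtracted as well; the source does not say either way.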

Known Limitations

Improvements on knowledge-recall benchmarks (MMLU-Pro, C-Eval) are modest compared to the coding gains; the model is better characterized as an agent and reasoning model than as a knowledge-recall upgrade.
Independent third-party benchmark verification remains limited as of the initial release; published coding scores use Qwen's internal agent scaffold.

Pricing

Model	Input (Credits/Token)	Cache Write (Credits/Token)	Cache Read (Credits/Token)	Output (Credits/Token)
Qwen3.6-27B	0.19	0.19	0.019	2.99

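Cache writes are billed at the same rate as input tokens (0.19 credits per token). Cache reads are billed at 0.019 credits per token, one tenth of that rate.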
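
The per-token rates in the pricing table can be combined into a rough cost estimate for a single request. The sketch below is a minimal illustration, assuming the four token categories are disjoint (each prompt token is billed as exactly one of input, cache write, or cache read). The Usage class and estimate_credits function are hypothetical names for this example, not part of any billing API.

```python
from dataclasses import dataclass

# Rates in credits per token, copied from the pricing table above.
RATES = {
    "input": 0.19,
    "cache_write": 0.19,
    "cache_read": 0.019,
    "output": 2.99,
}


@dataclass
class Usage:
    """Token counts for one request (hypothetical structure)."""
    input_tokens: int = 0
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0
    output_tokens: int = 0


def estimate_credits(usage: Usage) -> float:
    """Sum each token category multiplied by its per-token rate."""
    return (
        usage.input_tokens * RATES["input"]
        + usage.cache_write_tokens * RATES["cache_write"]
        + usage.cache_read_tokens * RATES["cache_read"]
        + usage.output_tokens * RATES["output"]
    )


# 1,000 uncached input tokens and 100 output tokens.
print(round(estimate_credits(Usage(input_tokens=1_000, output_tokens=100)), 2))
```

Because the output rate is roughly 16 times the input rate, long generations (including thinking-mode traces, if those are billed as output) are likely to dominate the cost of a request.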