Gemini 3.5 Flash

Overview

Gemini 3.5 Flash, released on May 19, 2026, is Google DeepMind's Flash-tier model in the Gemini 3 family. It is positioned as one of the most capable Flash models for sustained performance on agentic and coding tasks, combining frontier-level intelligence with low latency and cost efficiency. The model is available via the Google AI API (model ID: gemini-3.5-flash), Google Cloud Vertex AI, and Google AI Studio.

Key Features

Agentic Execution: Designed for sub-agent deployment, multi-step workflows, and rapid agentic loops at scale, with encrypted reasoning context preserved across API calls for stateful multi-turn interactions.
Coding Performance: Optimized for iterative coding cycles, rapid exploration, and prototyping, matching or exceeding Pro-class models from the previous generation at a fraction of the cost.
Configurable Thinking Levels: Four effort levels (minimal, low, medium, high) with medium as default, allowing fine-grained control over the quality-speed-cost tradeoff per request.
Combined Tool Use: Supports Google Search grounding, URL context, code execution, and function calling in a single request, enabling complex multi-tool agentic workflows without orchestration overhead.
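
The four thinking levels lend themselves to a simple per-request routing rule. The sketch below is illustrative only: the level names and the medium default come from the feature list above, while the task categories, the TASK_LEVELS mapping, and the pick_thinking_level helper are hypothetical names introduced for this example and are not part of any SDK.

```python
from typing import Optional

# Level names and the default are taken from the feature list above.
THINKING_LEVELS = ("minimal", "low", "medium", "high")
DEFAULT_LEVEL = "medium"

# Hypothetical mapping from task kind to level; tune it for your own workload.
TASK_LEVELS = {
    "chat": "minimal",      # latency-sensitive, short replies
    "extraction": "low",    # simple structured lookups
    "coding": "medium",     # iterative edits
    "planning": "high",     # long-horizon, multi-step reasoning
}


def pick_thinking_level(task_kind: Optional[str] = None) -> str:
    """Return a thinking level for a task kind, falling back to the default."""
    level = TASK_LEVELS.get(task_kind, DEFAULT_LEVEL)
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level}")
    return level


if __name__ == "__main__":
    for kind in ("chat", "planning", None):
        print(kind, "->", pick_thinking_level(kind))
```

Keeping the mapping in one table makes it easy to audit which workloads pay for higher effort; how the chosen level is passed to the API depends on the client library in use.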

Best Use Cases

Agentic Workflows at Scale: Excels at sub-agent deployment and long-horizon multi-step tasks, making it well-suited for production agent systems that require sustained intelligence across many sequential tool calls.
Rapid Coding Iterations: Strong coding capabilities at Flash-tier latency and pricing make it ideal for developer tools, code generation pipelines, and interactive prototyping where fast turnaround matters.
Cost-Efficient Intelligence: At $1.50/MTok input and $9.00/MTok output (standard tier), it delivers frontier-class performance at roughly 30% of the cost of Gemini 3.1 Pro, making it suitable for high-volume production workloads.
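
To make the standard-tier rates concrete, the following minimal sketch estimates the cost of a single request. The two rates are the ones quoted above; the request_cost_usd helper and the token counts in the example are assumptions chosen for illustration.

```python
# Standard-tier rates quoted above, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 10,000 output tokens.
    print(f"${request_cost_usd(200_000, 10_000):.2f}")
```

For the hypothetical request shown, the input side contributes $0.30 and the output side $0.09, so output tokens dominate only when responses are long relative to prompts.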

Capabilities and Limitations

Capability	Description
Reasoning	Frontier-level reasoning with configurable thinking effort; no published benchmark scores at launch
Coding	Optimized for iterative coding cycles and agentic coding; matches prior-gen Pro-class coding quality
Multimodal	Text, image, video, audio, and PDF input; text-only output
Response Speed	Flash-tier latency; `minimal` thinking level available for chat-like speed
Context Window	1,048,576 tokens (~1M) at uniform pricing (no long-context surcharge)
Max Output	65,536 tokens
Tool Use	Function calling, code execution, Google Search grounding, URL context, structured outputs
Multilingual	Broad multilingual support across major languages
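
The context window and output limits in the table can be checked before a request is sent. This is a sketch under one stated assumption: that the 1,048,576-token window bounds the input and the 65,536-token cap bounds the generated output independently. The check_budget helper is a hypothetical name, and token counts are assumed to come from whatever tokenizer or counting endpoint you already use.

```python
from typing import List

# Limits taken from the capabilities table above.
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def check_budget(input_tokens: int, requested_output_tokens: int) -> List[str]:
    """Return a list of limit violations; an empty list means the request fits.

    Assumption: input and output limits are enforced separately.
    """
    problems = []
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        over = input_tokens - CONTEXT_WINDOW_TOKENS
        problems.append(f"input exceeds context window by {over} tokens")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        over = requested_output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output exceeds max output by {over} tokens")
    return problems


if __name__ == "__main__":
    print(check_budget(900_000, 8_192))      # fits both limits
    print(check_budget(1_200_000, 70_000))   # violates both limits
```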

Known Limitations

No published quantitative benchmark scores at launch — performance claims are qualitative ("frontier-level").
Computer use (desktop/browser interaction) is not supported.
No image, audio, or video generation — output is text-only.
Knowledge cutoff is January 2025 despite the May 2026 release date.
Setting sampling parameters (temperature, top_p, top_k) is no longer recommended for Gemini 3.x models.

Pricing

Model	Input (USD per 1M tokens)	Cache Write (USD per 1M tokens)	Cache Read (USD per 1M tokens)	Output (USD per 1M tokens)
Gemini 3.5 Flash	1.50	1.50	0.15	9.00

Context caching storage: $1.00 per 1M tokens per hour.
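
Whether caching a shared prompt prefix pays off depends on how often it is reused and how long it is stored. The sketch below compares the two cases using the rates above, read as USD per 1M tokens. The billing model is an assumption made for illustration: the prefix is written to the cache once at the cache-write rate, every call then reads it at the cache-read rate, and storage accrues per token-hour. Function names and the example workload are hypothetical.

```python
# Rates from the pricing section above, in USD per 1M tokens.
INPUT_RATE = 1.50
CACHE_WRITE_RATE = 1.50
CACHE_READ_RATE = 0.15
STORAGE_RATE_PER_HOUR = 1.00  # USD per 1M tokens per hour

MTOK = 1_000_000


def prefix_cost_uncached(prefix_tokens: int, calls: int) -> float:
    """Cost of resending the same prefix as ordinary input on every call."""
    return calls * prefix_tokens * INPUT_RATE / MTOK


def prefix_cost_cached(prefix_tokens: int, calls: int, hours: float) -> float:
    """Cost of one cache write, one cache read per call, and storage (assumed model)."""
    write = prefix_tokens * CACHE_WRITE_RATE / MTOK
    reads = calls * prefix_tokens * CACHE_READ_RATE / MTOK
    storage = prefix_tokens * STORAGE_RATE_PER_HOUR * hours / MTOK
    return write + reads + storage


if __name__ == "__main__":
    # Hypothetical workload: 100,000-token prefix, 20 calls within one hour.
    print(f"uncached: ${prefix_cost_uncached(100_000, 20):.2f}")
    print(f"cached:   ${prefix_cost_cached(100_000, 20, 1.0):.2f}")
```

Under these assumptions the hypothetical workload costs $3.00 without caching and $0.55 with it; with a single call in the hour, caching is the more expensive option, so the reuse count matters more than the prefix size.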