Gemini 3.5 Flash

Overview

Gemini 3.5 Flash, released on May 19, 2026, is Google DeepMind's Flash-tier model in the Gemini 3 family. It is positioned as one of the most capable Flash model for sustained performance on agentic and coding tasks, combining frontier-level intelligence with low latency and cost efficiency. The model is available via the Google AI API (model ID: gemini-3.5-flash), Google Cloud Vertex AI, and Google AI Studio.

Key Features

  • Agentic Execution: Designed for sub-agent deployment, multi-step workflows, and rapid agentic loops at scale, with encrypted reasoning context preserved across API calls for stateful multi-turn interactions.
  • Coding Performance: Optimized for iterative coding cycles, rapid exploration, and prototyping, matching or exceeding Pro-class models from the previous generation at a fraction of the cost.
  • Configurable Thinking Levels: Four effort levels (minimal, low, medium, high) with medium as default, allowing fine-grained control over the quality-speed-cost tradeoff per request.
  • Combined Tool Use: Supports Google Search grounding, URL context, code execution, and function calling in a single request, enabling complex multi-tool agentic workflows without orchestration overhead.

Best Use Cases

  • Agentic Workflows at Scale: Excels at sub-agent deployment and long-horizon multi-step tasks, making it well-suited for production agent systems that require sustained intelligence across many sequential tool calls.
  • Rapid Coding Iterations: Strong coding capabilities at Flash-tier latency and pricing make it ideal for developer tools, code generation pipelines, and interactive prototyping where fast turnaround matters.
  • Cost-Efficient Intelligence: At $1.50/MTok input and $9.00/MTok output (standard tier), it delivers frontier-class performance at roughly 30% the cost of Gemini 3.1 Pro, suitable for high-volume production workloads.

Capabilities and Limitations

CapabilityDescription
ReasoningFrontier-level reasoning with configurable thinking effort; no published benchmark scores at launch
CodingOptimized for iterative coding cycles and agentic coding; matches prior-gen Pro-class coding quality
MultimodalText, image, video, audio, and PDF input; text-only output
Response SpeedFlash-tier latency; minimal thinking level available for chat-like speed
Context Window1,048,576 tokens (~1M) at uniform pricing (no long-context surcharge)
Max Output65,536 tokens
Tool UseFunction calling, code execution, Google Search grounding, URL context, structured outputs
MultilingualBroad multilingual support across major languages

Known Limitations

  • No published quantitative benchmark scores at launch — performance claims are qualitative ("frontier-level").
  • Computer use (desktop/browser interaction) is not supported.
  • No image, audio, or video generation — output is text-only.
  • Knowledge cutoff is January 2025 despite the May 2026 release date.
  • Sampling parameters (temperature, top_p, top_k) are no longer recommended for Gemini 3.x models.

Pricing

ModelInput (Credits/Token)Cache Write (Credits/Token)Cache Read (Credits/Token)Output (Credits/Token)
Gemini 3.5 Flash1.501.500.159.00
  • Context caching storage: $1.00 per 1M tokens per hour.