Back to blog
Model Release·Technical Deep Dive

Gemini 3.5 Flash: Benchmarks, Pricing, and Complete Specs

Gemini 3.5 Flash is GA today. Frontier-level intelligence at 4x the speed of comparable models. $1.50/$9 per 1M tokens, 1M context, 76.2% Terminal-Bench 2.1, beats Gemini 3.1 Pro on coding and agents.

Jonathan Chavez
Jonathan Chavez
Co-Founder @ LLM Stats
·9 min read
Gemini 3.5 Flash: Benchmarks, Pricing, and Complete Specs

Google released Gemini 3.5 Flash today at Google I/O 2026. It is the first model in the new Gemini 3.5 family and the strongest agentic and coding model the Flash series has ever shipped. Generally available immediately via the Gemini API, Google AI Studio, Google Antigravity, the Gemini app, and AI Mode in Google Search. A 3.5 Pro version is in internal use and slated for next month.

The headline claim from Google's announcement is unusual for a Flash release: 3.5 Flash beats Gemini 3.1 Pro on most coding and agentic benchmarks, while running roughly 4x faster than comparable frontier models, and often at less than half the cost. The pitch is no longer “cheaper if you can tolerate worse”. It is frontier intelligence at Flash latency.

Key Numbers

Gemini 3.5 Flash · May 19, 2026

0.0%
Terminal-Bench 2.1
0.0%
MCP Atlas
0.0%
CharXiv Reasoning
0
GDPval-AA (Elo)
$0.00
Input / 1M tokens
0M
Context window

Frontier intelligence at 4x the speed of comparable frontier models, often at less than half the cost. Beats Gemini 3.1 Pro on the coding and agentic suite.

At a Glance

  • Release date: May 19, 2026 — Google I/O 2026. Generally available.
  • API model ID: gemini-3.5-flash (no preview suffix). Internal version: 3.5-flash-05-2026.
  • Pricing: $1.50 input / $9.00 output per 1M tokens. $0.15 cached input. Non-global regions: $1.65 / $9.90.
  • Context window: 1,048,576 input tokens / 65,536 output tokens.
  • Modalities: Text + image + audio + video input, text output.
  • Thinking: Dynamic thinking on by default. Tool use: function calling, structured output, search-as-a-tool, code execution.
  • Knowledge cutoff: January 2026.
  • Distribution: Gemini app, AI Mode in Search, Gemini API, Google AI Studio, Android Studio, Google Antigravity, Vertex AI, Gemini Enterprise Agent Platform.
  • Coming next: Gemini 3.5 Pro, in internal use, ships next month.
Gemini 3.5 Flash on the Artificial Analysis Intelligence Index — top-right quadrant of intelligence vs speed

Top-right quadrant of the Artificial Analysis Intelligence Index. Source: Google.


Why 3.5 Flash Matters

The cleanest way to read this release is: Google moved the frontier line down to the Flash tier. Until this morning, the working assumption across the industry was a two-tier mental model — “Pro” for hard problems, “Flash” for throughput. 3.5 Flash collapses that distinction on the workloads that matter for agents:

  • Long-horizon agentic tasks. Anything that used to take a developer days or an auditor weeks, 3.5 Flash can compress into hours, often at less than half the cost of other frontier models.
  • Multi-step tool use. Plans, builds, iterates. Reliably executes multi-turn function calling under supervision.
  • Sub-agent orchestration. Paired with the updated Antigravity harness, one 3.5 Flash run can spin up multiple subagents working in parallel.

The Trade

vs other frontier models

Same intelligence. A fraction of the time and price.

4x

output tokens / second

Versus comparable frontier models, by Google's own measurement. What used to take a developer days can finish in hours.

the cost

At $1.50 / $9 per million input / output tokens, 3.5 Flash undercuts every other model in its capability class.

The trade Google is asking you to make is interesting. 3.5 Flash trails Gemini 3.1 Pro on Humanity's Last Exam (40.2% vs 44.4%) and ARC-AGI-2 (72.1% vs 77.1%) — the benchmarks dominated by raw parametric knowledge and pure abstract reasoning. It beats 3.1 Pro on the benchmarks that look like real work: Terminal-Bench 2.1, MCP Atlas, Finance Agent v2, GDPval-AA, OSWorld-Verified. If your workload is “an agent that needs to get something done” rather than “a researcher asking a hard question”, 3.5 Flash is the better choice today.


Benchmarks

All numbers below are self-reported by Google in the official evals methodology page. The comparison is against Gemini 3.1 Pro because that is the model Google itself frames 3.5 Flash against in the announcement.

Flash beats Pro

3.1 Pro  →  3.5 Flash

A Flash model that beats
its own Pro tier on coding and agents.

35
50
65
80
90
Δ
Terminal-Bench 2.1
+5.9
MCP Atlas
+5.4
CharXiv Reasoning
+0.9
OSWorld-Verified
+2.2
MMMU-Pro
+3.1
SWE-Bench Pro (Public)
+0.9
Finance Agent v2
+14.9
MRCR v2 · 128k
-7.6
Humanity's Last Exam
-4.2
ARC-AGI-2
-5.0
Gemini 3.1 Pro
Gemini 3.5 Flash
Self-reported scores. Source: deepmind.google/models/evals-methodology/gemini-3-5-flash. Lower scores on long-context and academic reasoning are expected: 3.5 Flash trades raw knowledge for agentic capability.

Coding

Benchmark3.5 Flash3.1 ProΔ
Terminal-Bench 2.176.2%70.3%+5.9
SWE-Bench Pro (Public)55.1%54.2%+0.9

Agentic & tool use

Benchmark3.5 Flash3.1 ProΔ
MCP Atlas83.6%78.2%+5.4
Toolathlon56.5%49.4%+7.1
OSWorld-Verified78.4%76.2%+2.2
Finance Agent v257.9%43.0%+14.9
GDPval-AA (Elo)16561314+342

The +14.9 point jump on Finance Agent v2 is the largest single delta in the suite and lines up with Google's real-world rollout: Macquarie Bank is piloting 3.5 Flash for customer onboarding over 100+ page financial documents, and Ramp is using it for OCR over messy invoices. The GDPval-AA gain of 342 Elo on economically valuable work is striking — and Google's framing is exactly that.

Multimodal & long context

Benchmark3.5 Flash3.1 ProΔ
CharXiv Reasoning84.2%83.3%+0.9
MMMU-Pro83.6%80.5%+3.1
Blueprint-Bench 233.6%26.5%+7.1
MRCR v2 · 128k77.3%84.9%-7.6
MRCR v2 · 1M26.6%26.3%+0.3

3.5 Flash leads on multimodal understanding across CharXiv, MMMU-Pro, and Blueprint-Bench 2 (agentic spatial reasoning). The long-context picture is mixed: at 128k it gives back 7.6 points to 3.1 Pro, but at the 1M extreme the two are within 0.3 points. The honest read: 3.1 Pro remains the better choice when the entire 128k window is dense with critical context. 3.5 Flash is the better choice everywhere else.

Reasoning

Benchmark3.5 Flash3.1 ProΔ
Humanity's Last Exam40.2%44.4%-4.2
ARC-AGI-272.1%77.1%-5.0

The two clean wins for 3.1 Pro. Both benchmarks reward dense parametric knowledge and abstract pattern recognition, where a larger Pro-tier model still has the edge. If your work is closer to research than to action, stay on 3.1 Pro for now.

Gemini 3.5 Flash evaluation card from Google DeepMind — full benchmark table vs Gemini 3 Flash, Gemini 3 Flash, Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7, and GPT-5.5

Full evaluation card vs other frontier models. Source: Google DeepMind.


Pricing & Context

DetailValue
Input price (Global)$1.50 / 1M tokens
Output price (Global)$9.00 / 1M tokens
Cached input$0.15 / 1M tokens
Non-global regions$1.65 input / $9.90 output / $0.165 cached
Max input context1,048,576 tokens (1M)
Max output65,536 tokens (64K)
Model IDgemini-3.5-flash
Version3.5-flash-05-2026
Tier supportFree tier · Standard · Priority

Pricing is roughly 3x Gemini 3 Flash (which was $0.50 / $3), but still 40% cheaper on input and 40% cheaper on output than Gemini 3.1 Pro at $2.50 / $15. The 90% cache discount makes long agent contexts the dominant cost lever, not per-request input. For agent harnesses that re-use system prompts across many tool turns, the effective rate drops dramatically.


Developer Surface

Google released 3.5 Flash alongside an expanded agent-development stack. The full developer set, from the I/O 2026 developer post:

Antigravity 2.0

New standalone desktop application. Acts as a central home for agent interaction with parallel subagent execution, scheduled tasks for background automation, and integrations into AI Studio, Android, and Firebase. The updated Antigravity harness is co-optimized with 3.5 Flash specifically.

Antigravity CLI and SDK

A lightweight terminal product for fast agent creation, plus a programmatic SDK exposing the same harness Google's own products use. Gemini CLI users are being migrated to the new Antigravity CLI.

Managed Agents in the Gemini API

A single API call now spins up a full agent that reasons, uses tools, and executes code in an isolated Linux environment. Built on 3.5 Flash. Persistent environments resume across calls with files and state intact. Available via the Interactions API and in AI Studio at ai.dev/managed-agents.

Google AI Studio

Mobile app coming for pre-registration. Workspace API integration. One-click export to Antigravity. Native Android app building from a single prompt, with direct publishing to the Google Play Console test track.


Real-World Impact

Google's launch post lists six rollout partners with specific use cases. Notable for actually describing the workload, not just providing a logo:

  • Shopify — parallel subagents analyzing complex data for global merchant growth forecasts.
  • Macquarie Bank — reasoning over 100+ page financial documents to accelerate customer onboarding with low latency.
  • Salesforce Agentforce — multi-subagent enterprise task automation with context retention across multi-turn tool calls.
  • Ramp — multimodal OCR on complex invoices combined with reasoning over historical patterns.
  • Xero — autonomous multi-week workflows (1099 tax form preparation, supplier identification) for small-business admin.
  • Databricks — agentic monitoring and real-time retrieval across massive datasets, diagnosis, and proposed fixes for data teams.

The common thread is long-horizon work that crosses many tool calls where speed compounds. A 4x faster model running a 50-step plan is the difference between a workflow that finishes during a coffee break and one that takes most of an afternoon.


Personal AI Agents: Spark

3.5 Flash is the engine behind Gemini Spark, Google's new personal AI agent that runs 24/7 and takes action on a user's behalf under their direction. Spark rolls out to trusted testers today and to Google AI Ultra subscribers in the US in beta next week.

Gemini Spark — personal AI agent powered by Gemini 3.5 Flash, sample task interface

Gemini Spark uses 3.5 Flash to plan and execute tasks on the user's behalf.

3.5 Flash is also now the default model for the Gemini app and AI Mode in Google Search globally. From today, hundreds of millions of everyday Gemini queries are running on the new model.


Safety

Gemini 3.5 was developed under Google's Frontier Safety Framework. Google reports strengthened cyber and CBRN safeguards along with new interpretability tooling that inspects the model's internal reasoning before responses are returned. The framing: fewer harmful generations and fewer false refusals on safe queries.

Google cites interpretability research on examining model internals as part of the release. This is a non-trivial escalation for a Flash-tier release and worth watching as more details land in the formal model card.


Bottom Line

Gemini 3.5 Flash is the most important Flash release Google has shipped. The version bump (3 Flash → 3.5 Flash, in roughly five months) hides a much larger capability jump than the label suggests. It beats Gemini 3.1 Pro on coding and agents, runs 4x faster than other frontier models, and costs less than half of what Pro-class models charge.

The trade is real and worth naming: 3.5 Flash gives up ground on academic reasoning and dense long-context recall. If your workload is a hard question answered in one turn, stay with 3.1 Pro or wait for 3.5 Pro next month. If your workload is a multi-step agent that has to plan, call tools, and finish a real task, 3.5 Flash is the new default.

The full announcement and the partner-by-partner rollout are on blog.google. Developer documentation, including Managed Agents and the Antigravity harness, is on the I/O 2026 developer highlights post. The evaluation methodology page is at deepmind.google/models/evals-methodology/gemini-3-5-flash.

Questions

Frequently Asked Questions

  • Google released Gemini 3.5 Flash on May 19, 2026, at Google I/O 2026. It is generally available the same day across the Gemini API, Google AI Studio, Google Antigravity, Vertex AI, the Gemini app, and AI Mode in Google Search.
  • Gemini 3.5 Flash pricing is $1.50 per 1M input tokens and $9.00 per 1M output tokens on the standard tier. Cached input tokens are $0.15 per 1M. Non-global regions are priced at $1.65 / $9.90.
  • Gemini 3.5 Flash supports a 1 million token input context window and 64,000 output tokens. The official API reports an exact inputTokenLimit of 1,048,576.
  • Gemini 3.5 Flash beats Gemini 3.1 Pro on the coding and agentic suite: 76.2% vs 70.3% on Terminal-Bench 2.1, 83.6% vs 78.2% on MCP Atlas, 57.9% vs 43.0% on Finance Agent v2, and 1656 vs 1314 Elo on GDPval-AA. It trails Pro on long-context (MRCR v2) and Humanity's Last Exam, where raw knowledge weighs more than agentic capability.
  • Google reports Gemini 3.5 Flash runs 4 times faster than comparable frontier models measured in output tokens per second. It lands in the top-right quadrant of the Artificial Analysis Intelligence Index — frontier intelligence at Flash-tier latency.
  • The API model ID is gemini-3.5-flash (GA, no preview suffix). The internal version returned by the Gemini API is 3.5-flash-05-2026, signaling the May 2026 snapshot. Dynamic thinking is on by default.
  • Free options include the LLM Stats Coding Arena (blind head-to-head on real coding tasks), the unified Playground (same prompt across every model), the Google AI Studio free tier, and the Gemini app.

Continue Reading