Gemini 3.5 Flash: Benchmarks, Pricing, and Complete Specs
Gemini 3.5 Flash is GA today. Frontier-level intelligence at 4x the speed of comparable models. $1.50/$9 per 1M tokens, 1M context, 76.2% Terminal-Bench 2.1, beats Gemini 3.1 Pro on coding and agents.

Google released Gemini 3.5 Flash today at Google I/O 2026. It is the first model in the new Gemini 3.5 family and the strongest agentic and coding model the Flash series has ever shipped. Generally available immediately via the Gemini API, Google AI Studio, Google Antigravity, the Gemini app, and AI Mode in Google Search. A 3.5 Pro version is in internal use and slated for next month.
The headline claim from Google's announcement is unusual for a Flash release: 3.5 Flash beats Gemini 3.1 Pro on most coding and agentic benchmarks, while running roughly 4x faster than comparable frontier models, and often at less than half the cost. The pitch is no longer “cheaper if you can tolerate worse”. It is frontier intelligence at Flash latency.
Key Numbers
Gemini 3.5 Flash · May 19, 2026
Frontier intelligence at 4x the speed of comparable frontier models, often at less than half the cost. Beats Gemini 3.1 Pro on the coding and agentic suite.
At a Glance
- Release date: May 19, 2026 — Google I/O 2026. Generally available.
- API model ID:
gemini-3.5-flash(no preview suffix). Internal version:3.5-flash-05-2026. - Pricing: $1.50 input / $9.00 output per 1M tokens. $0.15 cached input. Non-global regions: $1.65 / $9.90.
- Context window: 1,048,576 input tokens / 65,536 output tokens.
- Modalities: Text + image + audio + video input, text output.
- Thinking: Dynamic thinking on by default. Tool use: function calling, structured output, search-as-a-tool, code execution.
- Knowledge cutoff: January 2026.
- Distribution: Gemini app, AI Mode in Search, Gemini API, Google AI Studio, Android Studio, Google Antigravity, Vertex AI, Gemini Enterprise Agent Platform.
- Coming next: Gemini 3.5 Pro, in internal use, ships next month.

Top-right quadrant of the Artificial Analysis Intelligence Index. Source: Google.
Why 3.5 Flash Matters
The cleanest way to read this release is: Google moved the frontier line down to the Flash tier. Until this morning, the working assumption across the industry was a two-tier mental model — “Pro” for hard problems, “Flash” for throughput. 3.5 Flash collapses that distinction on the workloads that matter for agents:
- Long-horizon agentic tasks. Anything that used to take a developer days or an auditor weeks, 3.5 Flash can compress into hours, often at less than half the cost of other frontier models.
- Multi-step tool use. Plans, builds, iterates. Reliably executes multi-turn function calling under supervision.
- Sub-agent orchestration. Paired with the updated Antigravity harness, one 3.5 Flash run can spin up multiple subagents working in parallel.
The Trade
vs other frontier models
Same intelligence. A fraction of the time and price.
output tokens / second
Versus comparable frontier models, by Google's own measurement. What used to take a developer days can finish in hours.
the cost
At $1.50 / $9 per million input / output tokens, 3.5 Flash undercuts every other model in its capability class.
The trade Google is asking you to make is interesting. 3.5 Flash trails Gemini 3.1 Pro on Humanity's Last Exam (40.2% vs 44.4%) and ARC-AGI-2 (72.1% vs 77.1%) — the benchmarks dominated by raw parametric knowledge and pure abstract reasoning. It beats 3.1 Pro on the benchmarks that look like real work: Terminal-Bench 2.1, MCP Atlas, Finance Agent v2, GDPval-AA, OSWorld-Verified. If your workload is “an agent that needs to get something done” rather than “a researcher asking a hard question”, 3.5 Flash is the better choice today.
Benchmarks
All numbers below are self-reported by Google in the official evals methodology page. The comparison is against Gemini 3.1 Pro because that is the model Google itself frames 3.5 Flash against in the announcement.
Flash beats Pro
3.1 Pro → 3.5 Flash
A Flash model that beats
its own Pro tier on coding and agents.
Coding
| Benchmark | 3.5 Flash | 3.1 Pro | Δ |
|---|---|---|---|
| Terminal-Bench 2.1 | 76.2% | 70.3% | +5.9 |
| SWE-Bench Pro (Public) | 55.1% | 54.2% | +0.9 |
Agentic & tool use
| Benchmark | 3.5 Flash | 3.1 Pro | Δ |
|---|---|---|---|
| MCP Atlas | 83.6% | 78.2% | +5.4 |
| Toolathlon | 56.5% | 49.4% | +7.1 |
| OSWorld-Verified | 78.4% | 76.2% | +2.2 |
| Finance Agent v2 | 57.9% | 43.0% | +14.9 |
| GDPval-AA (Elo) | 1656 | 1314 | +342 |
The +14.9 point jump on Finance Agent v2 is the largest single delta in the suite and lines up with Google's real-world rollout: Macquarie Bank is piloting 3.5 Flash for customer onboarding over 100+ page financial documents, and Ramp is using it for OCR over messy invoices. The GDPval-AA gain of 342 Elo on economically valuable work is striking — and Google's framing is exactly that.
Multimodal & long context
| Benchmark | 3.5 Flash | 3.1 Pro | Δ |
|---|---|---|---|
| CharXiv Reasoning | 84.2% | 83.3% | +0.9 |
| MMMU-Pro | 83.6% | 80.5% | +3.1 |
| Blueprint-Bench 2 | 33.6% | 26.5% | +7.1 |
| MRCR v2 · 128k | 77.3% | 84.9% | -7.6 |
| MRCR v2 · 1M | 26.6% | 26.3% | +0.3 |
3.5 Flash leads on multimodal understanding across CharXiv, MMMU-Pro, and Blueprint-Bench 2 (agentic spatial reasoning). The long-context picture is mixed: at 128k it gives back 7.6 points to 3.1 Pro, but at the 1M extreme the two are within 0.3 points. The honest read: 3.1 Pro remains the better choice when the entire 128k window is dense with critical context. 3.5 Flash is the better choice everywhere else.
Reasoning
| Benchmark | 3.5 Flash | 3.1 Pro | Δ |
|---|---|---|---|
| Humanity's Last Exam | 40.2% | 44.4% | -4.2 |
| ARC-AGI-2 | 72.1% | 77.1% | -5.0 |
The two clean wins for 3.1 Pro. Both benchmarks reward dense parametric knowledge and abstract pattern recognition, where a larger Pro-tier model still has the edge. If your work is closer to research than to action, stay on 3.1 Pro for now.

Full evaluation card vs other frontier models. Source: Google DeepMind.
Pricing & Context
| Detail | Value |
|---|---|
| Input price (Global) | $1.50 / 1M tokens |
| Output price (Global) | $9.00 / 1M tokens |
| Cached input | $0.15 / 1M tokens |
| Non-global regions | $1.65 input / $9.90 output / $0.165 cached |
| Max input context | 1,048,576 tokens (1M) |
| Max output | 65,536 tokens (64K) |
| Model ID | gemini-3.5-flash |
| Version | 3.5-flash-05-2026 |
| Tier support | Free tier · Standard · Priority |
Pricing is roughly 3x Gemini 3 Flash (which was $0.50 / $3), but still 40% cheaper on input and 40% cheaper on output than Gemini 3.1 Pro at $2.50 / $15. The 90% cache discount makes long agent contexts the dominant cost lever, not per-request input. For agent harnesses that re-use system prompts across many tool turns, the effective rate drops dramatically.
Developer Surface
Google released 3.5 Flash alongside an expanded agent-development stack. The full developer set, from the I/O 2026 developer post:
Antigravity 2.0
New standalone desktop application. Acts as a central home for agent interaction with parallel subagent execution, scheduled tasks for background automation, and integrations into AI Studio, Android, and Firebase. The updated Antigravity harness is co-optimized with 3.5 Flash specifically.
Antigravity CLI and SDK
A lightweight terminal product for fast agent creation, plus a programmatic SDK exposing the same harness Google's own products use. Gemini CLI users are being migrated to the new Antigravity CLI.
Managed Agents in the Gemini API
A single API call now spins up a full agent that reasons, uses tools, and executes code in an isolated Linux environment. Built on 3.5 Flash. Persistent environments resume across calls with files and state intact. Available via the Interactions API and in AI Studio at ai.dev/managed-agents.
Google AI Studio
Mobile app coming for pre-registration. Workspace API integration. One-click export to Antigravity. Native Android app building from a single prompt, with direct publishing to the Google Play Console test track.
Real-World Impact
Google's launch post lists six rollout partners with specific use cases. Notable for actually describing the workload, not just providing a logo:
- Shopify — parallel subagents analyzing complex data for global merchant growth forecasts.
- Macquarie Bank — reasoning over 100+ page financial documents to accelerate customer onboarding with low latency.
- Salesforce Agentforce — multi-subagent enterprise task automation with context retention across multi-turn tool calls.
- Ramp — multimodal OCR on complex invoices combined with reasoning over historical patterns.
- Xero — autonomous multi-week workflows (1099 tax form preparation, supplier identification) for small-business admin.
- Databricks — agentic monitoring and real-time retrieval across massive datasets, diagnosis, and proposed fixes for data teams.
The common thread is long-horizon work that crosses many tool calls where speed compounds. A 4x faster model running a 50-step plan is the difference between a workflow that finishes during a coffee break and one that takes most of an afternoon.
Personal AI Agents: Spark
3.5 Flash is the engine behind Gemini Spark, Google's new personal AI agent that runs 24/7 and takes action on a user's behalf under their direction. Spark rolls out to trusted testers today and to Google AI Ultra subscribers in the US in beta next week.

Gemini Spark uses 3.5 Flash to plan and execute tasks on the user's behalf.
3.5 Flash is also now the default model for the Gemini app and AI Mode in Google Search globally. From today, hundreds of millions of everyday Gemini queries are running on the new model.
Safety
Gemini 3.5 was developed under Google's Frontier Safety Framework. Google reports strengthened cyber and CBRN safeguards along with new interpretability tooling that inspects the model's internal reasoning before responses are returned. The framing: fewer harmful generations and fewer false refusals on safe queries.
Google cites interpretability research on examining model internals as part of the release. This is a non-trivial escalation for a Flash-tier release and worth watching as more details land in the formal model card.
Bottom Line
Gemini 3.5 Flash is the most important Flash release Google has shipped. The version bump (3 Flash → 3.5 Flash, in roughly five months) hides a much larger capability jump than the label suggests. It beats Gemini 3.1 Pro on coding and agents, runs 4x faster than other frontier models, and costs less than half of what Pro-class models charge.
The trade is real and worth naming: 3.5 Flash gives up ground on academic reasoning and dense long-context recall. If your workload is a hard question answered in one turn, stay with 3.1 Pro or wait for 3.5 Pro next month. If your workload is a multi-step agent that has to plan, call tools, and finish a real task, 3.5 Flash is the new default.
The full announcement and the partner-by-partner rollout are on blog.google. Developer documentation, including Managed Agents and the Antigravity harness, is on the I/O 2026 developer highlights post. The evaluation methodology page is at deepmind.google/models/evals-methodology/gemini-3-5-flash.
Questions
Frequently Asked Questions
- Google released Gemini 3.5 Flash on May 19, 2026, at Google I/O 2026. It is generally available the same day across the Gemini API, Google AI Studio, Google Antigravity, Vertex AI, the Gemini app, and AI Mode in Google Search.
- Gemini 3.5 Flash pricing is $1.50 per 1M input tokens and $9.00 per 1M output tokens on the standard tier. Cached input tokens are $0.15 per 1M. Non-global regions are priced at $1.65 / $9.90.
- Gemini 3.5 Flash supports a 1 million token input context window and 64,000 output tokens. The official API reports an exact
inputTokenLimitof 1,048,576. - Gemini 3.5 Flash beats Gemini 3.1 Pro on the coding and agentic suite: 76.2% vs 70.3% on Terminal-Bench 2.1, 83.6% vs 78.2% on MCP Atlas, 57.9% vs 43.0% on Finance Agent v2, and 1656 vs 1314 Elo on GDPval-AA. It trails Pro on long-context (MRCR v2) and Humanity's Last Exam, where raw knowledge weighs more than agentic capability.
- Google reports Gemini 3.5 Flash runs 4 times faster than comparable frontier models measured in output tokens per second. It lands in the top-right quadrant of the Artificial Analysis Intelligence Index — frontier intelligence at Flash-tier latency.
- The API model ID is
gemini-3.5-flash(GA, no preview suffix). The internal version returned by the Gemini API is3.5-flash-05-2026, signaling the May 2026 snapshot. Dynamic thinking is on by default. - Free options include the LLM Stats Coding Arena (blind head-to-head on real coding tasks), the unified Playground (same prompt across every model), the Google AI Studio free tier, and the Gemini app.
Continue Reading
