Model Comparison

DeepSeek-V3.1 vs Gemini 2.0 Flash Thinking

Both models are evenly matched across the benchmarks.

Want to compare interactively?Try the playground

Performance Benchmarks

Comparative analysis across standard metrics

2 benchmarks

DeepSeek-V3.1 outperforms in 1 benchmarks (GPQA), while Gemini 2.0 Flash Thinking is better at 1 benchmark (AIME 2024).

Both models are evenly matched across the benchmarks.

Sun May 24 2026 • llm-stats.com

Arena Performance

Human preference votes

Context Window

Maximum input and output token capacity

Only DeepSeek-V3.1 specifies input context (163,840 tokens). Only DeepSeek-V3.1 specifies output context (163,840 tokens).

DeepSeek-V3.1

Input163,840 tokens

Output163,840 tokens

Gemini 2.0 Flash Thinking

Input- tokens

Output- tokens

Sun May 24 2026 • llm-stats.com

Input Capabilities

Supported data types and modalities

Gemini 2.0 Flash Thinking supports multimodal inputs, whereas DeepSeek-V3.1 does not.

Gemini 2.0 Flash Thinking can handle both text and other forms of data like images, making it suitable for multimodal applications.

DeepSeek-V3.1

Text

Images

Audio

Video

Gemini 2.0 Flash Thinking

Text

Images

Audio

Video

License

Usage and distribution terms

DeepSeek-V3.1 is licensed under MIT, while Gemini 2.0 Flash Thinking uses a proprietary license.

License differences may affect how you can use these models in commercial or open-source projects.

DeepSeek-V3.1

MIT

Open weights

Gemini 2.0 Flash Thinking

Proprietary

Closed source

Release Timeline

When each model was launched

DeepSeek-V3.1 was released on 2025-01-10, while Gemini 2.0 Flash Thinking was released on 2025-01-21.

Gemini 2.0 Flash Thinking is 0 month newer than DeepSeek-V3.1.

DeepSeek-V3.1

Jan 10, 2025

1.4 years ago

Gemini 2.0 Flash Thinking

Jan 21, 2025

1.3 years ago

1w newer

Knowledge Cutoff

When training data ends

Gemini 2.0 Flash Thinking has a documented knowledge cutoff of 2024-08-01, while DeepSeek-V3.1's cutoff date is not specified.

We can confirm Gemini 2.0 Flash Thinking's training data extends to 2024-08-01, but cannot make a direct comparison without DeepSeek-V3.1's cutoff date.

DeepSeek-V3.1

—

Gemini 2.0 Flash Thinking

Aug 2024

Outputs Comparison

Notice missing or incorrect data?Start an Issue discussion→

Key Takeaways

DeepSeek-V3.1

View details

DeepSeek

Larger context window (163,840 tokens)

Has open weights

Higher GPQA score (74.9% vs 74.2%)

Gemini 2.0 Flash Thinking

View details

Google

Supports multimodal inputs

Higher AIME 2024 score (73.3% vs 66.3%)

Detailed Comparison

AI Model Comparison Table
Feature	DeepSeek-V3.1	Gemini 2.0 Flash Thinking

FAQ

Common questions about DeepSeek-V3.1 vs Gemini 2.0 Flash Thinking.

Which is better, DeepSeek-V3.1 or Gemini 2.0 Flash Thinking?

Both models are evenly matched across the benchmarks. DeepSeek-V3.1 is made by DeepSeek and Gemini 2.0 Flash Thinking is made by Google. The best choice depends on your use case — compare their benchmark scores, pricing, and capabilities above.

How does DeepSeek-V3.1 compare to Gemini 2.0 Flash Thinking in benchmarks?

DeepSeek-V3.1 scores SimpleQA: 93.4%, MMLU-Redux: 91.8%, MMLU-Pro: 83.7%, GPQA: 74.9%, CodeForces: 69.7%. Gemini 2.0 Flash Thinking scores MMMU: 75.4%, GPQA: 74.2%, AIME 2024: 73.3%.

What are the context window sizes for DeepSeek-V3.1 and Gemini 2.0 Flash Thinking?

DeepSeek-V3.1 supports 164K tokens and Gemini 2.0 Flash Thinking supports an unknown number of tokens. A larger context window lets you process longer documents, conversations, or codebases in a single request.

What are the main differences between DeepSeek-V3.1 and Gemini 2.0 Flash Thinking?

Key differences include multimodal support (no vs yes), licensing (MIT vs Proprietary). See the full comparison above for benchmark-by-benchmark results.

Who makes DeepSeek-V3.1 and Gemini 2.0 Flash Thinking?

DeepSeek-V3.1 is developed by DeepSeek and Gemini 2.0 Flash Thinking is developed by Google.