Model Comparison

DeepSeek VL2 Small vs Grok-3

Grok-3 leads on MMMU (78.0% vs 48.0%), the only benchmark reported for both models.

Performance Benchmarks

Comparative analysis across standard metrics

1 shared benchmark

Of the benchmarks reported for both models, DeepSeek VL2 Small leads on 0 and Grok-3 leads on 1 (MMMU).
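
The win counts above can be reproduced from the scores quoted elsewhere on this page. The following is a minimal sketch, not the site's actual method: the `head_to_head` helper and the dictionary names are hypothetical, and the figures are copied from the FAQ and Key Takeaways sections.

```python
# Scores (percent) as quoted on this page; only MMMU appears for both models.
deepseek_vl2_small = {
    "DocVQA": 92.3, "ChartQA": 84.5, "OCRBench": 83.4,
    "TextVQA": 83.4, "MMBench": 80.3, "MMMU": 48.0,
}
grok_3 = {
    "AIME 2024": 93.3, "AIME 2025": 93.3, "GPQA": 84.6,
    "LiveCodeBench": 79.4, "MMMU": 78.0,
}


def head_to_head(a, b):
    """Hypothetical helper: compare two models on the benchmarks both report."""
    shared = sorted(set(a) & set(b))
    wins_a = [name for name in shared if a[name] > b[name]]
    wins_b = [name for name in shared if b[name] > a[name]]
    return shared, wins_a, wins_b


shared, deepseek_wins, grok_wins = head_to_head(deepseek_vl2_small, grok_3)
mmmu_gap = grok_3["MMMU"] - deepseek_vl2_small["MMMU"]

print(shared)                              # ['MMMU']
print(len(deepseek_wins), len(grok_wins))  # 0 1
print(mmmu_gap)                            # 30.0 percentage points
```

Because the two models share a single benchmark, the result says little about tasks outside MMMU.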

Thu May 14 2026 • llm-stats.com

Context Window

Maximum input and output token capacity

Only Grok-3 specifies context limits: 128,000 input tokens and 8,000 output tokens. DeepSeek VL2 Small's limits are not specified.

DeepSeek VL2 Small (DeepSeek)

Input: not specified
Output: not specified

Grok-3 (xAI)

Input: 128,000 tokens
Output: 8,000 tokens
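
As a rough illustration of how these limits constrain a request, the sketch below checks a request against the Grok-3 figures listed above. It assumes the input and output limits apply independently, as the page presents them, and that the caller already has token counts. The function name is hypothetical and is not part of any xAI API, and the page does not say which tokenizer produces the counts.

```python
# Limits as listed on this page for Grok-3.
GROK_3_MAX_INPUT_TOKENS = 128_000
GROK_3_MAX_OUTPUT_TOKENS = 8_000


def fits_grok_3_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Hypothetical check: True if both counts are within the listed limits.

    Assumes the two limits are independent; a real API may count differently.
    """
    return (
        0 <= prompt_tokens <= GROK_3_MAX_INPUT_TOKENS
        and 0 <= requested_output_tokens <= GROK_3_MAX_OUTPUT_TOKENS
    )


print(fits_grok_3_limits(100_000, 4_000))  # True
print(fits_grok_3_limits(128_001, 100))    # False: prompt exceeds the input limit
print(fits_grok_3_limits(1_000, 8_001))    # False: output exceeds the output limit
```

No equivalent check is possible for DeepSeek VL2 Small from this page, since its limits are not specified.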

Input Capabilities

Supported data types and modalities

Both DeepSeek VL2 Small and Grok-3 accept multimodal input, meaning each can take more than one type of data in a request.

DeepSeek VL2 Small

Text
Images
Audio
Video

Grok-3

Text
Images
Audio
Video

License

Usage and distribution terms

DeepSeek VL2 Small is released with open weights under the DeepSeek license, while Grok-3 is proprietary and closed source.

License differences may affect how you can use these models in commercial or open-source projects.

DeepSeek VL2 Small

DeepSeek license

Open weights

Grok-3

Proprietary

Closed source

Release Timeline

When each model was launched

DeepSeek VL2 Small was released on 2024-12-13, while Grok-3 was released on 2025-02-17.

Grok-3 is 2 months newer than DeepSeek VL2 Small.

DeepSeek VL2 Small

Dec 13, 2024

1.4 years ago

Grok-3

Feb 17, 2025

1.2 years ago

2 months newer
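
The relative figures above follow from simple date arithmetic. A minimal sketch, assuming the ages are measured from the page date shown earlier (Thu May 14 2026) and that a year is taken as 365.25 days; the constant and function names are illustrative only:

```python
from datetime import date

DEEPSEEK_VL2_SMALL_RELEASE = date(2024, 12, 13)
GROK_3_RELEASE = date(2025, 2, 17)
PAGE_DATE = date(2026, 5, 14)  # assumed reference date for the "years ago" values


def years_between(earlier: date, later: date) -> float:
    """Elapsed time in years, rounded to one decimal (365.25-day year assumed)."""
    return round((later - earlier).days / 365.25, 1)


gap_days = (GROK_3_RELEASE - DEEPSEEK_VL2_SMALL_RELEASE).days

print(gap_days)                                              # 66 days, about 2 months
print(years_between(DEEPSEEK_VL2_SMALL_RELEASE, PAGE_DATE))  # 1.4
print(years_between(GROK_3_RELEASE, PAGE_DATE))              # 1.2
```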

Knowledge Cutoff

When training data ends

Grok-3 has a documented knowledge cutoff of 2024-11-17, while DeepSeek VL2 Small's cutoff date is not specified.

We can confirm Grok-3's training data extends to 2024-11-17, but cannot make a direct comparison without DeepSeek VL2 Small's cutoff date.

DeepSeek VL2 Small

Not specified

Grok-3

Nov 2024

Key Takeaways

DeepSeek VL2 Small (DeepSeek)

Has open weights

Grok-3 (xAI)

Only model with a documented context window (128,000 input tokens)
Higher MMMU score (78.0% vs 48.0%)

FAQ

Common questions about DeepSeek VL2 Small vs Grok-3.

Which is better, DeepSeek VL2 Small or Grok-3?

Grok-3 leads on MMMU (78.0% vs 48.0%), the only benchmark reported for both models, so the comparison rests on a single score. DeepSeek VL2 Small is made by DeepSeek and Grok-3 is made by xAI. The best choice depends on your use case: compare their benchmark scores, context limits, licensing, and capabilities above.

How does DeepSeek VL2 Small compare to Grok-3 in benchmarks?

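DeepSeek VL2 Small's reported scores include DocVQA 92.3%, ChartQA 84.5%, OCRBench 83.4%, TextVQA 83.4%, and MMBench 80.3%. Grok-3's include AIME 2024 93.3%, AIME 2025 93.3%, GPQA 84.6%, LiveCodeBench 79.4%, and MMMU 78.0%. These lists cover different benchmarks, so the only direct comparison is MMMU, where Grok-3 scores 78.0% and DeepSeek VL2 Small scores 48.0%.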

What are the context window sizes for DeepSeek VL2 Small and Grok-3?

DeepSeek VL2 Small's context window is not specified. Grok-3 supports 128,000 input tokens and 8,000 output tokens. A larger context window lets you process longer documents, conversations, or codebases in a single request.

What are the main differences between DeepSeek VL2 Small and Grok-3?

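Key differences include licensing (DeepSeek license with open weights vs. proprietary), documented context limits (not specified vs. 128,000 input tokens), and release date (2024-12-13 vs. 2025-02-17). See the full comparison above for benchmark-by-benchmark results.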

Who makes DeepSeek VL2 Small and Grok-3?

DeepSeek VL2 Small is developed by DeepSeek and Grok-3 is developed by xAI.