Model Comparison

Mistral Large 3 (675B Instruct 2512) vs Phi 4 Reasoning

Phi 4 Reasoning outperforms Mistral Large 3 (675B Instruct 2512) on both benchmarks the two models share (GPQA and LiveCodeBench).

Performance Benchmarks

Comparative analysis across standard metrics

2 benchmarks

Of the 2 benchmarks reported for both models, Mistral Large 3 (675B Instruct 2512) leads on 0 and Phi 4 Reasoning leads on 2 (GPQA and LiveCodeBench).

Sun May 17 2026 • llm-stats.com

Model Size

Parameter count comparison

661.0B diff

Mistral Large 3 (675B Instruct 2512) has 661.0B more parameters than Phi 4 Reasoning (675.0B vs 14.0B), making it 4721.4% larger, or roughly 48 times the size.
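The size figures above follow from simple arithmetic on the two parameter counts. The sketch below reproduces them; the variable names are illustrative and not taken from llm-stats.com.

```python
# Parameter counts in billions, as listed in this comparison.
mistral_params_b = 675.0
phi_params_b = 14.0

# Absolute difference in billions of parameters.
diff_b = mistral_params_b - phi_params_b

# "Percent larger" is the difference relative to the smaller model.
pct_larger = diff_b / phi_params_b * 100

# Size ratio: how many times larger the bigger model is.
size_ratio = mistral_params_b / phi_params_b

print(f"{diff_b:.1f}B diff, {pct_larger:.1f}% larger, {size_ratio:.1f}x the size")
```

Note that "4721.4% larger" and "about 48 times the size" describe the same gap: the percentage measures the difference relative to the smaller model, while the ratio divides the two totals.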

Mistral AI: Mistral Large 3 (675B Instruct 2512), 675.0B parameters
Microsoft: Phi 4 Reasoning, 14.0B parameters

Context Window

Maximum input and output token capacity

Only Mistral Large 3 (675B Instruct 2512) specifies a context window: 262,100 tokens for input and 262,100 tokens for output. No input or output limit is listed for Phi 4 Reasoning.

Mistral AI: Mistral Large 3 (675B Instruct 2512)
Input: 262,100 tokens
Output: 262,100 tokens

Microsoft: Phi 4 Reasoning
Input: not specified
Output: not specified

Input Capabilities

Supported data types and modalities

Mistral Large 3 (675B Instruct 2512) supports multimodal inputs, whereas Phi 4 Reasoning does not.

Mistral Large 3 (675B Instruct 2512) can handle both text and other forms of data like images, making it suitable for multimodal applications.

Modalities compared: text, images, audio, video.

Mistral Large 3 (675B Instruct 2512): text and images
Phi 4 Reasoning: text only

License

Usage and distribution terms

Mistral Large 3 (675B Instruct 2512) is licensed under Apache 2.0, while Phi 4 Reasoning uses MIT.

License differences may affect how you can use these models in commercial or open-source projects.

Mistral Large 3 (675B Instruct 2512): Apache 2.0, open weights
Phi 4 Reasoning: MIT, open weights

Release Timeline

When each model was launched

Mistral Large 3 (675B Instruct 2512) was released on 2025-12-04, while Phi 4 Reasoning was released on 2025-04-30.

Mistral Large 3 (675B Instruct 2512) is 7 months newer than Phi 4 Reasoning.
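The "7 months newer" figure can be checked by counting whole calendar months between the two release dates. This is a minimal sketch under one common convention (a month only counts once the day of the month has been reached); the helper name `full_months_between` is hypothetical, and the site may round differently.

```python
from datetime import date


def full_months_between(earlier: date, later: date) -> int:
    """Count whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # If the day of the month has not been reached yet, the last month is incomplete.
    if later.day < earlier.day:
        months -= 1
    return months


# Release dates as stated in this comparison.
phi_release = date(2025, 4, 30)
mistral_release = date(2025, 12, 4)

months_newer = full_months_between(phi_release, mistral_release)
days_newer = (mistral_release - phi_release).days

print(f"{months_newer} full months ({days_newer} days) between releases")
```

The same helper applied to the page date (May 17, 2026) gives 5 full months since the Mistral Large 3 release and 12 full months since the Phi 4 Reasoning release, which matches the relative ages shown for each model.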

Mistral Large 3 (675B Instruct 2512): Dec 4, 2025 (5 months ago, 7 months newer)
Phi 4 Reasoning: Apr 30, 2025 (1.0 years ago)

Knowledge Cutoff

When training data ends

Phi 4 Reasoning has a documented knowledge cutoff of 2025-03-01, while Mistral Large 3 (675B Instruct 2512)'s cutoff date is not specified.

Phi 4 Reasoning's training data extends to 2025-03-01. Without a cutoff date for Mistral Large 3 (675B Instruct 2512), the two models cannot be compared directly on this point.

Mistral Large 3 (675B Instruct 2512): not specified
Phi 4 Reasoning: Mar 2025

Key Takeaways

Mistral Large 3 (675B Instruct 2512):
Documented context window (262,100 tokens; none is listed for Phi 4 Reasoning)
Supports multimodal inputs

Phi 4 Reasoning:
Higher GPQA score (65.8% vs 43.9%)
Higher LiveCodeBench score (53.8% vs 34.4%)
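
The head-to-head result can be reproduced from the two shared benchmark scores reported in this comparison (GPQA: 43.9% for Mistral Large 3 vs 65.8% for Phi 4 Reasoning; LiveCodeBench: 34.4% vs 53.8%). The following is a minimal sketch, not the site's actual method; the function names `tally_wins` and `margins` are hypothetical, and a tie would be credited to whichever model is listed first.

```python
# Scores in percent, as reported in this comparison for the two shared benchmarks.
scores = {
    "GPQA": {"Mistral Large 3": 43.9, "Phi 4 Reasoning": 65.8},
    "LiveCodeBench": {"Mistral Large 3": 34.4, "Phi 4 Reasoning": 53.8},
}


def tally_wins(scores):
    """Count, per model, the benchmarks on which it has the highest score."""
    wins = {}
    for per_model in scores.values():
        for model in per_model:
            wins.setdefault(model, 0)
        best = max(per_model, key=per_model.get)
        wins[best] += 1
    return wins


def margins(scores, a, b):
    """Percentage-point lead of model `b` over model `a` on each benchmark."""
    return {bench: round(s[b] - s[a], 1) for bench, s in scores.items()}


wins = tally_wins(scores)
gaps = margins(scores, "Mistral Large 3", "Phi 4 Reasoning")

print(wins)
print(gaps)
```

Only benchmarks reported for both models enter the tally, which is why the comparison rests on 2 benchmarks even though each model lists more scores individually.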

FAQ

Common questions about Mistral Large 3 (675B Instruct 2512) vs Phi 4 Reasoning.

Which is better, Mistral Large 3 (675B Instruct 2512) or Phi 4 Reasoning?

Phi 4 Reasoning leads on both benchmarks the two models share (GPQA and LiveCodeBench). Mistral Large 3 (675B Instruct 2512) is made by Mistral AI and Phi 4 Reasoning is made by Microsoft. The best choice depends on your use case, so compare the benchmark scores, context window, input capabilities, and license above.

How does Mistral Large 3 (675B Instruct 2512) compare to Phi 4 Reasoning in benchmarks?

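Mistral Large 3 (675B Instruct 2512) scores MMMLU: 85.5%, AMC_2022_23: 52.0%, GPQA: 43.9%, LiveCodeBench: 34.4%, SimpleQA: 23.8%. Phi 4 Reasoning scores FlenQA: 97.7%, HumanEval+: 92.9%, IFEval: 83.4%, OmniMath: 76.6%, AIME 2024: 75.3%. These lists mostly cover different benchmarks; only GPQA and LiveCodeBench are reported for both models, where Phi 4 Reasoning scores 65.8% and 53.8% against 43.9% and 34.4% for Mistral Large 3 (675B Instruct 2512).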

What are the context window sizes for Mistral Large 3 (675B Instruct 2512) and Phi 4 Reasoning?

Mistral Large 3 (675B Instruct 2512) supports 262,100 tokens (about 262K), while no context window is specified for Phi 4 Reasoning. A larger context window lets you process longer documents, conversations, or codebases in a single request.

What are the main differences between Mistral Large 3 (675B Instruct 2512) and Phi 4 Reasoning?

Key differences include multimodal support (yes vs no), licensing (Apache 2.0 vs MIT), and model size (675.0B vs 14.0B parameters). See the full comparison above for benchmark-by-benchmark results.

Who makes Mistral Large 3 (675B Instruct 2512) and Phi 4 Reasoning?

Mistral Large 3 (675B Instruct 2512) is developed by Mistral AI and Phi 4 Reasoning is developed by Microsoft.