MT-AIME 2025

MT-AIME 2025 Leaderboard

1 model
About this benchmark

What is MT-AIME 2025?

MT-AIME 2025 is Cohere's internal multilingual translation of AIME 2025, evaluated for Arabic, Japanese, and Korean in the Command A+ release.

MT-AIME 2025 is a text benchmark evaluating models on language, math, and reasoning tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. The current average is 0.860, which is also the leader's score.
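Since only one model is listed, the average and the top score are the same number, and the percentage shown in the leaderboard row is that 0–1 score multiplied by 100. The following is a minimal sketch of that arithmetic, not LLM Stats' actual code. The `leaderboard` list and the `summarize` helper are hypothetical names introduced for illustration; the single score is the 0.860 reported on this page.

```python
from statistics import mean

# Hypothetical in-memory leaderboard (an assumption for illustration).
# The single entry mirrors the score reported on this page.
leaderboard = [
    {"model": "Command A+", "organization": "Cohere", "score": 0.860},
]


def summarize(entries):
    """Return the entry count, average score, and leader for a list of entries."""
    scores = [entry["score"] for entry in entries]
    leader = max(entries, key=lambda entry: entry["score"])
    return {
        "count": len(entries),
        "average": mean(scores),
        "leader": leader["model"],
        "leader_score": leader["score"],
    }


summary = summarize(leaderboard)

# Scores are stored on a 0-1 scale; the percent format spec multiplies by 100.
print(f"{summary['leader']}: {summary['leader_score']:.1%}")
print(f"Average over {summary['count']} model(s): {summary['average']:.3f}")
```

With more entries, the same helper would report a leader and an average that differ.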

Compare leaders on the best AI for language, best AI for math and best AI for reasoning leaderboards.

Current leaders

Command A+ from Cohere currently leads the MT-AIME 2025 leaderboard with a score of 0.860; it is the only model evaluated so far.

1. Command A+ (Cohere): 86.0%

FAQ

Common questions about the MT-AIME 2025 benchmark and leaderboard.

What is the MT-AIME 2025 benchmark?

MT-AIME 2025 is Cohere's internal multilingual translation of AIME 2025, evaluated for Arabic, Japanese, and Korean in the Command A+ release.

What is the MT-AIME 2025 leaderboard?

The MT-AIME 2025 leaderboard ranks AI models by their performance on this benchmark; 1 model is currently listed. Command A+ by Cohere leads with a score of 0.860. With a single entry, the average score is also 0.860.

What is the highest MT-AIME 2025 score?

The highest MT-AIME 2025 score is 0.860, achieved by Command A+ from Cohere.

How many models are evaluated on MT-AIME 2025?

1 model has been evaluated on the MT-AIME 2025 benchmark, with 0 verified results and 1 self-reported result.

What categories does MT-AIME 2025 cover?

MT-AIME 2025 is categorized under language, math, and reasoning. The benchmark evaluates text models with multilingual support.

What's the difference between MT-AIME 2025 and AIME 2025?

MT-AIME 2025 is a variant of AIME 2025. See the AIME 2025 leaderboard for the broader benchmark and per-model comparison.

What is the best open-source model on MT-AIME 2025?

Command A+ by Cohere is the top-ranked open-source model on MT-AIME 2025, with a score of 0.860 (rank #1).

How recent are the MT-AIME 2025 leaderboard results?

The MT-AIME 2025 leaderboard was last updated in June 2026 and currently includes 1 evaluated model.