MT-AIME 2025
MT-AIME 2025 Leaderboard
| Rank | Model | Organization | Parameters | Score | Context | Cost | License |
|---|---|---|---|---|---|---|---|
| 1 | Command A+ | Cohere | 218B | 0.860 | — | — | |
What is MT-AIME 2025?
MT-AIME 2025 is Cohere's internal multilingual translation of AIME 2025, evaluated for Arabic, Japanese, and Korean in the Command A+ release.
MT-AIME 2025 is a text benchmark covering language, math, and reasoning tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. Because only one model is listed, the average and the leading score are the same: 0.860, shown as 0.9 when rounded to one decimal place.
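As a rough illustration of how the summary figures relate to a one-entry leaderboard, the sketch below recomputes the average and the leader from the single score reported on this page. The `scores` dictionary is a hypothetical in-memory copy of the leaderboard, not an LLM Stats data format or API.

```python
from statistics import mean

# Hypothetical in-memory copy of the leaderboard (an assumption for
# illustration); the only entry is the 0.860 reported for Command A+.
scores = {"Command A+": 0.860}

# With a single entry, the average equals the leading score.
average = mean(list(scores.values()))
leader, leader_score = max(scores.items(), key=lambda item: item[1])

print(f"leader: {leader} ({leader_score:.3f})")
print(f"average: {average:.3f}")
# Rounded to one decimal place, as in the summary figures.
print(f"rounded: {average:.1f}")
```

With more than one model listed, the same two lines would give a distinct average and leader.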
Current leaders
Command A+ from Cohere currently leads the MT-AIME 2025 leaderboard with a score of 0.860. It is the only model evaluated so far.