Multilingual MMLU
MMLU-ProX is a comprehensive multilingual benchmark that covers 29 typologically diverse languages and builds on MMLU-Pro. Every language version contains the same 11,829 questions, which enables direct cross-linguistic comparison. The benchmark evaluates the reasoning capabilities of large language models across linguistic and cultural boundaries using challenging, reasoning-focused questions with 10 answer choices each.
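Because every language shares the same question set, per-language accuracy can be compared directly, and the 10 answer choices put the random-guess baseline at 1/10 = 0.1. The sketch below shows one way such scoring could work. It assumes answers are labeled with the letters A through J and that predictions are already parsed into single letters. The function names and the toy data are hypothetical and are not taken from the MMLU-ProX evaluation code.

```python
from string import ascii_uppercase

NUM_LANGUAGES = 29
QUESTIONS_PER_LANGUAGE = 11_829
NUM_CHOICES = 10
# Assumed labeling of the 10 options: "ABCDEFGHIJ" (hypothetical, not from the paper)
CHOICE_LETTERS = ascii_uppercase[:NUM_CHOICES]

# Total benchmark items across all language versions: 29 * 11,829 = 343,041
total_items = NUM_LANGUAGES * QUESTIONS_PER_LANGUAGE
# Expected accuracy of uniform random guessing over 10 options
chance_baseline = 1 / NUM_CHOICES


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of questions where the predicted letter matches the gold letter."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold answers must be aligned")
    if not gold:
        raise ValueError("no questions to score")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def per_language_accuracy(
    predictions_by_lang: dict[str, list[str]], gold: list[str]
) -> dict[str, float]:
    """Score each language against the same gold answers (identical questions)."""
    return {lang: accuracy(preds, gold) for lang, preds in predictions_by_lang.items()}


if __name__ == "__main__":
    # Hypothetical toy run: 4 shared questions answered in two languages
    gold = ["A", "C", "J", "B"]
    scores = per_language_accuracy(
        {"en": ["A", "C", "J", "D"], "de": ["A", "B", "J", "D"]}, gold
    )
    print(scores)  # {'en': 0.75, 'de': 0.5}
    # Cross-lingual gap: best-language accuracy minus worst-language accuracy
    print(max(scores.values()) - min(scores.values()))  # 0.25
```

Sharing a single gold list across languages relies on the benchmark's design: the questions are identical, so any difference in accuracy reflects the language of presentation rather than a difference in content.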
Multilingual MMLU Leaderboard
5 models
| Rank | Organization | Parameters | Context | Cost (input / output, per 1M tokens) | License |
|---|---|---|---|---|---|
| 1 | OpenAI | — | 200K | $1.10 / $4.40 | — |
| 2 | Mistral AI | 14B | — | — | — |
| 3 | Mistral AI | 8B | — | — | — |
| 4 | Mistral AI | 3B | — | — | — |
| 5 | Microsoft | 4B | — | — | — |
FAQ
Common questions about Multilingual MMLU
The Multilingual MMLU paper is available at https://arxiv.org/abs/2503.10497. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The Multilingual MMLU leaderboard ranks 5 AI models based on their performance on this benchmark. Currently, o3-mini by OpenAI leads with a score of 0.807. The average score across all models is 0.680.
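The leader's score and the overall average together indicate how the remaining models perform. The short calculation below assumes that the reported 0.680 is an unweighted mean over all five self-reported scores. Both inputs are rounded, so the result is approximate. The helper name is hypothetical.

```python
def implied_mean_of_rest(top: float, overall_mean: float, n: int) -> float:
    """Mean of the n - 1 non-leading scores implied by the leader's score and the overall mean.

    Assumes overall_mean is an unweighted mean over all n scores.
    """
    if n < 2:
        raise ValueError("need at least two models")
    return (overall_mean * n - top) / (n - 1)


# Figures from the leaderboard: leader 0.807, average 0.680, 5 models
print(round(implied_mean_of_rest(0.807, 0.680, 5), 3))  # 0.648
```

On that assumption, the other four models average roughly 0.648, about 0.16 below the leader.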
The highest Multilingual MMLU score is 0.807, achieved by o3-mini from OpenAI.
5 models have been evaluated on the Multilingual MMLU benchmark, with 0 verified results and 5 self-reported results.
Multilingual MMLU falls under the general, language, and reasoning categories. It evaluates text models with multilingual support.