Global-MMLU
A comprehensive multilingual benchmark covering 42 languages that addresses cultural and linguistic biases in evaluation, with improved translation quality and culturally sensitive question subsets.
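For readers who want to inspect the underlying data directly, the sketch below shows one way to load a single language split and select its culturally sensitive questions. The Hugging Face dataset ID (CohereForAI/Global-MMLU), the per-language config codes, and the cultural_sensitivity_label field name are assumptions drawn from the public release described in the paper, not details stated on this page.

```python
# Minimal sketch: load one Global-MMLU language split and keep the
# culturally sensitive subset. The dataset ID, language config code, and
# the "cultural_sensitivity_label" column are assumptions, not details
# confirmed by this page.
from datasets import load_dataset

# Assumed Hugging Face dataset ID with a per-language config ("sw" = Swahili).
dataset = load_dataset("CohereForAI/Global-MMLU", "sw", split="test")

# Keep only questions annotated as culturally sensitive (assumed label "CS").
culturally_sensitive = dataset.filter(
    lambda row: row.get("cultural_sensitivity_label") == "CS"
)

print(f"{len(dataset)} questions total, {len(culturally_sensitive)} culturally sensitive")
```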
Progress Over Time
Interactive timeline showing model performance evolution on Global-MMLU, with a state-of-the-art frontier and markers distinguishing open from proprietary models.
Global-MMLU Leaderboard
4 models • 0 verified
| Rank | Organization | Score | Parameters | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Google | 0.603 | 8B | 32K | $20.00 / $40.00 | — |
| 1 | — | 0.603 | 2B | — | — | — |
| 3 | Google | 0.551 | 8B | — | — | — |
| 3 | — | 0.551 | 2B | — | — | — |
FAQ
Common questions about Global-MMLU
What is Global-MMLU?
Global-MMLU is a comprehensive multilingual benchmark covering 42 languages that addresses cultural and linguistic biases in evaluation, with improved translation quality and culturally sensitive question subsets.
Where can I find the Global-MMLU paper?
The Global-MMLU paper is available at https://arxiv.org/abs/2412.03304. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
How are models ranked on the Global-MMLU leaderboard?
The Global-MMLU leaderboard ranks 4 AI models based on their performance on this benchmark. Currently, Gemma 3n E4B Instructed by Google leads with a score of 0.603. The average score across all models is 0.577.
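The 0.577 figure is simply the mean of the four leaderboard scores shown above:

```python
# Mean of the four self-reported scores from the leaderboard table above.
scores = [0.603, 0.603, 0.551, 0.551]
print(round(sum(scores) / len(scores), 3))  # 0.577
```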
What is the highest Global-MMLU score?
The highest Global-MMLU score is 0.603, achieved by Gemma 3n E4B Instructed from Google.
How many models have been evaluated on Global-MMLU?
4 models have been evaluated on the Global-MMLU benchmark, with 0 verified results and 4 self-reported results.
What categories does Global-MMLU fall under?
Global-MMLU is categorized under general, language, and reasoning. The benchmark evaluates text models with multilingual support.