MMMU
MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering) across 30 subjects and 183 subfields.
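For hands-on use, the question set can be pulled programmatically. A minimal sketch, assuming the Hugging Face release under the `MMMU/MMMU` dataset id with per-subject configs and dev/validation/test splits (verify config and field names on the dataset card):

```python
# Minimal sketch: load one MMMU subject and inspect a question.
# Assumes the Hugging Face release "MMMU/MMMU"; config and field
# names should be checked against the dataset card.
from datasets import load_dataset

ds = load_dataset("MMMU/MMMU", "Art", split="validation")

sample = ds[0]
print(sample["question"])   # question text, may reference <image 1>
print(sample["options"])    # multiple-choice options (string or list)
print(sample["answer"])     # gold option letter, e.g. "B" (hidden on test)
# Images ship alongside the text fields (e.g. sample["image_1"]).
```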
Progress Over Time
[Interactive timeline showing model performance evolution on MMMU; it traces the state-of-the-art frontier and distinguishes open from proprietary models.]
MMMU Leaderboard
59 models • 0 verified
| Rank | Organization | Score | Params | Context | Cost (input / output per 1M tokens) | License |
|---|---|---|---|---|---|---|
| 1 | OpenAI | 0.854 | — | 400K | $1.25 / $10.00 | — |
| 1 | OpenAI | 0.854 | — | 400K | $1.25 / $10.00 | — |
| 1 | OpenAI | 0.854 | — | 400K | $1.25 / $10.00 | — |
| 4 | OpenAI | 0.842 | — | 400K | $1.25 / $10.00 | — |
| 5 | Alibaba Cloud / Qwen Team | 0.839 | 122B | 262K | $0.40 / $3.20 | — |
| 6 | OpenAI | 0.829 | — | 200K | $2.00 / $8.00 | — |
| 7 | Alibaba Cloud / Qwen Team | 0.823 | 27B | — | — | — |
| 8 | — | 0.820 | — | 1.0M | $1.25 / $10.00 | — |
| 9 | OpenAI | 0.816 | — | 200K | $1.10 / $4.40 | — |
| 10 | Alibaba Cloud / Qwen Team | 0.814 | 35B | 262K | $0.25 / $2.00 | — |
| 11 | Google | 0.797 | — | 1.0M | $0.30 / $2.50 | — |
| 12 | Google | 0.796 | — | 1.0M | $1.25 / $10.00 | — |
| 13 | StepFun | 0.781 | 10B | — | — | — |
| 14 | xAI | 0.780 | — | 128K | $3.00 / $15.00 | — |
| 15 | OpenAI | 0.776 | — | 200K | $15.00 / $60.00 | — |
| 16 | — | 0.754 | — | — | — | — |
| 17 | OpenAI | 0.752 | — | 128K | $75.00 / $150.00 | — |
| 18 | Anthropic | 0.750 | — | 200K | $3.00 / $15.00 | — |
| 19 | OpenAI | 0.748 | — | 1.0M | $2.00 / $8.00 | — |
| 20 | Anthropic | 0.744 | — | 200K | $3.00 / $15.00 | — |
| 21 | Meta | 0.734 | 400B | 1.0M | $0.17 / $0.85 | — |
| 22 | Google | 0.729 | — | 1.0M | $0.10 / $0.40 | — |
| 23 | OpenAI | 0.727 | — | 1.0M | $0.40 / $1.60 | — |
| 24 | OpenAI | 0.722 | — | 128K | $2.50 / $10.00 | — |
| 25 | Google | 0.707 | — | 1.0M | $0.10 / $0.40 | — |
| 26 | Alibaba Cloud / Qwen Team | 0.703 | 73B | — | — | — |
| 27 | Alibaba Cloud / Qwen Team | 0.702 | 72B | — | — | — |
| 28 | Moonshot AI | 0.700 | — | — | — | — |
| 28 | Alibaba Cloud / Qwen Team | 0.700 | 34B | — | — | — |
| 30 | Meta | 0.694 | 109B | 10.0M | $0.08 / $0.30 | — |
| 31 | Anthropic | 0.683 | — | 200K | $3.00 / $15.00 | — |
| 32 | Google | 0.680 | — | 1.0M | $0.07 / $0.30 | — |
| 33 | xAI | 0.661 | — | 128K | $2.00 / $10.00 | — |
| 34 | Google | 0.659 | — | 2.1M | $2.50 / $10.00 | — |
| 35 | Mistral AI | 0.640 | 124B | 128K | $2.00 / $6.00 | — |
| 36 | xAI | 0.632 | — | — | — | — |
| 37 | Mistral AI | 0.625 | 24B | — | — | — |
| 38 | Google | 0.623 | — | 1.0M | $0.15 / $0.60 | — |
| 39 | Amazon | 0.617 | — | 300K | $0.80 / $3.20 | — |
| 40 | — | 0.603 | 90B | 128K | $0.35 / $0.40 | — |
| 41 | OpenAI | 0.594 | — | 128K | $0.15 / $0.60 | — |
| 42 | Mistral AI | 0.593 | 24B | — | — | — |
| 42 | Mistral AI | 0.593 | 24B | 128K | $0.10 / $0.30 | — |
| 44 | Alibaba Cloud / Qwen Team | 0.592 | 7B | — | — | — |
| 45 | Alibaba Cloud / Qwen Team | 0.586 | 8B | — | — | — |
| 46 | Amazon | 0.562 | — | 300K | $0.06 / $0.24 | — |
| 47 | OpenAI | 0.554 | — | 1.0M | $0.10 / $0.40 | — |
| 48 | Microsoft | 0.551 | 6B | 128K | $0.05 / $0.10 | — |
| 49 | Google | 0.537 | 8B | 1.0M | $0.07 / $0.30 | — |
| 50 | xAI | 0.536 | — | — | — | — |
Showing 1–50 of 59 models.
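The Cost column lists input and output prices per million tokens, so a rough evaluation budget can be estimated from expected token counts. A sketch with hypothetical per-question token counts (the real numbers depend on image encoding and prompt format):

```python
def eval_cost_usd(n_questions: int,
                  in_price: float, out_price: float,
                  in_tok_per_q: int = 1500, out_tok_per_q: int = 200) -> float:
    """Estimated API cost: prices are USD per 1M tokens, as in the
    Cost column; token counts per question are rough assumptions."""
    total_in = n_questions * in_tok_per_q
    total_out = n_questions * out_tok_per_q
    return (total_in * in_price + total_out * out_price) / 1_000_000

# e.g. the rank-1 row ($1.25 in / $10.00 out) over a 900-question
# validation run: eval_cost_usd(900, 1.25, 10.00) ≈ $3.49
```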
Notice missing or incorrect data? Start an Issue discussion.
FAQ
Common questions about MMMU
What is MMMU?
MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering) across 30 subjects and 183 subfields.
Where can I read the MMMU paper?
The MMMU paper is available at https://arxiv.org/abs/2311.16502. It details the benchmark methodology, dataset construction, and evaluation criteria.
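MMMU is scored as multiple-choice accuracy, so the leaderboard's 0–1 scores are fractions of questions answered correctly. A minimal scorer sketch; `predictions` and `golds` are hypothetical stand-ins for a model's extracted answer letters and the reference keys:

```python
def mmmu_accuracy(predictions: list[str], golds: list[str]) -> float:
    """Fraction of questions where the predicted option letter
    matches the gold letter (case-insensitive)."""
    assert len(predictions) == len(golds)
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, golds)
    )
    return correct / len(golds)

# e.g. mmmu_accuracy(["A", "C", "B"], ["A", "B", "B"]) == 2/3 ≈ 0.667
```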
How are models ranked on the MMMU leaderboard?
The MMMU leaderboard ranks 59 AI models by their performance on the benchmark. Currently, GPT-5.1 Instant by OpenAI leads with a score of 0.854; the average score across all models is 0.663.
What is the highest MMMU score?
The highest MMMU score is 0.854, achieved by GPT-5.1 Instant from OpenAI.
How many models have been evaluated on MMMU?
59 models have been evaluated on the MMMU benchmark, with 0 verified results and 57 self-reported results.
How is MMMU categorized?
MMMU is categorized under general, healthcare, multimodal, and reasoning; the benchmark evaluates multimodal models.