MMMU
MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering) across 30 subjects and 183 subfields.
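For hands-on use, the dataset can be pulled from the Hugging Face Hub. A minimal sketch, assuming the dataset is published as `MMMU/MMMU` with per-subject configs and `dev`/`validation`/`test` splits (field names follow the public dataset card; verify against the hub before relying on them):

```python
# Minimal sketch: load one MMMU subject with Hugging Face `datasets`.
# Assumes the "MMMU/MMMU" hub dataset with per-subject configs (e.g. "Art").
from datasets import load_dataset

ds = load_dataset("MMMU/MMMU", "Art", split="validation")

sample = ds[0]
print(sample["question"])  # question text, may reference interleaved images
print(sample["options"])   # multiple-choice options
print(sample["answer"])    # gold answer letter (withheld on the test split)
```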
Progress Over Time
[Interactive timeline of model performance on MMMU, plotting the state-of-the-art frontier and distinguishing open from proprietary models.]
MMMU Leaderboard
60 models
| Rank | Organization | Params | Context | Cost ($/1M tokens, input / output) |
|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | — | — | — |
| 2 | OpenAI | — | 400K | $1.25 / $10.00 |
| 2 | OpenAI | — | 400K | $1.25 / $10.00 |
| 2 | OpenAI | — | 400K | $1.25 / $10.00 |
| 5 | OpenAI | — | 400K | $1.25 / $10.00 |
| 6 | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 |
| 7 | OpenAI | — | 200K | $2.00 / $8.00 |
| 8 | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40 |
| 9 | — | — | 1.0M | $1.25 / $10.00 |
| 10 | OpenAI | — | 200K | $1.10 / $4.40 |
| 11 | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 |
| 12 | Google | — | 1.0M | $0.30 / $2.50 |
| 13 | Google | — | 1.0M | $1.25 / $10.00 |
| 14 | StepFun | 10B | — | — |
| 15 | xAI | — | 128K | $3.00 / $15.00 |
| 16 | OpenAI | — | 200K | $15.00 / $60.00 |
| 17 | — | — | — | — |
| 18 | OpenAI | — | 128K | $75.00 / $150.00 |
| 19 | Anthropic | — | 200K | $3.00 / $15.00 |
| 20 | OpenAI | — | 1.0M | $2.00 / $8.00 |
| 21 | Anthropic | — | — | — |
| 22 | Meta | 400B | 1.0M | $0.17 / $0.60 |
| 23 | Google | — | 1.0M | $0.10 / $0.40 |
| 24 | OpenAI | — | 1.0M | $0.40 / $1.60 |
| 25 | OpenAI | — | 128K | $2.50 / $10.00 |
| 26 | Google | — | 1.0M | $0.10 / $0.40 |
| 27 | Alibaba Cloud / Qwen Team | 73B | — | — |
| 28 | Alibaba Cloud / Qwen Team | 72B | — | — |
| 29 | Moonshot AI | — | — | — |
| 29 | Alibaba Cloud / Qwen Team | 34B | — | — |
| 31 | Meta | 109B | 10.0M | $0.08 / $0.30 |
| 32 | Anthropic | — | 200K | $3.00 / $15.00 |
| 33 | Google | — | 1.0M | $0.07 / $0.30 |
| 34 | xAI | — | 128K | $2.00 / $10.00 |
| 35 | Google | — | 2.1M | $2.50 / $10.00 |
| 36 | Mistral AI | 124B | 128K | $2.00 / $6.00 |
| 37 | xAI | — | — | — |
| 38 | Mistral AI | 24B | — | — |
| 39 | Google | — | 1.0M | $0.15 / $0.60 |
| 40 | Amazon | — | 300K | $0.80 / $3.20 |
| 41 | — | 90B | 128K | $0.35 / $0.40 |
| 42 | OpenAI | — | 128K | $0.15 / $0.60 |
| 43 | Mistral AI | 24B | — | — |
| 43 | Mistral AI | 24B | 128K | $0.10 / $0.30 |
| 45 | Alibaba Cloud / Qwen Team | 7B | — | — |
| 46 | Alibaba Cloud / Qwen Team | 8B | — | — |
| 47 | Amazon | — | 300K | $0.06 / $0.24 |
| 48 | OpenAI | — | 1.0M | $0.10 / $0.40 |
| 49 | Microsoft | 6B | 128K | $0.05 / $0.10 |
| 50 | Google | 8B | 1.0M | $0.07 / $0.30 |
Showing 1–50 of 60 models.
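The Cost column lists prices in dollars per million input and output tokens. As a quick sketch of how those rates translate into a per-request cost (the token counts below are hypothetical):

```python
# Hypothetical request against a model priced at $1.25 / $10.00 per 1M tokens.
input_price = 1.25    # $ per 1M input tokens
output_price = 10.00  # $ per 1M output tokens

input_tokens = 3_000  # prompt + encoded image tokens (made-up figure)
output_tokens = 500   # generated response (made-up figure)

cost = input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price
print(f"${cost:.5f}")  # -> $0.00875
```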
FAQ
Common questions about MMMU
**What is MMMU?** MMMU (Massive Multi-discipline Multimodal Understanding) evaluates multimodal models on college-level subject knowledge and deliberate reasoning, using 11.5K questions from college exams, quizzes, and textbooks that span six core disciplines, 30 subjects, and 183 subfields.
**Where can I find the MMMU paper?** The MMMU paper is available at https://arxiv.org/abs/2311.16502. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
**Which model performs best on MMMU?** The MMMU leaderboard ranks 60 AI models by their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.860. The average score across all models is 0.667.
**What is the highest MMMU score?** The highest MMMU score is 0.860, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.
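Scores on this leaderboard appear to be multiple-choice accuracy values, i.e. the fraction of questions answered correctly, so 0.860 corresponds to 86.0%. A minimal sketch of that metric over hypothetical predictions:

```python
# Micro-averaged multiple-choice accuracy (data below is hypothetical).
gold = ["A", "C", "B", "D", "B"]
pred = ["A", "C", "D", "D", "B"]

accuracy = sum(p == g for p, g in zip(pred, gold)) / len(gold)
print(f"{accuracy:.3f}")  # -> 0.800
```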
**How many models have been evaluated on MMMU?** 60 models have been evaluated on the MMMU benchmark, with 0 verified results and 58 self-reported results.
**What categories does MMMU fall under?** MMMU is categorized under general, healthcare, multimodal, reasoning, and vision; it evaluates multimodal (vision-language) models.