MME-RealWorld
A comprehensive evaluation benchmark for Multimodal Large Language Models, featuring 13,366 high-resolution images and 29,429 question-answer pairs across 43 subtasks and 5 real-world scenarios. It is the largest manually annotated multimodal benchmark to date, designed to test MLLMs on challenging high-resolution real-world scenarios.
Progress Over Time
Interactive timeline showing model performance evolution on MME-RealWorld
MME-RealWorld Leaderboard
1 model
| # | Model | Organization | Params | Score | Context | Cost | License |
|---|---|---|---|---|---|---|---|
| 1 | Qwen2.5-Omni-7B | Alibaba Cloud / Qwen Team | 7B | 0.616 | — | — | — |
FAQ
Common questions about MME-RealWorld
The MME-RealWorld paper is available at https://arxiv.org/abs/2408.13257. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The MME-RealWorld leaderboard ranks 1 AI model based on its performance on this benchmark. Currently, Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team leads with a score of 0.616; as it is the only entry, the average score across all models is also 0.616.
The highest MME-RealWorld score is 0.616, achieved by Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team.
1 model has been evaluated on the MME-RealWorld benchmark, with 0 verified results and 1 self-reported result.
MME-RealWorld is categorized under general, multimodal, and vision; it evaluates multimodal models.