
MME-RealWorld

A comprehensive evaluation benchmark for Multimodal Large Language Models (MLLMs) featuring 13,366 high-resolution images and 29,429 question-answer pairs across 43 subtasks and 5 real-world scenarios. It is the largest manually annotated multimodal benchmark to date, designed to test MLLMs on challenging high-resolution real-world tasks.

Paper: https://arxiv.org/abs/2408.13257

Progress Over Time

[Interactive timeline showing model performance evolution on MME-RealWorld; models are marked as open or proprietary, with a state-of-the-art frontier line.]
MME-RealWorld Leaderboard (1 model)

Rank  Model            Organization               Parameters  Score
1     Qwen2.5-Omni-7B  Alibaba Cloud / Qwen Team  7B          0.616

FAQ

Common questions about MME-RealWorld

Q: What is MME-RealWorld?
A: A comprehensive evaluation benchmark for Multimodal Large Language Models featuring 13,366 high-resolution images and 29,429 question-answer pairs across 43 subtasks and 5 real-world scenarios. It is the largest manually annotated multimodal benchmark to date, designed to test MLLMs on challenging high-resolution real-world tasks.

Q: Where can I read the MME-RealWorld paper?
A: The MME-RealWorld paper is available at https://arxiv.org/abs/2408.13257. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.

Q: How does the MME-RealWorld leaderboard work?
A: The leaderboard ranks AI models by their performance on this benchmark. It currently lists one model: Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team, with a score of 0.616 (which, as the only entry, is also the average score).

Q: What is the highest MME-RealWorld score?
A: The highest MME-RealWorld score is 0.616, achieved by Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team.

Q: How many models have been evaluated?
A: One model has been evaluated on the MME-RealWorld benchmark, with 0 verified results and 1 self-reported result.

Q: How is MME-RealWorld categorized?
A: MME-RealWorld is categorized under general, multimodal, and vision; the benchmark evaluates multimodal models.
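Since MME-RealWorld scores models on question-answer pairs grouped into subtasks, a score like 0.616 can be read as an accuracy over answered questions. The sketch below shows how such per-subtask and overall accuracies could be computed; the record layout and field names ("subtask", "answer", "prediction") are illustrative assumptions, not the benchmark's actual data format.

```python
from collections import defaultdict

def score_predictions(records):
    """Compute per-subtask and overall accuracy for multiple-choice answers.

    `records` is a list of dicts with illustrative (assumed) fields:
    'subtask', 'answer' (gold choice letter), 'prediction' (model choice).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["subtask"]] += 1
        # Compare choice letters case-insensitively, ignoring stray whitespace.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[r["subtask"]] += 1
    per_subtask = {t: correct[t] / total[t] for t in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_subtask, overall

# Example with made-up records:
records = [
    {"subtask": "OCR", "answer": "A", "prediction": "A"},
    {"subtask": "OCR", "answer": "B", "prediction": "C"},
    {"subtask": "Monitoring", "answer": "D", "prediction": "d"},
]
per_subtask, overall = score_predictions(records)
print(per_subtask, round(overall, 3))  # {'OCR': 0.5, 'Monitoring': 1.0} 0.667
```

Reporting both per-subtask and overall accuracy mirrors how benchmarks with many subtasks are usually summarized: a single headline number, plus a breakdown showing where a model is strong or weak.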