MME-RealWorld Leaderboard

Progress Over Time

[Interactive timeline: model performance on MME-RealWorld over time, showing the state-of-the-art frontier and distinguishing open from proprietary models.]

MME-RealWorld Leaderboard

1 model

Rank	Model	Score	Size	Context	Cost	License
1	Qwen2.5-Omni-7B (Alibaba Cloud / Qwen Team)	0.616	7B	—	—

FAQ

Common questions about MME-RealWorld

What is the MME-RealWorld benchmark?

MME-RealWorld is a comprehensive evaluation benchmark for Multimodal Large Language Models (MLLMs). It contains 13,366 high-resolution images and 29,429 question-answer pairs across 43 subtasks and 5 real-world scenarios. According to its authors, it is the largest manually annotated multimodal benchmark to date, and it is designed to test MLLMs on challenging high-resolution real-world scenarios.

Where can I find the MME-RealWorld paper?

The MME-RealWorld paper is available at https://arxiv.org/abs/2408.13257. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.

What is the MME-RealWorld leaderboard?

The MME-RealWorld leaderboard ranks AI models by their performance on this benchmark. It currently lists 1 model: Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team, which leads with a score of 0.616. Because only one model is listed, the average score across all models is also 0.616.

What is the highest MME-RealWorld score?

The highest MME-RealWorld score is 0.616, achieved by Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team.

How many models are evaluated on MME-RealWorld?

1 model has been evaluated on the MME-RealWorld benchmark, with 0 verified results and 1 self-reported result.

What categories does MME-RealWorld cover?

MME-RealWorld is categorized under general, multimodal, and vision. The benchmark evaluates multimodal models.
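The summary figures quoted for the leaderboard (leader, top score, average, and the verified versus self-reported counts) all follow from the list of entries. The sketch below recomputes them in Python. The `Entry` class and `summarize` function are hypothetical names introduced for illustration, not part of any leaderboard API, and the single entry is the one listed on this page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    organization: str
    score: float
    verified: bool  # False means the score is self-reported


def summarize(entries):
    """Derive the headline leaderboard figures from a list of entries."""
    if not entries:
        raise ValueError("leaderboard is empty")
    leader = max(entries, key=lambda e: e.score)
    return {
        "count": len(entries),
        "leader": leader.model,
        "top_score": leader.score,
        "average": mean(e.score for e in entries),
        "verified": sum(e.verified for e in entries),
        "self_reported": sum(not e.verified for e in entries),
    }


# The only entry currently listed on this page.
entries = [
    Entry("Qwen2.5-Omni-7B", "Alibaba Cloud / Qwen Team", 0.616, verified=False),
]
summary = summarize(entries)
print(summary)
```

With a single entry the average necessarily equals the top score, which is why both figures are 0.616 here.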