MM-Mind2Web Leaderboard

Progress Over Time

[Interactive timeline: model performance on MM-Mind2Web over time, showing the state-of-the-art frontier with separate series for open and proprietary models]

MM-Mind2Web Leaderboard

3 models

Rank	Model	Organization	Parameters	Context	Cost (input / output)
1	Nova Pro	Amazon	—	300K	$0.80 / $3.20
2	Nova Lite	Amazon	—	300K	$0.06 / $0.24
3	Qwen3-Coder 480B A35B Instruct	Alibaba Cloud / Qwen Team	480B	—	—
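The Cost column lists two prices per model. As a rough illustration of how such figures translate into the cost of an evaluation run, the sketch below assumes the two numbers are USD prices per one million input and output tokens respectively; the page does not state the unit, so treat that as an assumption. The `Pricing` class, the `estimate_cost` helper, and the token counts are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

# Assumption: the leaderboard prices are USD per 1,000,000 tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    input_usd: float   # assumed price per 1M input tokens
    output_usd: float  # assumed price per 1M output tokens


# Prices copied from the Cost column; Qwen3-Coder lists no price, so it is omitted.
PRICING = {
    "Nova Pro": Pricing(input_usd=0.80, output_usd=3.20),
    "Nova Lite": Pricing(input_usd=0.06, output_usd=0.24),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    price = PRICING[model]
    total = input_tokens * price.input_usd + output_tokens * price.output_usd
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 100K output tokens.
    for name in PRICING:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 100_000):.2f}")
```

Under these assumptions, input tokens tend to dominate the bill for web navigation tasks, since each step sends page content to the model while the predicted action is short.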

FAQ

Common questions about MM-Mind2Web

What is the MM-Mind2Web benchmark?

MM-Mind2Web is a multimodal web navigation benchmark comprising over 2,000 open-ended tasks spanning 137 websites across 31 domains. Each task includes HTML documents paired with webpage screenshots, along with action sequences that involve complex web interactions.

Where can I find the MM-Mind2Web paper?

The MM-Mind2Web paper is available at https://arxiv.org/abs/2306.06070. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.

What is the MM-Mind2Web leaderboard?

The MM-Mind2Web leaderboard ranks 3 AI models by their performance on this benchmark. Nova Pro by Amazon currently leads with a score of 0.637. The average score across all models is 0.601.

What is the highest MM-Mind2Web score?

The highest MM-Mind2Web score is 0.637, achieved by Nova Pro from Amazon.

How many models are evaluated on MM-Mind2Web?

3 models have been evaluated on the MM-Mind2Web benchmark, with 0 verified results and 3 self-reported results.

What categories does MM-Mind2Web cover?

MM-Mind2Web is categorized under agents, frontend development, multimodal, and reasoning. The benchmark evaluates multimodal models.
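The page reports only the top score (0.637) and the average over the 3 models (0.601), not the individual scores of the other two entries. A small sketch of the arithmetic shows what those two figures imply for the remaining models. The function name `implied_mean_of_rest` is hypothetical, and because the published figures are rounded, the result is approximate.

```python
def implied_mean_of_rest(top_score: float, overall_mean: float, n_models: int) -> float:
    """Mean score of the models other than the leader, derived from the overall mean."""
    if n_models < 2:
        raise ValueError("need at least two models")
    total = overall_mean * n_models           # sum of all scores
    return (total - top_score) / (n_models - 1)


if __name__ == "__main__":
    # Figures taken from the leaderboard summary: top 0.637, mean 0.601, 3 models.
    rest = implied_mean_of_rest(top_score=0.637, overall_mean=0.601, n_models=3)
    print(f"Implied mean of the other two models: {rest:.3f}")
```

This gives only the mean of the two remaining scores; their individual values cannot be recovered from the summary figures.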