
MM-Mind2Web

A multimodal web navigation benchmark comprising 2,000 open-ended tasks spanning 137 websites across 31 domains. Each task includes HTML documents paired with webpage screenshots, action sequences, and complex web interactions.

Paper: https://arxiv.org/abs/2306.06070

Progress Over Time

[Interactive timeline showing model performance evolution on MM-Mind2Web, plotting the state-of-the-art frontier and distinguishing open from proprietary models.]
MM-Mind2Web Leaderboard

Three models are currently listed. Model names for ranks 2 and 3 were not captured in this extract and are left blank; the rank-1 score is taken from the FAQ below.

| Rank | Model    | Organization              | Score | Context | Cost (input / output) |
|------|----------|---------------------------|-------|---------|-----------------------|
| 1    | Nova Pro | Amazon                    | 0.637 | 300K    | $0.80 / $3.20         |
| 2    | —        | Amazon                    | —     | 300K    | $0.06 / $0.24         |
| 3    | —        | Alibaba Cloud / Qwen Team | —     | 480B    | —                     |

FAQ

Common questions about MM-Mind2Web

What is MM-Mind2Web?
A multimodal web navigation benchmark comprising 2,000 open-ended tasks spanning 137 websites across 31 domains. Each task includes HTML documents paired with webpage screenshots, action sequences, and complex web interactions.
Where can I find the MM-Mind2Web paper?
The MM-Mind2Web paper is available at https://arxiv.org/abs/2306.06070. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.
How do models rank on the MM-Mind2Web leaderboard?
The MM-Mind2Web leaderboard ranks three AI models based on their performance on this benchmark. Currently, Nova Pro by Amazon leads with a score of 0.637. The average score across all models is 0.601.
What is the highest MM-Mind2Web score?
The highest MM-Mind2Web score is 0.637, achieved by Nova Pro from Amazon.
How many models have been evaluated on MM-Mind2Web?
Three models have been evaluated on the MM-Mind2Web benchmark, with zero verified results and three self-reported results.
How is MM-Mind2Web categorized?
MM-Mind2Web is categorized under agents, frontend development, multimodal, and reasoning. The benchmark evaluates multimodal models.
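As a quick sanity check on the statistics quoted above (top score 0.637, average 0.601 over three models), the combined score of the two unnamed models is implied by the mean. A minimal sketch, assuming only the numbers stated on this page:

```python
# Consistency check of the quoted leaderboard statistics.
# Known from the page: top score 0.637 (Nova Pro), mean 0.601 over 3 models.
n_models = 3
mean_score = 0.601
top_score = 0.637

# Combined score of the two remaining models implied by the mean:
remaining_sum = n_models * mean_score - top_score
# Average score of those two models:
remaining_mean = remaining_sum / 2

print(round(remaining_sum, 3))   # 1.166
print(round(remaining_mean, 3))  # 0.583
```

This confirms the two unlisted scores must average about 0.583, below the stated overall mean, as expected when the leader sits above it.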