MM-Mind2Web
A multimodal web navigation benchmark comprising 2,000 open-ended tasks spanning 137 websites across 31 domains. Each task includes HTML documents paired with webpage screenshots, action sequences, and complex web interactions.
Progress Over Time: interactive timeline (omitted here) showing model performance evolution on MM-Mind2Web, with a state-of-the-art frontier and open/proprietary model filters.
MM-Mind2Web Leaderboard (3 models)
| Rank | Model | Organization | Parameters | Context | Cost (input / output) |
|---|---|---|---|---|---|
| 1 | Nova Pro | Amazon | — | 300K | $0.80 / $3.20 |
| 2 | — | Amazon | — | 300K | $0.06 / $0.24 |
| 3 | — | Alibaba Cloud / Qwen Team | 480B | — | — |
FAQ
Common questions about MM-Mind2Web
The MM-Mind2Web paper is available at https://arxiv.org/abs/2306.06070. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
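Mind2Web-style benchmarks typically score an agent step by step: a step counts as successful only when both the predicted target element and the predicted operation match the ground truth. The sketch below illustrates that idea with illustrative function and field names of my own choosing; it is not the official evaluation harness, and the exact matching rules in the paper may differ.

```python
# Hedged sketch of Mind2Web-style step metrics. The field names
# ("element", "operation") and helper names are assumptions for
# illustration, not the benchmark's actual API.

def element_accuracy(pred_steps, gold_steps):
    """Fraction of steps where the predicted target element matches gold."""
    hits = sum(p["element"] == g["element"]
               for p, g in zip(pred_steps, gold_steps))
    return hits / len(gold_steps)

def step_success_rate(pred_steps, gold_steps):
    """A step succeeds only if both element and operation match gold."""
    hits = sum(p["element"] == g["element"] and
               p["operation"] == g["operation"]
               for p, g in zip(pred_steps, gold_steps))
    return hits / len(gold_steps)

# Toy two-step task: the agent gets the first step right but clicks
# the wrong element on the second step.
gold = [
    {"element": "search_box", "operation": ("TYPE", "laptops")},
    {"element": "search_btn", "operation": ("CLICK", None)},
]
pred = [
    {"element": "search_box", "operation": ("TYPE", "laptops")},
    {"element": "nav_menu",   "operation": ("CLICK", None)},
]

print(element_accuracy(pred, gold))   # 0.5
print(step_success_rate(pred, gold))  # 0.5
```

Task-level success (another metric commonly reported) would then require every step in the sequence to succeed, which is why task success rates are far lower than step success rates on long action sequences.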
The MM-Mind2Web leaderboard ranks 3 AI models based on their performance on this benchmark. Currently, Nova Pro by Amazon leads with a score of 0.637. The average score across all models is 0.601.
The highest MM-Mind2Web score is 0.637, achieved by Nova Pro from Amazon.
Three models have been evaluated on the MM-Mind2Web benchmark, with 0 verified results and 3 self-reported results.
MM-Mind2Web is categorized under agents, frontend development, multimodal, and reasoning. The benchmark evaluates multimodal models.