MobileMiniWob++_SR
MobileMiniWob++ SR (Success Rate) is an adaptation of the MiniWob++ web interaction benchmark for mobile Android environments within AndroidWorld. It comprises 92 web interaction tasks adapted for touch-based mobile interfaces, evaluating agents' ability to navigate and interact with web applications on mobile devices.
MobileMiniWob++_SR Leaderboard
2 models • 0 verified
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 8B | — | — | |
| 2 | Alibaba Cloud / Qwen Team | 72B | — | — | |
FAQ
Common questions about MobileMiniWob++_SR
MobileMiniWob++_SR is described in the AndroidWorld paper, available at https://arxiv.org/abs/2405.14573. The paper covers the benchmark methodology, task construction, and evaluation criteria.
The MobileMiniWob++_SR leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Qwen2.5 VL 7B Instruct by Alibaba Cloud / Qwen Team leads with a score of 0.914. The average score across all models is 0.797.
The highest MobileMiniWob++_SR score is 0.914, achieved by Qwen2.5 VL 7B Instruct from Alibaba Cloud / Qwen Team.
2 models have been evaluated on the MobileMiniWob++_SR benchmark, with 0 verified results and 2 self-reported results.
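The figures quoted above (a top score of 0.914 and an average of 0.797 over 2 models) can be cross-checked with simple arithmetic. The sketch below assumes the reported average is an unweighted mean of the two models' scores, which the page does not state; the helper name `implied_remaining_score` is hypothetical and used only for illustration.

```python
def implied_remaining_score(mean: float, known_scores: list[float], n_models: int) -> float:
    """Solve mean = sum(scores) / n_models for the single unlisted score.

    Assumption: the leaderboard average is an unweighted arithmetic mean.
    """
    if n_models - len(known_scores) != 1:
        raise ValueError("exactly one score must be unknown")
    return mean * n_models - sum(known_scores)


# Values quoted on the leaderboard page.
top_score = 0.914
average = 0.797
n_models = 2

second = implied_remaining_score(average, [top_score], n_models)
print(round(second, 3))
```

Under that assumption the second model's score comes out at roughly 0.680. Because the published average is itself rounded to three decimals, the result is only an approximation.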
MobileMiniWob++_SR is categorized under agents, frontend development, and multimodal. The benchmark evaluates multimodal models.