
MobileMiniWob++_SR

MobileMiniWob++_SR reports the success rate (SR) on MobileMiniWob++, an adaptation of the MiniWob++ web interaction benchmark to mobile Android environments within AndroidWorld. MobileMiniWob++ comprises 92 web interaction tasks adapted for touch-based mobile interfaces and evaluates agents' ability to navigate and interact with web applications on mobile devices.
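As a rough illustration of what a success-rate metric computes, the sketch below assumes each agent episode yields a single pass/fail outcome and reports the fraction that passed. The `EpisodeResult` record, the `success_rate` helper, and the task names are hypothetical; the exact aggregation used for MobileMiniWob++ (for example, how repeated episodes per task are combined) is defined in the AndroidWorld paper and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of one agent episode (hypothetical record type)."""
    task_name: str  # illustrative task identifier, not an official ID
    success: bool   # True if the task's success check passed


def success_rate(results: list[EpisodeResult]) -> float:
    """Fraction of episodes that succeeded, in [0, 1]."""
    if not results:
        raise ValueError("success rate is undefined for zero episodes")
    # bools sum as 0/1, so this counts the successful episodes
    return sum(r.success for r in results) / len(results)


episodes = [
    EpisodeResult("click-button", True),
    EpisodeResult("enter-text", False),
    EpisodeResult("click-checkboxes", True),
    EpisodeResult("use-slider", True),
]
print(success_rate(episodes))  # 0.75
```

Under this assumption, a score such as 0.914 reads as "about 91% of evaluated episodes succeeded".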

Paper: https://arxiv.org/abs/2405.14573


MobileMiniWob++_SR Leaderboard

2 models • 0 verified

Rank  Organization               Parameters
1     Alibaba Cloud / Qwen Team  8B
2     Alibaba Cloud / Qwen Team  72B

FAQ

Common questions about MobileMiniWob++_SR

MobileMiniWob++ is introduced in the AndroidWorld paper, available at https://arxiv.org/abs/2405.14573, which documents the benchmark's methodology, task construction, and evaluation criteria.
The MobileMiniWob++_SR leaderboard ranks two models. Qwen2.5 VL 7B Instruct from Alibaba Cloud / Qwen Team currently leads with the highest score, 0.914; the average across both models is 0.797.
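The leaderboard reports only the top score (0.914) and the mean (0.797) over its two entries, so the second-place score can be backed out by arithmetic. The sketch below does that; `implied_missing_score` is a hypothetical helper, and its result is a derived figure, not a score the leaderboard itself lists.

```python
def implied_missing_score(mean: float, n: int, known: list[float]) -> float:
    """Recover the single unreported score from a mean over n scores."""
    if len(known) != n - 1:
        raise ValueError("expected exactly one unknown score")
    # the sum of all n scores is mean * n; subtract the scores already known
    return mean * n - sum(known)


second = implied_missing_score(mean=0.797, n=2, known=[0.914])
print(f"{second:.3f}")  # 0.680
```

Because the mean is itself rounded to three decimals (true value in [0.7965, 0.7975)), the implied second score is only pinned down to the interval [0.679, 0.681).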
Two models have been evaluated on MobileMiniWob++_SR; both results are self-reported, and none has been verified.
MobileMiniWob++_SR is categorized under agents, frontend development, and multimodal, and it evaluates multimodal models.