
Android Control High_EM

Android device control benchmark that assesses agent performance on mobile interface tasks using the High exact match (EM) evaluation metric.
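
As a rough illustration of what an exact match metric measures, the sketch below scores a predicted action sequence against a reference sequence, counting a step only when the two actions are identical. The `Action` record, its fields, and `exact_match_score` are hypothetical names chosen for this example. The benchmark's actual action schema and matching rules are not described on this page and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; the benchmark's real schema may differ."""
    kind: str            # e.g. "click", "type", "scroll"
    argument: str = ""   # e.g. a target element name or the typed text


def exact_match_score(predicted: list[Action], reference: list[Action]) -> float:
    """Return the fraction of steps whose predicted action equals the reference exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    # Dataclass equality compares every field, so any difference counts as a miss.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Illustrative data only, not taken from the benchmark.
reference = [Action("click", "settings"), Action("type", "wifi"), Action("click", "search")]
predicted = [Action("click", "settings"), Action("type", "wi-fi"), Action("click", "search")]

score = exact_match_score(predicted, reference)
print(f"{score:.3f}")  # prints 0.667: two of the three steps match exactly
```

Under this reading, a leaderboard score such as 0.696 would mean that roughly 70% of evaluated steps matched the reference exactly, assuming the score is a per-step fraction.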

Progress Over Time

[Chart: model performance on Android Control High_EM over time. Legend: state-of-the-art frontier, open models, proprietary models.]

Android Control High_EM Leaderboard

3 models

Rank  Organization                Parameters
1     Alibaba Cloud / Qwen Team   34B
2     Alibaba Cloud / Qwen Team   72B
3     Alibaba Cloud / Qwen Team   8B

FAQ

Common questions about Android Control High_EM

The Android Control High_EM leaderboard ranks 3 AI models based on their performance on this benchmark. Currently, Qwen2.5 VL 32B Instruct by Alibaba Cloud / Qwen Team leads with a score of 0.696. The average score across all models is 0.657.
The highest Android Control High_EM score is 0.696, achieved by Qwen2.5 VL 32B Instruct from Alibaba Cloud / Qwen Team.
Three models have been evaluated on the Android Control High_EM benchmark. None of the results are verified; all three are self-reported.
Android Control High_EM is categorized under multimodal and reasoning. The benchmark evaluates multimodal models.