Android Control High_EM
Android Control High_EM is an Android device-control benchmark that assesses agent performance on mobile interface tasks, scored with the High_EM exact-match metric.
Progress Over Time
[Chart: timeline of model performance on Android Control High_EM, marking the state-of-the-art frontier and distinguishing open from proprietary models.]
Android Control High_EM Leaderboard
3 models
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 34B | — | — | — |
| 2 | Alibaba Cloud / Qwen Team | 72B | — | — | — |
| 3 | Alibaba Cloud / Qwen Team | 8B | — | — | — |
FAQ
Common questions about Android Control High_EM
The Android Control High_EM leaderboard ranks three AI models by their performance on this benchmark. Qwen2.5 VL 32B Instruct by Alibaba Cloud / Qwen Team currently leads with a score of 0.696, and the average score across all three models is 0.657.
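The leading score and the average quoted above are enough to work out how the other two models score on average. The short calculation below is only a sketch based on those rounded figures, so the result is approximate; the variable names are illustrative.

```python
# Figures quoted in the FAQ (both are rounded to three decimals).
num_models = 3
top_score = 0.696
mean_score = 0.657

# The mean times the model count gives the implied sum of all scores.
total = mean_score * num_models

# Subtract the leader and average what is left over the remaining models.
remaining_mean = (total - top_score) / (num_models - 1)

print(f"Implied mean of the other {num_models - 1} models: {remaining_mean:.4f}")
```

Under these rounded inputs the other two models average about 0.6375, roughly 0.06 below the leader.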
The highest Android Control High_EM score is 0.696, achieved by Qwen2.5 VL 32B Instruct from Alibaba Cloud / Qwen Team.
Three models have been evaluated on the Android Control High_EM benchmark. All three results are self-reported; none has been verified.
Android Control High_EM is categorized under multimodal and reasoning. The benchmark evaluates multimodal models.
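For readers unfamiliar with exact-match scoring, the sketch below shows the general idea: a predicted action counts only if it equals the reference action in every field. The page does not describe the benchmark's actual matching rules, so the `Action` record, its field names, and the strict-equality rule are all assumptions made for illustration, not the benchmark's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; the field names are illustrative assumptions."""
    kind: str                     # e.g. "click", "type", "scroll"
    target: Optional[str] = None  # e.g. an identifier for a UI element
    text: Optional[str] = None    # e.g. text typed into a field


def exact_match_rate(predicted: Sequence[Action], reference: Sequence[Action]) -> float:
    """Return the fraction of steps where the prediction equals the reference exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of steps")
    if not reference:
        return 0.0
    # Dataclass equality compares every field, so any difference is a miss.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Made-up example data: the second step types different text, so it does not match.
reference = [
    Action("click", target="search_box"),
    Action("type", text="weather"),
    Action("click", target="submit"),
]
predicted = [
    Action("click", target="search_box"),
    Action("type", text="wether"),
    Action("click", target="submit"),
]

score = exact_match_rate(predicted, reference)
print(f"Exact-match rate: {score:.3f}")
```

A real evaluation may relax this rule, for example by accepting taps within some distance of the target, which would change the score; the strict version here is simply the easiest to reason about.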