ScreenSpot Pro
ScreenSpot-Pro is a novel GUI grounding benchmark designed to rigorously evaluate the grounding capabilities of multimodal large language models (MLLMs) in professional high-resolution computing environments. The benchmark comprises 1,581 instructions across 23 applications spanning 5 industries and 3 operating systems, featuring authentic high-resolution images from professional domains with expert annotations. Unlike previous benchmarks that focus on cropped screenshots in consumer applications, ScreenSpot-Pro addresses the complexity and diversity of real-world professional software scenarios, revealing significant performance gaps in current MLLM GUI perception capabilities.
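To make concrete what a grounding evaluation of this kind measures, here is a minimal sketch. It assumes the scoring rule commonly used for GUI grounding: a prediction counts as correct when the predicted click point falls inside the annotated bounding box of the target element. The `Sample` schema and the function names `is_hit` and `grounding_accuracy` are hypothetical and are not taken from the benchmark's own code or data format.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Sample:
    """One grounding example (hypothetical schema, not the benchmark's file format)."""
    instruction: str
    bbox: Box


def is_hit(point: Point, bbox: Box) -> bool:
    """Return True if the predicted point lies inside the target box (edges included)."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(samples: Sequence[Sample], predictions: Sequence[Point]) -> float:
    """Fraction of samples whose predicted point falls inside the annotated box."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    if not samples:
        return 0.0
    hits = sum(is_hit(p, s.bbox) for s, p in zip(samples, predictions))
    return hits / len(samples)


# Illustrative data only; these are not benchmark samples.
samples = [
    Sample("click Save", (10, 10, 50, 30)),
    Sample("open the Layers panel", (100, 200, 140, 220)),
]
predictions = [(20, 20), (90, 210)]  # the second point misses its box
accuracy = grounding_accuracy(samples, predictions)
```

On high-resolution professional screenshots the target boxes tend to cover a very small fraction of the image, which is one reason a point-in-box criterion is demanding in this setting.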
ScreenSpot Pro Leaderboard
| Rank | Model / Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | OpenAI | — | — | — | |
| 2 | Google | — | — | — | |
| 3 | Alibaba Cloud / Qwen Team | 122B | — | — | |
| 4 | Alibaba Cloud / Qwen Team | 27B | — | — | |
| 5 | Google | — | — | — | |
| 6 | Alibaba Cloud / Qwen Team | 35B | — | — | |
| 7 | Qwen3.6 Plus, Alibaba Cloud / Qwen Team | — | — | — | |
| 8 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 | |
| 9 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 | |
| 10 | Alibaba Cloud / Qwen Team | 31B | — | — | |
| 11 | Alibaba Cloud / Qwen Team | 4B | — | — | |
| 12 | Alibaba Cloud / Qwen Team | 33B | — | — | |
| 13 | Alibaba Cloud / Qwen Team | 31B | — | — | |
| 14 | Alibaba Cloud / Qwen Team | 33B | — | — | |
| 15 | Alibaba Cloud / Qwen Team | 9B | — | — | |
| 16 | Alibaba Cloud / Qwen Team | 4B | — | — | |
| 17 | Alibaba Cloud / Qwen Team | 9B | — | — | |
| 18 | Alibaba Cloud / Qwen Team | 72B | — | — | |
| 19 | Alibaba Cloud / Qwen Team | 34B | — | — | |
| 20 | Alibaba Cloud / Qwen Team | 8B | — | — | |