OSWorld-G
OSWorld-G (Grounding) evaluates screenshot grounding accuracy for OS automation tasks.
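The page does not say how grounding accuracy is scored. A common convention for screenshot grounding, assumed here rather than confirmed for OSWorld-G, counts a prediction as correct when the model's predicted click point falls inside the target element's region. A minimal Python sketch under that assumption, using a hypothetical axis-aligned `Box` type and a hypothetical `grounding_accuracy` helper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned target region, in screenshot pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive bounds: a click exactly on the edge counts as a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(predictions: list[tuple[float, float]], targets: list[Box]) -> float:
    """Fraction of predicted (x, y) click points that land inside their target box.

    Assumption: point-in-box scoring; OSWorld-G's actual criterion may differ.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    targets = [Box(10, 10, 50, 30), Box(100, 200, 160, 240), Box(0, 0, 20, 20)]
    preds = [(25, 20), (170, 220), (5, 5)]  # second click misses: x=170 > right=160
    print(grounding_accuracy(preds, targets))  # 2 of 3 hits -> 0.6666666666666666
```

Real grounding benchmarks may use polygons or other region shapes; the rectangle check is a simplification chosen to keep the sketch short.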
OSWorld-G Leaderboard
1 model • 0 verified
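| Rank | Model | Organization | Score | Parameters | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|---|
| 1 | Qwen3 VL 235B A22B Thinking | Alibaba Cloud / Qwen Team | 0.683 | 236B | 262K | $0.45 / $3.49 |  |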
FAQ
Common questions about OSWorld-G
The OSWorld-G leaderboard ranks 1 AI model based on its performance on this benchmark. Currently, Qwen3 VL 235B A22B Thinking by Alibaba Cloud / Qwen Team leads with a score of 0.683. Because it is the only model listed, the average score across all models is also 0.683.
The highest OSWorld-G score is 0.683, achieved by Qwen3 VL 235B A22B Thinking from Alibaba Cloud / Qwen Team.
One model has been evaluated on the OSWorld-G benchmark, with 0 verified results and 1 self-reported result.
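The summary figures in these answers (leader, top score, average, and the verified versus self-reported split) follow from the leaderboard rows by simple aggregation. A minimal Python sketch, assuming a hypothetical `Entry` record with a `verified` flag and a hypothetical `summarize` helper; the field names are illustrative, not the site's data schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical leaderboard row."""
    model: str
    organization: str
    score: float
    verified: bool  # False means the result is self-reported


def summarize(entries: list[Entry]) -> dict:
    """Aggregate leaderboard rows into the headline statistics."""
    if not entries:
        raise ValueError("leaderboard is empty")
    leader = max(entries, key=lambda e: e.score)  # highest score wins
    verified = sum(e.verified for e in entries)   # True counts as 1
    return {
        "count": len(entries),
        "leader": leader.model,
        "top_score": leader.score,
        "average": sum(e.score for e in entries) / len(entries),
        "verified": verified,
        "self_reported": len(entries) - verified,
    }


if __name__ == "__main__":
    board = [Entry("Qwen3 VL 235B A22B Thinking", "Alibaba Cloud / Qwen Team", 0.683, verified=False)]
    print(summarize(board))
```

With a single row, the average necessarily equals the top score, which is why both are reported as 0.683.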
OSWorld-G is categorized under agents, grounding, multimodal, and vision. The benchmark evaluates models on image (screenshot) inputs.