OSWorld-G

Name: OSWorld-G Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on OSWorld-G over time. Legend: state-of-the-art frontier, open models, proprietary models.]

OSWorld-G Leaderboard

1 model

Rank	Model	Organization	Score	Parameters	Context	Cost	License
1	Qwen3 VL 235B A22B Thinking	Alibaba Cloud / Qwen Team	0.683	236B	—	—	

About this benchmark

What is OSWorld-G?

OSWorld-G (Grounding) evaluates screenshot grounding accuracy for OS automation tasks.
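As a rough illustration of what a screenshot grounding accuracy score can mean, the sketch below counts a prediction as correct when the predicted click point falls inside the target element's bounding box. This point-in-box rule is a common convention for GUI grounding evaluations and is an assumption here; the page does not state how OSWorld-G scores are computed. The names `Box` and `grounding_accuracy` and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a target UI element, in pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # A point on the border counts as inside.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(predictions, targets):
    """Fraction of predicted (x, y) points that land inside their target box.

    Assumed scoring rule, not the benchmark's documented metric.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


# Made-up sample data: three target elements and three predicted click points.
targets = [Box(10, 10, 50, 30), Box(100, 200, 180, 240), Box(0, 0, 20, 20)]
predictions = [(30, 20), (90, 220), (5, 5)]

accuracy = grounding_accuracy(predictions, targets)
print(f"{accuracy:.3f}")  # 2 of 3 points fall inside their boxes
```

Under this assumed rule, a score such as 0.683 would read as roughly 68% of predictions landing on the intended element.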

OSWorld-G is an image benchmark that evaluates models on multimodal, grounding, agent, and vision tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–100 scale. With a single entry, the current average and the leading score are both 0.683 (displayed rounded as 0.7).

Compare leaders on the related leaderboards: best AI for multimodal, best AI for grounding, best AI for agents, and best AI for vision.

Current leaders

Qwen3 VL 235B A22B Thinking from Alibaba Cloud / Qwen Team currently leads the OSWorld-G leaderboard with a score of 0.683. It is the only model evaluated so far.

Qwen3 VL 235B A22B Thinking (Alibaba Cloud / Qwen Team): 0.683

FAQ

Common questions about the OSWorld-G benchmark and leaderboard.

What is the OSWorld-G benchmark?

OSWorld-G (Grounding) evaluates screenshot grounding accuracy for OS automation tasks.

What is the OSWorld-G leaderboard?

The OSWorld-G leaderboard ranks AI models by their performance on this benchmark. It currently lists 1 model: Qwen3 VL 235B A22B Thinking by Alibaba Cloud / Qwen Team, which leads with a score of 0.683. Because only one model is listed, the average score is also 0.683.

What is the highest OSWorld-G score?

The highest OSWorld-G score is 0.683, achieved by Qwen3 VL 235B A22B Thinking from Alibaba Cloud / Qwen Team.

How many models are evaluated on OSWorld-G?

1 model has been evaluated on the OSWorld-G benchmark, with 0 verified results and 1 self-reported result.
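The summary figures quoted in this FAQ (leader, average, verified and self-reported counts) follow directly from the list of results. The sketch below recomputes them from the single entry on this page; the record layout and the `verified` field name are assumptions made for illustration, not the site's data format.

```python
from statistics import mean

# One record per evaluated model; field names are hypothetical.
results = [
    {"model": "Qwen3 VL 235B A22B Thinking", "score": 0.683, "verified": False},
]

leader = max(results, key=lambda r: r["score"])   # highest-scoring entry
average = mean(r["score"] for r in results)        # equals the leader's score with one entry
verified = sum(r["verified"] for r in results)     # count of verified results
self_reported = len(results) - verified            # the rest are self-reported

print(leader["model"], leader["score"], average, verified, self_reported)
```

With only one entry the leader's score and the average coincide, which is why both are reported as 0.683.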

What categories does OSWorld-G cover?

OSWorld-G is categorized under multimodal, grounding, agents, and vision. It evaluates models on image inputs.

What's the difference between OSWorld-G and OSWorld?

OSWorld-G is the grounding-focused variant of OSWorld. See the OSWorld leaderboard for the broader benchmark and per-model comparison.

What is the best open-source model on OSWorld-G?

Qwen3 VL 235B A22B Thinking by Alibaba Cloud / Qwen Team is the top-ranked open-source model on OSWorld-G, with a score of 0.683 (rank #1).

How recent are the OSWorld-G leaderboard results?

The OSWorld-G leaderboard was last updated in July 2026 and currently includes 1 evaluated model.