RealWorldQA

Progress Over Time

[Interactive timeline: model performance on RealWorldQA over time, with the state-of-the-art frontier shown for open and proprietary models]

RealWorldQA Leaderboard

25 models
Rank  Organization               Params  Context  Cost (input / output, per 1M tokens)
1     Alibaba Cloud / Qwen Team  -       1.0M     $0.32 / $1.28
2     ByteDance                  -       -        -
3     -                          -       -        -
4     Alibaba Cloud / Qwen Team  -       1.0M     $0.50 / $3.00
5     Alibaba Cloud / Qwen Team  35B     -        -
6     Alibaba Cloud / Qwen Team  122B    -        -
7     Alibaba Cloud / Qwen Team  35B     -        -
7     Alibaba Cloud / Qwen Team  28B     262K     $0.60 / $3.60
9     Alibaba Cloud / Qwen Team  27B     262K     $0.30 / $2.40
10    Alibaba Cloud / Qwen Team  236B    -        -
11    Alibaba Cloud / Qwen Team  236B    -        -
12    Alibaba Cloud / Qwen Team  33B     -        -
13    Alibaba Cloud / Qwen Team  33B     -        -
14    Alibaba Cloud / Qwen Team  73B     -        -
15    Alibaba Cloud / Qwen Team  31B     -        -
16    Alibaba Cloud / Qwen Team  31B     -        -
17    Alibaba Cloud / Qwen Team  9B      262K     $0.18 / $2.09
18    Alibaba Cloud / Qwen Team  4B      262K     $0.10 / $1.00
19    Alibaba Cloud / Qwen Team  9B      -        -
20    Alibaba Cloud / Qwen Team  4B      262K     $0.10 / $0.60
21    Alibaba Cloud / Qwen Team  7B      -        -
22    -                          -       -        -
23    DeepSeek                   27B     -        -
24    -                          16B     -        -
25    -                          3B      -        -
About this benchmark

What is RealWorldQA?

RealWorldQA is a benchmark designed to evaluate the basic real-world spatial understanding of multimodal models. The initial release consists of over 700 anonymized images taken from vehicles and other real-world scenarios, each accompanied by a question and an easily verifiable answer. xAI released it as part of its Grok-1.5 Vision preview to test models' ability to understand natural scenes and spatial relationships in everyday visual contexts.

RealWorldQA is a multimodal benchmark that evaluates models on spatial reasoning and vision tasks. LLM Stats tracks 25 models on this benchmark, scored on a 0–1 scale. The current average is 0.776, with the leader at 0.869.

Current leaders

Qwen3.7-Plus from Alibaba Cloud / Qwen Team currently leads the RealWorldQA leaderboard, scoring 0.869 (86.9%) among 25 evaluated models.

Rank  Model            Organization               Score
1     Qwen3.7-Plus     Alibaba Cloud / Qwen Team  86.9%
2     Seed 2.1 Pro     ByteDance                  86.7%
3     Seed 2.1 Turbo   ByteDance                  86.3%
5     Qwen3.6-35B-A3B  Alibaba Cloud / Qwen Team  85.3%  (top open-weight model)

FAQ

Common questions about the RealWorldQA benchmark and leaderboard.

What is the RealWorldQA leaderboard?

The RealWorldQA leaderboard ranks 25 AI models based on their performance on this benchmark. Currently, Qwen3.7-Plus by Alibaba Cloud / Qwen Team leads with a score of 0.869. The average score across all models is 0.776.

What is the highest RealWorldQA score?

The highest RealWorldQA score is 0.869, achieved by Qwen3.7-Plus from Alibaba Cloud / Qwen Team.

How many models are evaluated on RealWorldQA?

25 models have been evaluated on the RealWorldQA benchmark. All 25 results are self-reported; none are verified.

What categories does RealWorldQA cover?

RealWorldQA is categorized under spatial reasoning and vision. The benchmark evaluates multimodal models.

What is the best open-source model on RealWorldQA?

Qwen3.6-35B-A3B by Alibaba Cloud / Qwen Team is the top-ranked open-source model on RealWorldQA, with a score of 0.853 (rank #5).

Which model offers the best value on RealWorldQA?

Among models scoring within 10% of the leader, Qwen3.5-27B from Alibaba Cloud / Qwen Team is the cheapest, at $0.30 per million input tokens with a score of 0.837.
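The selection rule in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the leaderboard's actual method: "within 10% of the leader" is read here as a relative threshold (score at least 90% of the top score), though it could also mean an absolute gap of 0.10, and the names `Entry` and `best_value` are hypothetical. The first two entries use figures from this page (the $0.32 input cost for Qwen3.7-Plus is taken from the rank 1 row of the leaderboard); the third entry is made up purely to show the filter excluding a cheap but low-scoring model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float       # benchmark score on the 0-1 scale
    input_cost: float  # USD per million input tokens


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry whose score is within `tolerance` of the leader.

    Assumption: "within" is relative, i.e. score >= leader_score * (1 - tolerance).
    """
    leader_score = max(e.score for e in entries)
    threshold = leader_score * (1 - tolerance)
    eligible = [e for e in entries if e.score >= threshold]
    return min(eligible, key=lambda e: e.input_cost)


entries = [
    Entry("Qwen3.7-Plus", 0.869, 0.32),  # leader, figures from this page
    Entry("Qwen3.5-27B", 0.837, 0.30),   # figures from this page
    Entry("hypothetical-small-model", 0.700, 0.10),  # made-up entry, not real data
]

pick = best_value(entries)
print(pick.model)  # prints Qwen3.5-27B
```

With a leader score of 0.869, the relative threshold is about 0.782, so the made-up 0.700 entry is filtered out despite its lower price, and the cheaper of the two remaining models is selected.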

How recent are the RealWorldQA leaderboard results?

The RealWorldQA leaderboard was last updated in July 2026 and currently includes 25 evaluated models.