RealWorldQA

RealWorldQA is a benchmark designed to evaluate the basic real-world spatial understanding of multimodal models. The initial release consists of over 700 anonymized images taken from vehicles and other real-world scenarios, each accompanied by a question and an easily verifiable answer. xAI released it as part of its Grok-1.5 Vision preview to test models' ability to understand natural scenes and spatial relationships in everyday visual contexts.

Of the 22 AI models evaluated, Qwen3.6 Plus from Alibaba Cloud / Qwen Team currently leads the RealWorldQA leaderboard with a score of 0.854.