VisualWebBench

VisualWebBench is a multimodal benchmark that assesses multimodal large language models (MLLMs) on web page understanding and grounding tasks. It comprises seven tasks (captioning, webpage QA, heading OCR, element OCR, element grounding, action prediction, and action grounding) with 1.5K human-curated instances drawn from 139 real websites across 87 sub-domains.

Of the 2 AI models evaluated so far, Amazon's Nova Pro currently leads the VisualWebBench leaderboard with a score of 0.797.
