OlympiadBench
OlympiadBench is a challenging benchmark for promoting AGI with Olympiad-level bilingual multimodal scientific problems. It comprises 8,476 math and physics problems from international and Chinese Olympiads and the Chinese college entrance exam, each with expert-level annotations for step-by-step reasoning. It includes both text-only and multimodal problems in English and Chinese.
Progress Over Time
[Interactive timeline: model performance on OlympiadBench over time, showing the state-of-the-art frontier, with open and proprietary models distinguished.]
OlympiadBench Leaderboard
1 model
| Rank | Model | Organization | Parameters | Score | Context | Cost | License |
|---|---|---|---|---|---|---|---|
| 1 | QvQ-72B-Preview | Alibaba Cloud / Qwen Team | 73B | 0.204 | — | — | |
FAQ
Common questions about OlympiadBench
The OlympiadBench paper is available at https://arxiv.org/abs/2402.14008. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The OlympiadBench leaderboard currently ranks 1 AI model by its performance on this benchmark. QvQ-72B-Preview by Alibaba Cloud / Qwen Team leads with a score of 0.204. With only one entry, the average score across all models is also 0.204.
The highest OlympiadBench score is 0.204, achieved by QvQ-72B-Preview from Alibaba Cloud / Qwen Team.
One model has been evaluated on the OlympiadBench benchmark, with 0 verified results and 1 self-reported result.
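The summary figures quoted above (leader, top score, average, and verified versus self-reported counts) can be reproduced from the leaderboard rows. The following is a minimal sketch, not the site's actual implementation: the `Entry` class, its field names, and the `summarize` helper are hypothetical, and the single row is taken from the figures stated on this page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical schema, for illustration only)."""
    model: str
    organization: str
    score: float
    verified: bool  # False means the result is self-reported


# The only entry currently listed on this page.
entries = [
    Entry("QvQ-72B-Preview", "Alibaba Cloud / Qwen Team", 0.204, verified=False),
]


def summarize(rows):
    """Rank rows by score (highest first) and compute the summary figures."""
    ranked = sorted(rows, key=lambda e: e.score, reverse=True)
    return {
        "count": len(ranked),
        "leader": ranked[0].model,
        "top_score": ranked[0].score,
        "average": mean(e.score for e in ranked),
        "verified": sum(e.verified for e in ranked),
        "self_reported": sum(not e.verified for e in ranked),
    }


summary = summarize(entries)
print(summary)
```

With a single entry, the average and the top score coincide, which is why both are reported as 0.204.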
OlympiadBench is categorized under math, multimodal, physics, and reasoning. The benchmark evaluates multimodal models with multilingual support.