ARC-AGI v2 Leaderboard

Progress Over Time

[Interactive timeline: model performance on ARC-AGI v2 over time. Legend: state-of-the-art frontier, open models, proprietary models.]

ARC-AGI v2 Leaderboard

15 models

Rank	Model (Provider)	Score	Context	Cost
1	GPT-5.5 (OpenAI)	—	1.0M	$5.00 / $30.00
2	Gemini 3.1 Pro (Google)	—	1.0M	$2.50 / $15.00
3	GPT-5.4 (OpenAI)	—	1.0M	$2.50 / $15.00
4	Claude Opus 4.6 (Anthropic)	—	1.0M	$5.00 / $25.00
5	Claude Sonnet 4.6 (Anthropic)	—	200K	$3.00 / $15.00
6	GPT-5.2 Pro (OpenAI)	—	400K	$21.00 / $168.00
7	GPT-5.2 (OpenAI)	—	400K	$1.75 / $14.00
8	Muse Spark (Meta)	—	—	—
9	Claude Opus 4.5 (Anthropic)	—	200K	$5.00 / $25.00
10	Gemini 3 Flash (Google)	—	1.0M	$0.50 / $3.00
11	Gemini 3 Pro (Google)	—	—	—
12	Grok-4 (xAI)	—	—	—
13	Claude Opus 4 (Anthropic)	—	—	—
14	o3 (OpenAI)	—	200K	$2.00 / $8.00
15	Gemini 2.5 Pro (Google)	—	1.0M	$1.25 / $10.00
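The page does not state the unit of the Cost column. The sketch below assumes each cell lists USD per one million input tokens, then USD per one million output tokens; that reading is an assumption, as are the helper names and the token counts, which are purely illustrative.

```python
from typing import Optional, Tuple


def parse_cost(cell: str) -> Optional[Tuple[float, float]]:
    """Parse a Cost cell such as "$5.00 / $30.00" into (input_price, output_price).

    Returns None for the placeholder used when no price is listed.
    """
    if cell.strip() == "—":
        return None
    first, second = cell.split("/")
    return float(first.strip().lstrip("$")), float(second.strip().lstrip("$"))


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    """Estimate one request's cost, ASSUMING prices are USD per 1,000,000 tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices from the GPT-5.5 row; the token counts are hypothetical.
prices = parse_cost("$5.00 / $30.00")
cost = estimate_cost_usd(20_000, 4_000, *prices)
print(f"${cost:.2f}")  # $0.22
```

Under this assumption, output tokens dominate the bill for reasoning-heavy tasks, since every listed output price is several times the input price.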

FAQ

Common questions about ARC-AGI v2

What is the ARC-AGI v2 benchmark?

ARC-AGI v2 (ARC-AGI-2) is an upgraded benchmark for measuring abstract reasoning and problem-solving abilities in AI systems through visual grid transformation tasks. It evaluates fluid intelligence using input-output grid pairs, from 1x1 up to 30x30 cells, where each cell holds one of ten color values (0-9). A model must identify the underlying transformation rule from the demonstration examples and apply it to the test cases. The benchmark is designed to be easy for humans but challenging for AI, and it focuses on core cognitive abilities such as spatial reasoning, pattern recognition, and compositional generalization.
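To make the task format concrete, here is a minimal sketch of how such a task might be represented and checked. The dictionary layout ("train"/"test" lists of "input"/"output" grids), the helper names, and the toy task are assumptions for illustration, not taken from this page or from the benchmark data; the candidate rule is deliberately trivial.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # rows of cell values, each in 0..9


def is_valid_grid(grid: Grid) -> bool:
    """Check the constraints described above: rectangular, 1x1 to 30x30, cells in 0-9."""
    if not (1 <= len(grid) <= 30):
        return False
    width = len(grid[0])
    if not (1 <= width <= 30):
        return False
    return all(
        len(row) == width and all(isinstance(c, int) and 0 <= c <= 9 for c in row)
        for row in grid
    )


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def rule_fits_demonstrations(rule: Callable[[Grid], Grid],
                             demonstrations: List[Dict[str, Grid]]) -> bool:
    """A candidate rule is kept only if it reproduces every demonstration output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in demonstrations)


# Toy task invented for illustration (not a real benchmark task).
task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[3, 4]], "output": [[4, 3]]},
    ],
    "test": [{"input": [[5, 0], [0, 6]]}],
}

assert all(is_valid_grid(p["input"]) and is_valid_grid(p["output"]) for p in task["train"])

if rule_fits_demonstrations(flip_horizontal, task["train"]):
    prediction = flip_horizontal(task["test"][0]["input"])
    print(prediction)  # [[0, 5], [6, 0]]
```

Real tasks rarely yield to a single hand-written rule like this one; the sketch only shows the shape of the problem, where a rule inferred from a few demonstration pairs must produce the test output exactly.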

Where can I find the ARC-AGI v2 paper?

The ARC-AGI v2 paper is available at https://arxiv.org/abs/2505.11831. It describes the benchmark's methodology, dataset creation, and evaluation criteria.

What is the ARC-AGI v2 leaderboard?

The ARC-AGI v2 leaderboard ranks 15 AI models by their performance on this benchmark. Currently, GPT-5.5 by OpenAI leads with a score of 0.850. The average score across all models is 0.434.

What is the highest ARC-AGI v2 score?

The highest ARC-AGI v2 score is 0.850, achieved by GPT-5.5 from OpenAI.

How many models are evaluated on ARC-AGI v2?

15 models have been evaluated on the ARC-AGI v2 benchmark. None of the results are verified, and 12 are self-reported.

What categories does ARC-AGI v2 cover?

ARC-AGI v2 is categorized under reasoning, spatial reasoning, and vision. The benchmark evaluates multimodal models.

