ARC-AGI
The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a benchmark designed to test general intelligence and abstract reasoning capabilities through visual grid-based transformation tasks. Each task consists of 2-5 demonstration pairs showing input grids transformed into output grids according to underlying rules, with test-takers required to infer these rules and apply them to novel test inputs. The benchmark uses colored grids (up to 30x30) with 10 discrete colors/symbols, designed to measure human-like general fluid intelligence and skill-acquisition efficiency with minimal prior knowledge.
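To make the task format concrete, here is a minimal sketch in Python, assuming the JSON layout used by the public ARC repository (a `train` list of demonstration pairs and a `test` list of held-out pairs). The toy grids, the mirror rule, and the single-attempt scoring are illustrative simplifications; official evaluation typically allows a small number of attempts per test input.

```python
# A minimal sketch of the ARC-AGI task format. Grids are lists of rows;
# cells are integers 0-9, one per color. "train" holds the demonstration
# pairs, "test" the held-out pairs to solve.
toy_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 2], [0, 2]], "output": [[2, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]},
    ],
}

def solve(grid):
    """Candidate rule inferred from the demonstrations: mirror each row."""
    return [list(reversed(row)) for row in grid]

def exact_match_score(task, solve_fn):
    """ARC-style scoring is all-or-nothing per grid: every cell must match."""
    hits = sum(solve_fn(pair["input"]) == pair["output"] for pair in task["test"])
    return hits / len(task["test"])

print(exact_match_score(toy_task, solve))  # 1.0 on this toy task
```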
Progress Over Time
[Interactive timeline: model performance on ARC-AGI over time, showing the state-of-the-art frontier with open and proprietary models distinguished]
ARC-AGI Leaderboard
6 models • 0 verified
| Rank | Organization | Score | Params | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | OpenAI | 0.937 | — | 1.0M | $2.50 / $15.00 | — |
| 2 | OpenAI | 0.905 | — | 400K | $21.00 / $168.00 | — |
| 3 | OpenAI | 0.880 | — | 200K | $2.00 / $8.00 | — |
| 4 | OpenAI | 0.862 | — | 400K | $1.75 / $14.00 | — |
| 5 | Meituan | 0.503 | 560B | 128K | $0.30 / $1.20 | — |
| 6 | Alibaba Cloud / Qwen Team | 0.418 | 235B | 262K | $0.15 / $0.80 | — |
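The two dollar figures in each Cost cell are read here as input and output prices; a common convention on model leaderboards is pricing per million tokens, and the sketch below assumes that convention (the function and the token counts are illustrative):

```python
def run_cost(input_tokens: int, output_tokens: int,
             price_in: float, price_out: float) -> float:
    """Estimated dollar cost, assuming prices are quoted per 1M tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Rank-1 entry above: $2.50 input / $15.00 output.
# A hypothetical run with 50K input and 10K output tokens:
print(run_cost(50_000, 10_000, 2.50, 15.00))  # 0.275 dollars
```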
FAQ
Common questions about ARC-AGI
**What is ARC-AGI?**
ARC-AGI (the Abstraction and Reasoning Corpus for Artificial General Intelligence) is a benchmark of visual grid-based transformation tasks designed to test general intelligence and abstract reasoning. Each task provides 2-5 demonstration pairs whose inputs are transformed into outputs by a hidden rule; the rule must be inferred and applied to novel test inputs. Grids are up to 30x30 and use 10 discrete colors/symbols, and the benchmark targets human-like fluid intelligence and skill-acquisition efficiency with minimal prior knowledge.
**Where can I read the ARC-AGI paper?**
The ARC-AGI paper, "On the Measure of Intelligence," is available at https://arxiv.org/abs/1911.01547. It details the benchmark's methodology, dataset construction, and evaluation criteria.
**Which model leads the ARC-AGI leaderboard?**
The ARC-AGI leaderboard ranks 6 AI models by their benchmark score. GPT-5.4 by OpenAI currently leads with a score of 0.937, and the average score across all six models is 0.751.
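The quoted average can be checked directly against the six leaderboard scores above; nothing beyond simple arithmetic is involved:

```python
scores = [0.937, 0.905, 0.880, 0.862, 0.503, 0.418]
print(round(sum(scores) / len(scores), 3))  # 0.751
```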
**What is the highest ARC-AGI score?**
The highest ARC-AGI score is 0.937, achieved by GPT-5.4 from OpenAI.
**How many models have been evaluated on ARC-AGI?**
6 models have been evaluated on the ARC-AGI benchmark; all 6 results are self-reported, and none are independently verified.
**What categories does ARC-AGI fall under?**
ARC-AGI is categorized under reasoning, spatial reasoning, and vision, and it evaluates models on visual grid-based tasks.