Graphwalks BFS <128k
A graph reasoning benchmark that evaluates language models' ability to perform breadth-first search (BFS) on graphs supplied in prompts of fewer than 128k tokens, returning the nodes reachable at a specified depth.
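To make the task concrete, here is a minimal sketch of the operation a model is asked to reproduce. It assumes a directed graph given as an edge list and reads "reachable at a specified depth" as "at exactly that many hops from the start node". The benchmark's own prompt format and scoring are not shown on this page, so the function names, the edge-list representation, and the exact-depth interpretation are all illustrative assumptions.

```python
from collections import deque


def bfs_depths(edges, start):
    """Map every node reachable from `start` to its BFS depth (fewest hops)."""
    # Build an adjacency list from the (source, destination) pairs.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            # The first visit to a node is always along a shortest path in BFS.
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


def nodes_at_depth(edges, start, target_depth):
    """Return the sorted nodes whose BFS depth equals `target_depth` (assumed reading)."""
    depths = bfs_depths(edges, start)
    return sorted(node for node, d in depths.items() if d == target_depth)


# Hypothetical toy graph, not taken from the benchmark.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(nodes_at_depth(edges, "a", 2))  # ['d']
```

If the task were instead read as "all nodes within the given depth", the filter would become `d <= target_depth` (possibly excluding the start node). The sketch keeps the full depth map so either reading is a one-line change.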
Graphwalks BFS <128k Leaderboard
11 models
| Rank | Organization | | Context | Cost | License | |
|---|---|---|---|---|---|---|
| 1 | OpenAI | — | 400K | $1.75 / $14.00 | | |
| 2 | OpenAI | — | 1.0M | $2.50 / $15.00 | | |
| 3 | OpenAI | — | 400K | $1.25 / $10.00 | | |
| 4 | OpenAI | — | 400K | $0.75 / $4.50 | | |
| 5 | OpenAI | — | 400K | $0.20 / $1.25 | | |
| 6 | OpenAI | — | 128K | $75.00 / $150.00 | | |
| 7 | OpenAI | — | 1.0M | $2.00 / $8.00 | | |
| 7 | OpenAI | — | 1.0M | $0.40 / $1.60 | | |
| 9 | OpenAI | — | 200K | $1.10 / $4.40 | | |
| 10 | OpenAI | — | 128K | $2.50 / $10.00 | | |
| 11 | OpenAI | — | 1.0M | $0.10 / $0.40 | | |
FAQ
Common questions about Graphwalks BFS <128k
The Graphwalks BFS <128k leaderboard ranks 11 AI models based on their performance on this benchmark. Currently, GPT-5.2 by OpenAI leads with a score of 0.940. The average score across all models is 0.662.
The highest Graphwalks BFS <128k score is 0.940, achieved by GPT-5.2 from OpenAI.
Eleven models have been evaluated on the Graphwalks BFS <128k benchmark. All 11 results are self-reported; none has been independently verified.
Graphwalks BFS <128k is categorized under reasoning and spatial reasoning. The benchmark evaluates text models.