Graphwalks BFS <128k

Progress Over Time

[Chart: model performance on Graphwalks BFS <128k over time, marking the state-of-the-art frontier and distinguishing open from proprietary models]

Graphwalks BFS <128k Leaderboard

11 models
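Rank | Organization | Context window | Price per 1M tokens (input / output)
1 | OpenAI | 400K | $1.75 / $14.00
2 | OpenAI | 1.0M | $2.50 / $15.00
3 | OpenAI | not listed | not listed
4 | not listed | 400K | $0.75 / $4.50
5 | not listed | 400K | $0.20 / $1.25
6 | OpenAI | not listed | not listed
7 | OpenAI | 1.0M | $2.00 / $8.00
7 | not listed | 1.0M | $0.40 / $1.60
9 | OpenAI | not listed | not listed
10 | OpenAI | 128K | $2.50 / $10.00
11 | not listed | 1.0M | $0.10 / $0.40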
About this benchmark

What is Graphwalks BFS <128k?

A graph reasoning benchmark that evaluates a language model's ability to perform breadth-first search (BFS) over a graph supplied in a context of under 128k tokens and to return the nodes reachable at a specified depth.
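To make the task concrete, here is a minimal Python sketch of the underlying operation. It assumes the graph is given as a list of directed edges and that "reachable at a specified depth" means nodes whose shortest-path distance from the start node equals that depth. The function name `nodes_at_depth` and the toy edge list are hypothetical; this illustrates BFS itself, not the benchmark's official prompt format or grader.

```python
from collections import deque


def nodes_at_depth(edges, start, depth):
    """Return the set of nodes whose shortest-path distance from `start` is exactly `depth`.

    Hypothetical helper: assumes `edges` is a list of directed (src, dst) pairs.
    """
    # Build an adjacency list from the directed edge list.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    # Standard BFS: the first time a node is seen fixes its shortest distance.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # nodes at the target depth do not need to be expanded
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    return {node for node, d in distance.items() if d == depth}


# Hypothetical toy graph containing a cycle back to the start node.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("e", "a")]
print(sorted(nodes_at_depth(edges, "a", 2)))  # ['d']
```

Because BFS visits nodes in order of distance, a node reachable by both a short and a long path (such as `a`, which the cycle through `e` reaches again) is counted only at its shortest distance.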

Graphwalks BFS <128k is a text benchmark evaluating models on reasoning and spatial reasoning tasks. LLM Stats tracks 11 models on this benchmark, scored on a 0–1 scale. The current average is 0.662, and the leader scores 0.940.

For broader comparisons, see the best AI for reasoning and best AI for spatial reasoning leaderboards.

Current leaders

GPT-5.2 from OpenAI currently leads the Graphwalks BFS <128k leaderboard with a score of 0.940 among the 11 evaluated models.

1. GPT-5.2 (OpenAI): 94.0%
2. GPT-5.4 (OpenAI): 93.0%
3. GPT-5 (OpenAI): 78.3%

FAQ

Common questions about the Graphwalks BFS <128k benchmark and leaderboard.

What is the Graphwalks BFS <128k benchmark?

A graph reasoning benchmark that evaluates a language model's ability to perform breadth-first search (BFS) over a graph supplied in a context of under 128k tokens and to return the nodes reachable at a specified depth.

What is the Graphwalks BFS <128k leaderboard?

The Graphwalks BFS <128k leaderboard ranks 11 AI models based on their performance on this benchmark. Currently, GPT-5.2 by OpenAI leads with a score of 0.940. The average score across all models is 0.662.

What is the highest Graphwalks BFS <128k score?

The highest Graphwalks BFS <128k score is 0.940, achieved by GPT-5.2 from OpenAI.

How many models are evaluated on Graphwalks BFS <128k?

11 models have been evaluated on the Graphwalks BFS <128k benchmark. All 11 results are self-reported; none are verified.

What categories does Graphwalks BFS <128k cover?

Graphwalks BFS <128k is categorized under reasoning and spatial reasoning. The benchmark evaluates text models.

Which model offers the best value on Graphwalks BFS <128k?

Among models scoring within 10% of the leader, GPT-5.2 from OpenAI is the cheapest, at $1.75 per million input tokens with a score of 0.940.
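The selection rule can be sketched in a few lines of Python. This is a hypothetical reconstruction, assuming "within 10% of the leader" means a score of at least 90% of the top score (0.9 × 0.940 = 0.846) and that models are compared on input price. GPT-5.2's $1.75 input price is stated above; pairing GPT-5.4 with $2.50 assumes the table's second row is GPT-5.4, and GPT-5's price is not listed, so it is left as `None`. The names `MODELS` and `best_value` are illustrative only.

```python
# Hypothetical data drawn from this page: (name, score on a 0-1 scale,
# input price in USD per 1M tokens, or None when the price is not listed).
MODELS = [
    ("GPT-5.2", 0.940, 1.75),
    ("GPT-5.4", 0.930, 2.50),  # assumes the table's rank-2 row is GPT-5.4
    ("GPT-5", 0.783, None),    # price not listed on the page
]


def best_value(models, tolerance=0.10):
    """Return the cheapest model (by input price) scoring within `tolerance` of the leader.

    Assumes "within 10%" is relative: score >= (1 - tolerance) * leader_score.
    Models without a listed price are skipped. Returns None if nothing qualifies.
    """
    leader_score = max(score for _, score, _ in models)
    threshold = (1 - tolerance) * leader_score
    candidates = [
        (price, name)
        for name, score, price in models
        if score >= threshold and price is not None
    ]
    if not candidates:
        return None
    # min() on (price, name) tuples picks the lowest price first.
    return min(candidates)[1]


print(best_value(MODELS))  # GPT-5.2
```

Under this reading, GPT-5.4 also clears the 0.846 threshold but costs more per input token, while GPT-5 falls below it, so GPT-5.2 is both the leader and the cheapest qualifying model.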

How recent are the Graphwalks BFS <128k leaderboard results?

The Graphwalks BFS <128k leaderboard was last updated in July 2026 and currently includes 11 evaluated models.