Graphwalks BFS <128k

Name: Graphwalks BFS <128k Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Graphwalks BFS <128k Leaderboard

11 models

Rank	Model	Score	Context	Cost (input / output, $ per 1M tokens)
1	GPT-5.2 (OpenAI)	0.940	400K	$1.75 / $14.00
2	GPT-5.4 (OpenAI)	0.930	1.0M	$2.50 / $15.00
3	GPT-5 (OpenAI)	0.783	—	—
4	GPT-5.4 mini (OpenAI)	—	400K	$0.75 / $4.50
5	GPT-5.4 nano (OpenAI)	—	400K	$0.20 / $1.25
6	GPT-4.5 (OpenAI)	—	—	—
7	GPT-4.1 (OpenAI)	—	1.0M	$2.00 / $8.00
7	GPT-4.1 mini (OpenAI)	—	1.0M	$0.40 / $1.60
9	o3-mini (OpenAI)	—	—	—
10	GPT-4o (OpenAI)	—	128K	$2.50 / $10.00
11	GPT-4.1 nano (OpenAI)	—	1.0M	$0.10 / $0.40

About this benchmark

What is Graphwalks BFS <128k?

A graph reasoning benchmark that evaluates language models' ability to perform breadth-first search (BFS) operations on graphs with context length under 128k tokens, returning nodes reachable at specified depths.
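
To make the task concrete, here is a minimal Python sketch of a breadth-first search that returns the nodes at a given depth from a start node. The function name `nodes_at_depth`, the directed edge-list input, and the choice to return nodes at exactly the requested depth (rather than within it) are assumptions for illustration; the benchmark's actual prompt format and scoring are not described on this page.

```python
from collections import deque


def nodes_at_depth(edges, start, depth):
    """Return the set of nodes whose shortest-path distance from `start` equals `depth`.

    Assumption: `edges` is a list of directed (source, target) pairs.
    """
    # Build an adjacency list from the edge pairs.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    # Standard BFS: the first time a node is seen, its distance is final.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # nodes past the requested depth are not needed
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    return {node for node, d in distance.items() if d == depth}


# Hypothetical toy graph, not taken from the benchmark.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(sorted(nodes_at_depth(edges, "a", 2)))  # ['d']
```

In the benchmark setting the model has to carry out this traversal over a graph given as text in its context window, which is why the task becomes harder as the context grows toward 128k tokens.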

Graphwalks BFS <128k is a text benchmark that evaluates models on reasoning and spatial reasoning tasks. LLM Stats tracks 11 models on this benchmark, scored on a 0–1 scale. The current average is 0.662, with the leader at 0.940.

Current leaders

GPT-5.2 from OpenAI currently leads the 11 models on the Graphwalks BFS <128k leaderboard with a score of 0.940.

GPT-5.2 (OpenAI): 94.0%

GPT-5.4 (OpenAI): 93.0%

GPT-5 (OpenAI): 78.3%

FAQ

Common questions about the Graphwalks BFS <128k benchmark and leaderboard.

What is the Graphwalks BFS <128k leaderboard?

The Graphwalks BFS <128k leaderboard ranks 11 AI models based on their performance on this benchmark. Currently, GPT-5.2 by OpenAI leads with a score of 0.940. The average score across all models is 0.662.

What is the highest Graphwalks BFS <128k score?

The highest Graphwalks BFS <128k score is 0.940, achieved by GPT-5.2 from OpenAI.

How many models are evaluated on Graphwalks BFS <128k?

11 models have been evaluated on the Graphwalks BFS <128k benchmark, with 0 verified results and 11 self-reported results.

What categories does Graphwalks BFS <128k cover?

Graphwalks BFS <128k is categorized under reasoning and spatial reasoning. The benchmark evaluates text models.

Which model offers the best value on Graphwalks BFS <128k?

Among models scoring within 10% of the leader, GPT-5.2 from OpenAI is the cheapest, at $1.75 per million input tokens with a score of 0.940.
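
The selection rule above can be sketched in a few lines of Python. This is an illustration only: it uses just the three scores published on this page, treats "within 10%" as a relative threshold (score at least 90% of the leader's), and the names `models` and `best_value` are hypothetical. How LLM Stats actually computes this figure is not stated.

```python
# Scores and input prices ($ per 1M input tokens) copied from this page.
# GPT-5 has a published score but no listed price, so its price is None.
models = [
    {"name": "GPT-5.2", "score": 0.940, "input_price": 1.75},
    {"name": "GPT-5.4", "score": 0.930, "input_price": 2.50},
    {"name": "GPT-5", "score": 0.783, "input_price": None},
]


def best_value(models, tolerance=0.10):
    """Return the cheapest priced model scoring within `tolerance` of the leader.

    Assumption: "within 10%" means score >= leader_score * (1 - tolerance).
    """
    leader_score = max(m["score"] for m in models)
    threshold = leader_score * (1 - tolerance)
    candidates = [
        m for m in models
        if m["score"] >= threshold and m["input_price"] is not None
    ]
    return min(candidates, key=lambda m: m["input_price"])


print(best_value(models)["name"])  # GPT-5.2
```

With a leader score of 0.940 the threshold is 0.846, so only GPT-5.2 and GPT-5.4 qualify, and GPT-5.2 has the lower input price.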

How recent are the Graphwalks BFS <128k leaderboard results?

The Graphwalks BFS <128k leaderboard was last updated in July 2026 and currently includes 11 evaluated models.