GraphWalks

Name: GraphWalks Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

GraphWalks Leaderboard

3 models

Rank	Model	Organization	Parameters	Context	Cost (input / output, USD per 1M tokens)
1	MAI-Thinking-1	Microsoft	1.0T	n/a	n/a
2	MiMo-V2.5	Xiaomi	311B	1.0M	$0.17 / $0.34
3	MiMo-V2.5-Pro	Xiaomi	1.0T	1.0M	$0.43 / $0.87

About this benchmark

What is GraphWalks?

GraphWalks is a synthetic multi-hop long-context reasoning benchmark. The model is given a graph as an edge list and must traverse it to return either the nodes reached by a breadth-first search from a given start node or the parent nodes of a given node. Performance is reported as the F1 score between the model's predicted node set and the ground-truth set.
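To make the two task types and the scoring concrete, here is a minimal Python sketch. It assumes edges are written one per line as `src -> dst`, treats the BFS task as "return the nodes at exactly depth d from the start node", and treats the parents task as "return every node with an edge into the given node". These conventions, the depth parameter, and the helper names (`parse_edges`, `bfs_at_depth`, `parents`, `set_f1`) are illustrative assumptions, not the benchmark's official prompt format or grader.

```python
from collections import defaultdict, deque


def parse_edges(edge_list: str) -> list[tuple[str, str]]:
    """Parse lines of the assumed form 'src -> dst' into directed edges."""
    edges = []
    for line in edge_list.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        src, dst = [part.strip() for part in line.split("->")]
        edges.append((src, dst))
    return edges


def bfs_at_depth(edges, start, depth):
    """Return the nodes whose shortest-path distance from start is exactly depth."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # nodes at the target depth do not need expanding
        for nxt in adj[node]:
            if nxt not in dist:  # first visit gives the shortest distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n, d in dist.items() if d == depth}


def parents(edges, node):
    """Return every node that has a directed edge into node."""
    return {src for src, dst in edges if dst == node}


def set_f1(predicted, truth):
    """F1 between a predicted node set and the ground-truth node set."""
    if not predicted and not truth:
        return 1.0  # assumed convention: two empty sets count as a perfect match
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For example, with edges A -> B, A -> C, B -> D, C -> D, `bfs_at_depth(edges, "A", 2)` returns `{"D"}`, `parents(edges, "D")` returns `{"B", "C"}`, and scoring the prediction `{"B", "X"}` against `{"B", "C"}` gives precision 0.5, recall 0.5, and F1 0.5. Set-level F1 gives partial credit when a model finds some but not all of the correct nodes.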

GraphWalks is a text benchmark that evaluates models on long-context and reasoning tasks. LLM Stats tracks 3 models on it, scored on a 0–1 scale. The current average is 0.797, and the leader scores 0.900.
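The average is the arithmetic mean of the three scores on this leaderboard (0.900, 0.870, and 0.620):

```latex
\bar{s} = \frac{0.900 + 0.870 + 0.620}{3} = \frac{2.390}{3} \approx 0.797
```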

Current leaders

MAI-Thinking-1 from Microsoft currently leads the GraphWalks leaderboard, which covers 3 evaluated models, with a score of 0.900.

MAI-Thinking-1 (Microsoft): 90.0%

MiMo-V2.5 (Xiaomi): 87.0%

MiMo-V2.5-Pro (Xiaomi): 62.0%

FAQ

Common questions about the GraphWalks benchmark and leaderboard.

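What is the GraphWalks benchmark?

GraphWalks is a synthetic benchmark that tests multi-hop reasoning over long contexts: the model must traverse a graph supplied as an edge list and return a set of nodes, which is scored by F1 against the ground truth.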

What is the GraphWalks leaderboard?

The GraphWalks leaderboard ranks 3 AI models based on their performance on this benchmark. Currently, MAI-Thinking-1 by Microsoft leads with a score of 0.900. The average score across all models is 0.797.

What is the highest GraphWalks score?

The highest GraphWalks score is 0.900, achieved by MAI-Thinking-1 from Microsoft.

How many models are evaluated on GraphWalks?

Three models have been evaluated on GraphWalks. All 3 results are self-reported; none have been independently verified.

What categories does GraphWalks cover?

GraphWalks is categorized under long context and reasoning. The benchmark evaluates text models.

What is the best open-source model on GraphWalks?

MiMo-V2.5 by Xiaomi is the top-ranked open-source model on GraphWalks, with a score of 0.870 (rank #2).

Which model offers the best value on GraphWalks?

Among models scoring within 10% of the leader, MiMo-V2.5 from Xiaomi is the cheapest, at $0.17 per million input tokens with a score of 0.870.
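The "best value" answer can be reproduced from the figures on this page with a short Python sketch. It assumes "within 10% of the leader" means a score of at least 90% of the top score, ranks candidates by input price only, and skips models without a listed price. LLM Stats does not document its exact rule here, so `Entry` and `best_value` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    model: str
    org: str
    score: float                  # GraphWalks score on the 0-1 scale
    input_price: Optional[float]  # USD per 1M input tokens; None if not listed


# Scores and input prices as listed on this page.
ENTRIES = [
    Entry("MAI-Thinking-1", "Microsoft", 0.900, None),
    Entry("MiMo-V2.5", "Xiaomi", 0.870, 0.17),
    Entry("MiMo-V2.5-Pro", "Xiaomi", 0.620, 0.43),
]


def best_value(entries, tolerance=0.10):
    """Cheapest priced model whose score is within `tolerance` (relative) of the leader.

    Assumed rule: qualify if score >= leader_score * (1 - tolerance).
    Returns None when no priced model qualifies.
    """
    if not entries:
        return None
    leader = max(e.score for e in entries)
    threshold = leader * (1 - tolerance)
    candidates = [
        e for e in entries
        if e.score >= threshold and e.input_price is not None
    ]
    return min(candidates, key=lambda e: e.input_price, default=None)
```

With the listed data, the cutoff is 0.9 × 0.900 = 0.810, so only MAI-Thinking-1 (no price listed) and MiMo-V2.5 qualify on score, and `best_value(ENTRIES)` returns the MiMo-V2.5 entry.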

How recent are the GraphWalks leaderboard results?

The GraphWalks leaderboard was last updated in July 2026 and currently includes 3 evaluated models.