GameWorld

Progress Over Time

[Interactive timeline of model performance on GameWorld. Legend: state-of-the-art frontier, open models, proprietary models.]

GameWorld Leaderboard

2 models
Columns: Context, Cost, License
1. ByteDance
2. ByteDance
About this benchmark

What is GameWorld?

GameWorld evaluates agents on interactive game environments, testing perception, planning, and sequential decision-making to accomplish in-game objectives.

GameWorld is a multimodal benchmark that evaluates models on multimodal, reasoning, and agent tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.285, with the leader at 0.312.

Compare leaders on the best AI for multimodal, best AI for reasoning and best AI for agents leaderboards.

Current leaders

Seed 2.1 Pro from ByteDance currently leads the GameWorld leaderboard with a score of 0.312 across 2 evaluated AI models.

1. Seed 2.1 Pro (ByteDance): 31.2%
2. Seed 2.1 Turbo (ByteDance): 25.9%

FAQ

Common questions about the GameWorld benchmark and leaderboard.

What is the GameWorld benchmark?

GameWorld evaluates agents on interactive game environments, testing perception, planning, and sequential decision-making to accomplish in-game objectives.

What is the GameWorld leaderboard?

The GameWorld leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.312. The average score across all models is 0.285.
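The leader and average quoted above can be cross-checked against the two listed scores. The following is a minimal sketch, assuming the percentages shown under "Current leaders" (31.2% and 25.9%) are the same scores expressed on the 0–1 scale; the variable names are illustrative and not part of any LLM Stats API.

```python
# Scores as displayed on the leaderboard, in percent (taken from the page).
scores_percent = {"Seed 2.1 Pro": 31.2, "Seed 2.1 Turbo": 25.9}

# Convert to the 0-1 scale used in the prose (assumption: percent / 100).
scores = {name: pct / 100 for name, pct in scores_percent.items()}

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

# The average is the arithmetic mean over all listed models.
mean_score = sum(scores.values()) / len(scores)

print(f"leader: {leader} ({scores[leader]:.3f})")
print(f"mean: {mean_score:.4f}")
```

The mean of the two displayed scores is about 0.2855, which agrees with the reported average of 0.285 to within display rounding.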

What is the highest GameWorld score?

The highest GameWorld score is 0.312, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on GameWorld?

Two models have been evaluated on the GameWorld benchmark. Both results are self-reported; none have been verified.

What categories does GameWorld cover?

GameWorld is categorized under multimodal, reasoning, and agents. The benchmark evaluates multimodal models.

How recent are the GameWorld leaderboard results?

The GameWorld leaderboard was last updated in June 2026 and currently includes 2 evaluated models.