Agent Startup Bench

Progress Over Time

[Interactive timeline: model performance on Agent Startup Bench over time, showing the state-of-the-art frontier for open and proprietary models.]

Agent Startup Bench Leaderboard

2 models
Rank  Organization  Context  Cost  License
1     ByteDance
2     ByteDance
About this benchmark

What is Agent Startup Bench?

Agent Startup Bench measures AI agents on high-economic-value, startup-style tasks that require autonomous planning and execution to deliver practical, verifiable results.

Agent Startup Bench is a text benchmark that evaluates models on tasks in the reasoning, general, and agents categories. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.614, with the leader at 0.688.

Current leaders

Of the 2 evaluated AI models, Seed 2.1 Pro from ByteDance currently leads the Agent Startup Bench leaderboard with a score of 0.688.

1. Seed 2.1 Pro (ByteDance): 68.8%
2. Seed 2.1 Turbo (ByteDance): 54.0%

FAQ

Common questions about the Agent Startup Bench benchmark and leaderboard.

What is the Agent Startup Bench leaderboard?

The Agent Startup Bench leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.688. The average score across all models is 0.614.
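
As a rough check on the arithmetic, the leader and the average can be recomputed from the two scores listed on this page (0.688 and 0.540 on the 0–1 scale). This is a minimal sketch, not the site's actual method: the variable names are illustrative, and it assumes the average is a plain arithmetic mean rounded to three decimals.

```python
from statistics import mean

# Scores as listed on this page (0-1 scale).
# The dict and variable names are illustrative, not part of any official API.
scores = {
    "Seed 2.1 Pro": 0.688,
    "Seed 2.1 Turbo": 0.540,
}

# Rank models from highest to lowest score.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
leader_name, leader_score = ranking[0]

# Assumption: the reported average is an unweighted mean, rounded to 3 decimals.
average_score = round(mean(scores.values()), 3)

print(f"Leader: {leader_name} ({leader_score})")
print(f"Average: {average_score}")
```

Under these assumptions the result matches the figures quoted above: a leader score of 0.688 and an average of 0.614.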

What is the highest Agent Startup Bench score?

The highest Agent Startup Bench score is 0.688, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on Agent Startup Bench?

Two models have been evaluated on the Agent Startup Bench benchmark. Both results are self-reported; none have been verified.

What categories does Agent Startup Bench cover?

Agent Startup Bench is categorized under reasoning, general, and agents. The benchmark evaluates text models.

How recent are the Agent Startup Bench leaderboard results?

The Agent Startup Bench leaderboard was last updated in June 2026 and currently includes 2 evaluated models.