SeedClawBench

Progress Over Time

[Timeline chart: model performance evolution on SeedClawBench. Legend: state-of-the-art frontier, open models, proprietary models.]

SeedClawBench Leaderboard

2 models (table columns: Context, Cost, License)
1. ByteDance
2. ByteDance
About this benchmark

What is SeedClawBench?

SeedClawBench is an agentic coding benchmark measuring overall model performance on real-world, tool-using software development tasks.

SeedClawBench is a text benchmark that evaluates models on agent, coding, and tool-calling tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.652, with the leader at 0.666.

Compare leaders on the best AI for agents, best AI for coding, and best AI for tool calling leaderboards.

Current leaders

Seed 2.1 Pro from ByteDance currently leads the SeedClawBench leaderboard with a score of 0.666 across 2 evaluated AI models.

1. Seed 2.1 Pro (ByteDance): 66.6%
2. Seed 2.1 Turbo (ByteDance): 63.8%

FAQ

Common questions about the SeedClawBench benchmark and leaderboard.

What is the SeedClawBench benchmark?

SeedClawBench is an agentic coding benchmark measuring overall model performance on real-world, tool-using software development tasks.

What is the SeedClawBench leaderboard?

The SeedClawBench leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.666. The average score across all models is 0.652.
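
The figures above can be reproduced with a short sketch. It assumes the reported average is a plain unweighted mean of the two listed scores (the page does not say how the average is computed), and the helper name `summarize` is hypothetical, chosen only for illustration.

```python
from statistics import mean

# Scores on the 0-1 scale, as listed on the leaderboard.
scores = {
    "Seed 2.1 Pro": 0.666,
    "Seed 2.1 Turbo": 0.638,
}

def summarize(scores):
    """Return (leader, leader_score, average) for a name -> score mapping.

    Assumption: the average is an unweighted arithmetic mean,
    rounded to three decimal places.
    """
    leader = max(scores, key=scores.get)
    average = round(mean(scores.values()), 3)
    return leader, scores[leader], average

leader, top, average = summarize(scores)

# Show the leader on both the 0-1 scale and as a percentage.
print(f"{leader}: {top:.3f} ({top:.1%}), average {average:.3f}")
```

Under that assumption the mean of 0.666 and 0.638 is 0.652, which matches the average quoted in the answer, and 0.666 on the 0–1 scale corresponds to the 66.6% shown in the ranking.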

What is the highest SeedClawBench score?

The highest SeedClawBench score is 0.666, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on SeedClawBench?

Two models have been evaluated on the SeedClawBench benchmark. Both results are self-reported; none has been independently verified.

What categories does SeedClawBench cover?

SeedClawBench is categorized under agents, coding, and tool calling. The benchmark evaluates text models.

How recent are the SeedClawBench leaderboard results?

The SeedClawBench leaderboard was last updated in June 2026 and currently includes 2 evaluated models.