Seal-0

Seal-0 is a benchmark for evaluating agentic search capabilities, testing models' ability to navigate and retrieve information using tools.

Progress Over Time

[Figure: interactive timeline of model performance on Seal-0, showing the state-of-the-art frontier and distinguishing open from proprietary models.]

Seal-0 Leaderboard

6 models

Rank	Model	Organization	Parameters	Context	Cost
1	Kimi K2.5	Moonshot AI	1.0T	262K	$0.60 / $2.50
2	Kimi K2-Thinking-0905	Moonshot AI	1.0T	—	—
3	Qwen3.5-27B	Alibaba Cloud / Qwen Team	27B	—	—
4	Qwen3.5-397B-A17B	Alibaba Cloud / Qwen Team	397B	262K	$0.60 / $3.60
5	Qwen3.5-122B-A10B	Alibaba Cloud / Qwen Team	122B	262K	$0.40 / $3.20
6	Qwen3.5-35B-A3B	Alibaba Cloud / Qwen Team	35B	262K	$0.25 / $2.00
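
For readers who want to work with the leaderboard rows programmatically, here is a minimal Python sketch that turns one tab-separated row into a structured record. It is an illustration only: the names `LeaderboardRow`, `parse_size`, `parse_cost`, and `parse_row` are hypothetical, and the code assumes that size suffixes mean K = 10^3, B = 10^9, T = 10^12, that a dash marks a missing value, and that the cost cell holds two dollar amounts separated by a slash (the page does not say what the two amounts represent, so they are kept as an unlabeled pair).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed meaning of the size suffixes used in the table.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
MISSING = {"", "—", "-"}  # cells treated as "no value listed"


def parse_size(text: str) -> Optional[float]:
    """Convert a cell such as '1.0T' or '262K' to a number, or None if missing."""
    text = text.strip()
    if text in MISSING:
        return None
    return float(text[:-1]) * SUFFIXES[text[-1]]


def parse_cost(text: str) -> Optional[Tuple[float, float]]:
    """Convert a cell such as '$0.60 / $2.50' to a pair of floats, or None."""
    text = text.strip()
    if text in MISSING:
        return None
    first, second = text.split("/")
    return (float(first.strip().lstrip("$")), float(second.strip().lstrip("$")))


@dataclass
class LeaderboardRow:
    rank: int
    name: str  # model name together with its organization
    parameters: Optional[float]
    context: Optional[float]
    cost: Optional[Tuple[float, float]]


def parse_row(line: str) -> LeaderboardRow:
    """Parse one tab-separated row; the last three cells are size, context, cost."""
    *head, params, context, cost = line.rstrip("\n").split("\t")
    return LeaderboardRow(
        rank=int(head[0]),
        name=" ".join(head[1:]),
        parameters=parse_size(params),
        context=parse_size(context),
        cost=parse_cost(cost),
    )


row = parse_row("1\tKimi K2.5 Moonshot AI\t1.0T\t262K\t$0.60 / $2.50")
print(row)
```

Reading the last three cells from the right keeps the parser usable whether the model and organization share one cell or sit in separate cells.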

FAQ

Common questions about Seal-0

What is the Seal-0 benchmark?

Seal-0 is a benchmark for evaluating agentic search capabilities, testing models' ability to navigate and retrieve information using tools.

What is the Seal-0 leaderboard?

The Seal-0 leaderboard ranks 6 AI models by their performance on this benchmark. Kimi K2.5 by Moonshot AI currently leads with a score of 0.574. The average score across all models is 0.489.

What is the highest Seal-0 score?

The highest Seal-0 score is 0.574, achieved by Kimi K2.5 from Moonshot AI.

How many models are evaluated on Seal-0?

6 models have been evaluated on the Seal-0 benchmark, with 0 verified results and 6 self-reported results.

What categories does Seal-0 cover?

Seal-0 is categorized under reasoning and search. The benchmark evaluates text models.
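
The page lists only the top score, but the three figures quoted in the FAQ (6 models, a top score of 0.574, an average of 0.489) are enough to estimate the average of the other five models. The short Python sketch below does that arithmetic. The function name `mean_of_rest` is hypothetical, and because the published average is rounded to three decimals, the result is approximate.

```python
# Figures quoted on this page.
NUM_MODELS = 6
TOP_SCORE = 0.574
MEAN_SCORE = 0.489


def mean_of_rest(num_models: int, top_score: float, mean_score: float) -> float:
    """Average score of all models except the leader."""
    total = num_models * mean_score  # sum of all scores
    return (total - top_score) / (num_models - 1)


rest = mean_of_rest(NUM_MODELS, TOP_SCORE, MEAN_SCORE)
print(f"Mean of the other five models: {rest:.3f}")
print(f"Leader's margin over that mean: {TOP_SCORE - rest:.3f}")
```

Under these figures the other five models average about 0.472, so the leader sits roughly 0.10 above the rest of the field.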