Seal-0

Seal-0 is a benchmark for evaluating agentic search capabilities, testing models' ability to navigate and retrieve information using tools.

Progress Over Time

[Interactive timeline: model performance on Seal-0 over time, showing the state-of-the-art frontier and distinguishing open from proprietary models.]

Seal-0 Leaderboard

6 models

| Rank | Organization | Model | Params | Context | Cost | License |
|------|--------------|-------|--------|---------|------|---------|
| 1 | Moonshot AI | Kimi K2.5 | 1.0T | 262K | $0.60 / $2.50 | — |
| 2 | — | — | 1.0T | — | — | — |
| 3 | Alibaba Cloud / Qwen Team | — | 27B | — | — | — |
| 4 | Alibaba Cloud / Qwen Team | — | 397B | 262K | $0.60 / $3.60 | — |
| 5 | Alibaba Cloud / Qwen Team | — | 122B | 262K | $0.40 / $3.20 | — |
| 6 | Alibaba Cloud / Qwen Team | — | 35B | 262K | $0.25 / $2.00 | — |

FAQ

Common questions about Seal-0

What is Seal-0?
Seal-0 is a benchmark for evaluating agentic search capabilities, testing models' ability to navigate and retrieve information using tools.

Which model leads the Seal-0 leaderboard?
The Seal-0 leaderboard ranks 6 AI models based on their performance on this benchmark. Currently, Kimi K2.5 by Moonshot AI leads with a score of 0.574. The average score across all models is 0.489.

What is the highest Seal-0 score?
The highest Seal-0 score is 0.574, achieved by Kimi K2.5 from Moonshot AI.

How many models have been evaluated on Seal-0?
6 models have been evaluated on the Seal-0 benchmark, with 0 verified results and 6 self-reported results.

What categories does Seal-0 fall under?
Seal-0 is categorized under reasoning and search. The benchmark evaluates text models.