Seal-0
Seal-0 is a benchmark for evaluating agentic search capabilities, testing models' ability to navigate and retrieve information using tools.
Progress Over Time

[Interactive timeline showing model performance evolution on Seal-0, with a state-of-the-art frontier and filters for open vs. proprietary models; not reproduced here.]
Seal-0 Leaderboard
6 models
| Rank | Model | Organization | Params | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | Kimi K2.5 | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 | — |
| 2 | — | Moonshot AI | 1.0T | — | — | — |
| 3 | — | Alibaba Cloud / Qwen Team | 27B | — | — | — |
| 4 | — | Alibaba Cloud / Qwen Team | 397B | 262K | $0.60 / $3.60 | — |
| 5 | — | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 | — |
| 6 | — | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 | — |
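If the Cost column reads as input / output API prices, the expense of a single agentic search run can be estimated from its token counts. The sketch below assumes prices are quoted per 1M tokens (a common convention; the page itself does not state the unit), and the token counts in the example are illustrative, not from the benchmark:

```python
def run_cost(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Estimate dollar cost of one run, assuming prices are per 1M tokens."""
    return (input_tokens / 1_000_000) * in_price \
         + (output_tokens / 1_000_000) * out_price

# Hypothetical example: the rank-1 pricing ($0.60 input / $2.50 output)
# applied to a 50K-token search trace that produces 2K output tokens.
print(round(run_cost(50_000, 2_000, 0.60, 2.50), 4))  # → 0.035
```

Agentic search traces tend to be input-heavy (each tool result is fed back into context), so the input price usually dominates this estimate.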
FAQ
Common questions about Seal-0
The Seal-0 leaderboard ranks 6 AI models based on their performance on this benchmark. Currently, Kimi K2.5 by Moonshot AI leads with a score of 0.574. The average score across all models is 0.489.
The highest Seal-0 score is 0.574, achieved by Kimi K2.5 from Moonshot AI.
6 models have been evaluated on the Seal-0 benchmark; all 6 results are self-reported, and none have been independently verified.
Seal-0 is categorized under reasoning and search. The benchmark evaluates text models.