Agents' Last Exam
Progress Over Time
Interactive timeline showing model performance evolution on Agents' Last Exam
Agents' Last Exam Leaderboard
| Rank | Model | Organization | Score | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Seed 2.1 Pro | ByteDance | 0.414 | — | — | — |
What is Agents' Last Exam?
Agents' Last Exam is a challenging benchmark for AI agents on hard, long-horizon tasks that test sustained reasoning, planning, and tool use, reported with and without tool access.
Agents' Last Exam is a text benchmark that evaluates models on reasoning, agentic, and tool-calling tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. With a single entry, the current average and the leading score are the same: 0.414, displayed rounded as 0.4.
Compare leaders on the best AI for reasoning, best AI for agents and best AI for tool calling leaderboards.
Current leaders
Seed 2.1 Pro from ByteDance currently leads the Agents' Last Exam leaderboard with a score of 0.414. It is the only model evaluated so far.
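The summary figures quoted on this page (model count, average, leader) can be reproduced from the listed scores with a few lines of Python. This is a minimal sketch, assuming the average is a plain arithmetic mean; the `scores` dictionary and the `summarize` helper are hypothetical names used for illustration, not part of any LLM Stats API.

```python
from statistics import mean

# Scores as listed on the leaderboard (0-1 scale); one entry at present.
scores = {"Seed 2.1 Pro": 0.414}

def summarize(scores):
    """Return (model count, mean score, leading model, leading score).

    Assumes the leaderboard average is a simple arithmetic mean.
    """
    leader = max(scores, key=scores.get)
    return len(scores), mean(list(scores.values())), leader, scores[leader]

count, average, leader, top = summarize(scores)
print(f"{count} model(s), average {average:.1f}, leader {leader} at {top:.3f}")
```

With a single tracked model the average necessarily equals the leader's score, which is why both appear as 0.4 once rounded to one decimal place.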