MCP Atlas
MCP Atlas is a benchmark for evaluating AI models on scaled tool use, measuring how well models can coordinate and use multiple tools across complex, multi-step tasks.
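To make "multi-step tool use" concrete, here is a minimal, purely illustrative sketch of the kind of chained tool calls such tasks exercise. This is not the benchmark's actual harness; the tools, the task, and the plan format are all hypothetical stand-ins.

```python
# Illustrative sketch (not MCP Atlas's real harness): a model must chain
# several tools, feeding each tool's output into the next step.

def search(query: str) -> str:
    # Hypothetical stand-in for a real search tool.
    return f"results for {query!r}"

def summarize(text: str) -> str:
    # Hypothetical stand-in for a real summarization tool.
    return text[:20] + "..."

TOOLS = {"search": search, "summarize": summarize}

def run_steps(steps):
    """Execute a chain of (tool_name, arg) calls.

    When arg is None, the previous step's result is passed instead,
    which is what makes the task multi-step rather than independent calls.
    """
    result = None
    for tool_name, arg in steps:
        result = TOOLS[tool_name](arg if arg is not None else result)
    return result

# A two-step task: search, then summarize the search output.
plan = [("search", "MCP Atlas"), ("summarize", None)]
print(run_steps(plan))
```

A benchmark task differs mainly in scale: many more tools, longer chains, and the model itself must decide which tool to call next.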
Progress Over Time

[Interactive timeline: model performance on MCP Atlas over time, with a state-of-the-art frontier line; models grouped as open or proprietary.]
MCP Atlas Leaderboard
11 models
| Rank | Model | Organization | Params | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | Qwen3.6 Plus | Alibaba Cloud / Qwen Team | — | — | — | — |
| 2 | — | Google | — | 1.0M | $2.50 / $15.00 | — |
| 3 | — | Zhipu AI | 744B | 200K | $1.00 / $3.20 | — |
| 4 | — | OpenAI | — | 1.0M | $2.50 / $15.00 | — |
| 5 | — | Anthropic | — | 200K | $5.00 / $25.00 | — |
| 6 | — | Anthropic | — | 200K | $3.00 / $15.00 | — |
| 7 | — | OpenAI | — | 400K | $1.75 / $14.00 | — |
| 8 | — | Anthropic | — | 1.0M | $5.00 / $25.00 | — |
| 9 | — | OpenAI | — | 400K | $0.75 / $4.50 | — |
| 10 | — | Google | — | 1.0M | $0.50 / $3.00 | — |
| 11 | — | OpenAI | — | 400K | $0.20 / $1.25 | — |
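The Cost column lists paired input/output prices (e.g. $2.50 / $15.00), which by the usual API-pricing convention are USD per 1M tokens; assuming that convention, the per-request cost is a simple weighted sum:

```python
# Sketch: estimating one request's cost from the leaderboard's Cost column.
# Assumes the listed prices are USD per 1M input / output tokens, the
# standard convention for model API pricing.

def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Cost in USD for a single request, given per-1M-token prices."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# Example: a 100K-token input and a 2K-token response at $2.50 / $15.00
cost = request_cost(100_000, 2_000, 2.50, 15.00)
# 0.25 (input) + 0.03 (output) = 0.28 USD
```

Note how output pricing dominates only for generation-heavy workloads; for long-context tool-use tasks like these, the input side usually drives the bill.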
FAQ
Common questions about MCP Atlas
**What is MCP Atlas?**
MCP Atlas is a benchmark for evaluating AI models on scaled tool use, measuring how well models can coordinate and use multiple tools across complex, multi-step tasks.

**Which model leads the MCP Atlas leaderboard?**
The MCP Atlas leaderboard ranks 11 AI models by their performance on this benchmark. Qwen3.6 Plus by Alibaba Cloud / Qwen Team currently leads with a score of 0.741; the average score across all models is 0.630.

**What is the highest MCP Atlas score?**
The highest MCP Atlas score is 0.741, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.

**How many models have been evaluated?**
11 models have been evaluated on the MCP Atlas benchmark, with 0 verified results and 11 self-reported results.

**What categories does MCP Atlas cover?**
MCP Atlas is categorized under agents, code, reasoning, and tool calling. The benchmark evaluates text models.