
MCP Atlas

MCP Atlas is a benchmark for evaluating AI models on scaled tool use capabilities, measuring how well models can coordinate and utilize multiple tools across complex multi-step tasks.
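To make "coordinating multiple tools across multi-step tasks" concrete, here is a minimal illustrative sketch of the kind of behavior being measured: a model must choose the right tool at each step and thread intermediate results forward. This is not the benchmark harness; the tool names and the plan format are hypothetical.

```python
# Hypothetical sketch of multi-step tool coordination (not the MCP Atlas harness).
# A "plan" is a list of (tool_name, argument) steps; a None argument means
# "feed in the previous step's result".

def search(query: str) -> str:
    # Stand-in for a real retrieval tool.
    return f"doc about {query}"

def summarize(text: str) -> str:
    # Stand-in for a real summarization tool.
    return text.upper()

TOOLS = {"search": search, "summarize": summarize}

def run_plan(plan):
    """Execute each (tool_name, arg) step, chaining results between steps."""
    result = None
    for name, arg in plan:
        tool = TOOLS[name]
        result = tool(arg if arg is not None else result)
    return result

# Example: a two-step task that searches, then summarizes the search result.
# run_plan([("search", "MCP"), ("summarize", None)])
```

A benchmark like this stresses exactly the failure modes the toy version hides: picking among many similar tools, passing structured outputs between them, and recovering when an intermediate step fails.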

Progress Over Time

[Interactive timeline showing model performance evolution on MCP Atlas; series: state-of-the-art frontier, open, proprietary]

MCP Atlas Leaderboard

11 models

| # | Model | Organization | Params | Context | Cost (input / output) | License |
|---|-------|--------------|--------|---------|-----------------------|---------|
| 1 | Qwen3.6 Plus | Alibaba Cloud / Qwen Team | – | – | – | – |
| 2 | – | – | – | 1.0M | $2.50 / $15.00 | – |
| 3 | – | Zhipu AI | 744B | 200K | $1.00 / $3.20 | – |
| 4 | – | OpenAI | – | 1.0M | $2.50 / $15.00 | – |
| 5 | – | – | – | 200K | $5.00 / $25.00 | – |
| 6 | – | – | – | 200K | $3.00 / $15.00 | – |
| 7 | – | OpenAI | – | 400K | $1.75 / $14.00 | – |
| 8 | – | – | – | 1.0M | $5.00 / $25.00 | – |
| 9 | – | – | – | 400K | $0.75 / $4.50 | – |
| 10 | – | – | – | 1.0M | $0.50 / $3.00 | – |
| 11 | – | – | – | 400K | $0.20 / $1.25 | – |
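Leaderboard cost pairs like "$2.50 / $15.00" are assumed here to be USD per million input / output tokens, which is the common convention for such pages (the page itself does not state the unit). Under that assumption, the cost of a single run can be estimated as:

```python
# Hedged sketch: assumes the leaderboard's cost pairs are USD per 1M
# input / output tokens; the unit is not confirmed by the page itself.

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price: float, out_price: float) -> float:
    """Estimated USD cost of one run at per-1M-token prices."""
    return (input_tokens / 1_000_000) * in_price \
         + (output_tokens / 1_000_000) * out_price

# Example: a run with 200K input and 50K output tokens at $2.50 / $15.00
# costs 0.2 * 2.50 + 0.05 * 15.00 = $1.25.
```

The same arithmetic explains why output-token price dominates for long-generation agentic tasks: output rates on this leaderboard run 4-6x the input rates.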

FAQ

Common questions about MCP Atlas

What is MCP Atlas?
MCP Atlas is a benchmark for evaluating AI models on scaled tool use capabilities, measuring how well models can coordinate and utilize multiple tools across complex multi-step tasks.

How are models ranked on MCP Atlas?
The MCP Atlas leaderboard ranks 11 AI models based on their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.741. The average score across all models is 0.630.

What is the highest MCP Atlas score?
The highest MCP Atlas score is 0.741, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.

How many models have been evaluated?
11 models have been evaluated on the MCP Atlas benchmark, with 0 verified results and 11 self-reported results.

What categories does MCP Atlas cover?
MCP Atlas is categorized under agents, code, reasoning, and tool calling. The benchmark evaluates text models.