AI Leaderboard
The best AI models ranked by performance, price, and speed
296 models
| Rank | Model | | | | | | | Speed | Price | License |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview (UNRELEASED), Anthropic | 71.4 | 57.3 | 48.6 | 94.6% | — | — | — | $36.11 | Proprietary |
| 2 | GPT-5.5 (NEW), OpenAI | 63.1 | 53.1 | 43.6 | 93.6% | — | 1.1M | 81c/s | $7.78 | Proprietary |
| 3 | Anthropic | 62.8 | 51.6 | 42.2 | 94.2% | 1,793 | 1.0M | 56c/s | $7.22 | Proprietary |
| 4 | GPT-5.5 Pro (NEW), OpenAI | 62.0 | — | 43.8 | — | — | 1.1M | — | $46.67 | Proprietary |
| 5 | Anthropic | 60.0 | 45.6 | 38.1 | 91.3% | 2,001 | 1.0M | 126c/s | $7.22 | Proprietary |
| 6 | Kimi K2.6 (NEW), Moonshot AI | 59.5 | 45.6 | 38.9 | 90.5% | 1,216 | 262K | 44c/s | $1.29 | Open Source |
| 7 | Google | 59.2 | 44.1 | 33.7 | 94.3% | 2,093 | 1.0M | — | $3.89 | Proprietary |
| 8 | OpenAI | 58.0 | 44.3 | 37.7 | 92.8% | 1,717 | 1.0M | 244c/s | $3.89 | Proprietary |
| 9 | DeepSeek | 57.8 | 45.0 | 36.8 | 90.1% | 418 | 1.0M | 51c/s | $1.93 | Open Source |
| 10 | OpenAI | 56.6 | — | 29.8 | 93.2% | — | 400K | — | $37.33 | Proprietary |
| 11 | OpenAI | 56.2 | 43.9 | 36.1 | — | 1,245 | 400K | 206c/s | $3.11 | Proprietary |
| 12 | Zhipu AI | 55.4 | 45.1 | 35.1 | 86.2% | 1,493 | 200K | 110c/s | $1.73 | Open Source |
| 13 | Anthropic | 54.6 | 41.0 | 28.1 | 87.0% | 1,613 | 200K | 242c/s | $7.22 | Proprietary |
| 14 | ByteDance | 54.6 | 33.3 | 29.2 | 88.9% | — | — | — | — | Proprietary |
| 15 | OpenAI | 54.4 | 35.7 | 26.3 | 92.4% | 1,523 | 400K | 145c/s | $3.11 | Proprietary |
| 16 | OpenAI | 53.8 | — | — | 88.1% | 1,140 | — | — | — | Proprietary |
| 17 | MiniMax | 53.6 | 40.1 | 31.0 | — | 687 | 205K | 122c/s | $0.40 | Open Source |
| 18 | Grok-4 Heavy (UNRELEASED), xAI | 53.3 | 25.6 | — | 88.4% | — | — | — | — | Proprietary |
| 19 | MiniMax | 52.9 | 38.9 | 29.1 | — | 959 | 1.0M | 309c/s | $0.40 | Open Source |
| 20 | Meta | 52.9 | 32.9 | 25.0 | 89.5% | — | — | — | — | Proprietary |
Showing models 1-20 of 296
New Models
Announced in the last 15 days
Performance Index
Composite TrueSkill ratings across published benchmarks
TrueSkill conservative rating (μ − 3σ) across all benchmarks tagged in this category. Higher is better.
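To make the μ − 3σ formula concrete, here is a minimal sketch of how a conservative rating could be computed and used for ranking. The model names and the μ and σ values are hypothetical placeholders, not figures from this leaderboard, and the `Rating` and `conservative` helpers are illustrative names rather than part of any published methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    mu: float     # estimated skill (mean of the rating distribution)
    sigma: float  # uncertainty (standard deviation of the rating distribution)


def conservative(rating: Rating, k: float = 3.0) -> float:
    """Return mu - k*sigma, a lower bound that penalizes uncertain ratings."""
    return rating.mu - k * rating.sigma


# Hypothetical ratings, for illustration only.
ratings = {
    "model_a": Rating(mu=30.0, sigma=1.0),       # conservative: 27.0
    "model_b": Rating(mu=32.0, sigma=3.0),       # conservative: 23.0
    "model_c": Rating(mu=25.0, sigma=25.0 / 3),  # conservative: approximately 0
}

# Rank by conservative rating, highest first.
ranked = sorted(ratings, key=lambda name: conservative(ratings[name]), reverse=True)

for name in ranked:
    print(f"{name}: {conservative(ratings[name]):.1f}")
```

In this sketch `model_b` has the highest μ but ranks below `model_a`, because its larger σ lowers the conservative estimate. A model with few benchmark results would typically carry a large σ and so rank lower until more evidence accumulates.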