LLM Leaderboard — Compare 300+ Top AI Models by Intelligence, Speed & Price

Independent ranking of GPT, Claude, Gemini, Llama, DeepSeek and 300+ other AI models, scored by a composite LLM Stats Score and updated continuously from public benchmarks and live API metrics.

Current leaders (live)

[Leaderboard table: top 20 of the 296 ranked models, ordered by LLM Stats Score. Columns cover the composite score, benchmark sub-scores, context window, output speed, price and license. Providers in the current top 20 include Anthropic, OpenAI, Moonshot AI, ByteDance, and Alibaba Cloud / Qwen Team; most entries are proprietary, with open-source models at ranks 6 and 18.]

1-20 of 296

New Models

Announced in the last 15 days.


Performance Index

Composite TrueSkill ratings across published benchmarks.
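TrueSkill treats each model's skill as a Gaussian N(mu, sigma²) and updates both the mean and the uncertainty after every head-to-head benchmark comparison. A minimal sketch of the textbook two-player win/loss update (this illustrates the rating mechanics, not the site's exact implementation; the mu=25, sigma=25/3, beta=25/6 defaults are the conventional starting values):

```python
import math

def _phi(x):   # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def _Phi(x):   # standard normal cdf
    return (1 + math.erf(x / math.sqrt(2))) / 2

def trueskill_1v1(winner, loser, beta=25 / 6):
    """One TrueSkill update for a decisive 1v1 result.

    winner/loser are (mu, sigma) tuples; returns the updated tuples.
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2 * beta ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = _phi(t) / _Phi(t)          # mean-shift factor
    w = v * (v + t)                # variance-shrink factor, in (0, 1)
    new_w = (mu_w + s_w ** 2 / c * v,
             s_w * math.sqrt(1 - s_w ** 2 / c ** 2 * w))
    new_l = (mu_l - s_l ** 2 / c * v,
             s_l * math.sqrt(1 - s_l ** 2 / c ** 2 * w))
    return new_w, new_l
```

Starting two models at (25, 25/3), a single win moves the winner's mu up and the loser's down by the same amount, and shrinks both sigmas — which is why upsets between well-established models move the ranking more than expected results do.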

FAQ

Quick answers for choosing, comparing and interpreting today's leading AI models.

Which AI model ranks #1 on the LLM Leaderboard?

On the LLM Stats Leaderboard, Claude Mythos Preview currently leads on GPQA Diamond (94.6%), the most discriminating reasoning benchmark at the frontier. This AI leaderboard ranks models by the LLM Stats Score, which aggregates GPQA, SWE-Bench Verified, coding-arena performance and pricing into one comparable AI ranking. Rankings refresh continuously as new benchmark results land.

What is the best AI model right now?

"Best" depends on what you're optimizing for. For frontier reasoning, Claude Mythos Preview leads on GPQA. For coding agents, Gemini 3.1 Pro is the strongest in head-to-head coding-arena play. For low cost at frontier quality, Kimi K2.6 is the cheapest in the top 10 at $0.95 per 1M tokens. The leaders summary above the table names the current winner per axis.

What are the best LLMs in 2026?

The leading LLMs in 2026 are Claude Mythos Preview, Gemini 3.1 Pro, and the frontier models from OpenAI (GPT-5 family), Anthropic (Claude Opus and Sonnet), Google (Gemini 3 Pro), xAI (Grok 4), DeepSeek (V3 / R1) and Z.AI (GLM-5). Open-weights leaders include Llama, Qwen and DeepSeek. The full ranking is in the leaderboard table above.

What is the cheapest AI model in the top 10?

Kimi K2.6 is the cheapest model in the top 10 by GPQA Diamond, at $0.95 per 1M input tokens. The Cheapest filter on the leaderboard restricts to verified, currently-available frontier models — pricing is pulled from each provider's public price list and cross-checked against billing samples through the LLM Stats proxy.
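Since providers quote separate input and output rates, cost comparisons usually collapse the two into a single "blended" price per 1M tokens, weighted by a typical traffic mix. A sketch of that calculation (the 3:1 input-to-output ratio below is an illustrative assumption, not the leaderboard's published weighting):

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.75) -> float:
    """Blended $ per 1M tokens, weighting input vs output rates
    by an assumed traffic mix (default: 3 input tokens per output token)."""
    return input_share * input_per_m + (1 - input_share) * output_per_m

# e.g. $0.95/M input and a hypothetical $3.80/M output at a 3:1 mix
print(round(blended_price(0.95, 3.80), 4))  # → 1.6625
```

Shifting `input_share` toward 1.0 models retrieval-heavy workloads (long prompts, short answers), where the input rate dominates the bill.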

Which AI model has the largest context window?

Llama 4 Scout currently exposes the largest practical context window at 10.0M tokens. Larger context lets you keep more documents, conversation history and tool traces in a single request. For long-document workloads, also consult the per-model "effective context" notes on each model detail page — providers vary in how well they actually use the upper end of their advertised windows.

What is the fastest LLM by output speed?

Mercury 2 currently has the highest output throughput at 1693 tok/s. Output speed is measured by routing standardized prompts through each provider's API and averaging tokens-per-second over a 7-day rolling window. Fast inference matters most for streaming chat UIs and agentic loops; for batched async workloads, blended price per 1M tokens is usually the better axis.
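The throughput figure described above is an average of per-request token rates over a trailing window. A sketch of just the windowing and averaging step (the sample shape and field names are assumptions; the actual measurement pipeline issues live API calls):

```python
from datetime import datetime, timedelta

def rolling_tps(samples, now, window_days=7):
    """Mean tokens/sec over samples inside the trailing window.

    samples: list of (timestamp, tokens_generated, seconds_elapsed)
    tuples, one per standardized probe request.
    """
    cutoff = now - timedelta(days=window_days)
    rates = [tok / sec for ts, tok, sec in samples if ts >= cutoff]
    return sum(rates) / len(rates) if rates else None

now = datetime(2026, 2, 1)
samples = [
    (now - timedelta(days=1), 5000, 3.0),   # inside the 7-day window
    (now - timedelta(days=6), 4000, 2.5),   # inside the 7-day window
    (now - timedelta(days=10), 9000, 2.0),  # too old, ignored
]
print(round(rolling_tps(samples, now), 1))  # → 1633.3
```

Averaging per-request rates (rather than pooling total tokens over total seconds) keeps one unusually long generation from dominating the figure.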

Which is the best open-source AI model?

Kimi K2.6 currently leads among open-weights LLMs (90.5% on GPQA). The open-weights ecosystem is dominated by Llama, Qwen, DeepSeek, Mistral, GLM and Gemma. The dedicated Open LLM Leaderboard filters this catalog to models with publicly released weights so you can self-host or fine-tune.

How is the LLM Stats Score calculated?

The LLM Stats Score is a composite that blends verified benchmark results (GPQA Diamond, SWE-Bench Verified, coding-arena), live performance metrics (output throughput, time-to-first-token) and per-token pricing into one comparable number. Pricing and metadata revalidate hourly; live performance updates on a 7-day rolling average. For the full weighting and refresh cadence see the LLM Stats Score methodology. 296 canonical models are tracked across every major lab and inference provider.
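Conceptually, a composite like this is a weighted sum of normalized components, with cost-type metrics inverted so that cheaper scores higher. A sketch under assumed weights and min-max normalization (every weight and bound below is illustrative; the real values live in the methodology page):

```python
def llm_stats_score(model, bounds, weights):
    """Weighted composite of min-max-normalized metrics.

    bounds: metric -> (worst, best). Listing price as (high, low)
    flips its direction so that cheaper models score higher.
    """
    score = 0.0
    for metric, w in weights.items():
        lo, hi = bounds[metric]
        norm = (model[metric] - lo) / (hi - lo)   # 0..1 toward "best"
        score += w * norm
    return 100 * score

# Illustrative weights and bounds — NOT the published methodology
weights = {"gpqa": 0.4, "swebench": 0.3, "arena": 0.2, "price": 0.1}
bounds  = {"gpqa": (0, 100), "swebench": (0, 100),
           "arena": (1000, 2000),      # arena rating range
           "price": (40.0, 0.0)}       # $/1M tokens, cheaper is better
model   = {"gpqa": 90.5, "swebench": 70.0, "arena": 1600, "price": 1.29}
print(round(llm_stats_score(model, bounds, weights), 1))  # → 78.9
```

The inverted price bounds are the design choice worth noting: encoding direction in the bounds keeps the scoring loop uniform instead of special-casing "lower is better" metrics.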