Best by task

Best for Reasoning: GPT-5.6 Sol, 58.1 index (highest reasoning index)
Best for Coding: Claude Fable 5, 95.0% (highest SWE-bench)
Fastest LLM: DeepSeek-V3.2 (Non-thinking), 610 c/s (highest P95 output throughput)
Cheapest Frontier: Nemotron 3 Nano (30B A3B), $0.06 per 1M input tokens (lowest price at frontier quality)
Largest Context: Grok-4.1 Fast Non-Reasoning, 2M tokens (biggest context window)
Best Open-Weight: Kimi K3, 93.5% GPQA (top open-weight model)

LLM Leaderboard 2026

Name: LLM Leaderboard 2026 — AI Models with Live Pricing & Benchmarks
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Compare 300+ AI models by the LLM Stats Score — intelligence, speed and price, updated continuously from public benchmarks and live API metrics.

Claude Mythos Preview (unreleased)	56.1	🇺🇸	Closed	-	-	-	–	–	56.6	47.1	46.6	26.1	–	41.4	29.1	21.7	–	–	11.5	94.6%	—	93.9%	—	92.7%	—	86.9%	93.2%	—	—	—	64.7%	—	—	—	—	—	—	—	—	—	77.8%	-	–	-	-	Anthropic
GPT-5.6 Sol	58.0	🇺🇸	Closed	1.1M	$5.00	$30.00	22c/s	–	58.1	36.8	50.1	28.9	–	38.2	36.1	35.2	37.1	32.4	36.7	94.6%	—	—	—	—	—	90.4%	—	83.0%	—	—	—	—	—	58.0%	—	—	89.0%	91.5%	—	—	64.6%	-	7.0s	Feb. 2026	Jul. 2026	OpenAI
Gemini 3.1 Pro	43.6	🇺🇸	Closed	1.0M	$2.50	$15.00	41c/s	–	45.1	43.0	33.0	23.9	–	31.7	24.8	11.9	19.6	15.3	18.2	94.3%	—	80.6%	77.1%	92.6%	—	85.9%	—	80.5%	—	69.2%	51.4%	—	—	—	—	—	—	26.3%	59.0%	33.5%	54.2%	-	18.6s	Jan. 2025	Feb. 2026	Google
Claude Opus 4.7	45.6	🇺🇸	Closed	1M	$5.00	$25.00	2c/s	–	47.1	39.5	39.2	17.3	–	34.5	27.3	10.9	34.2	31.7	23.9	94.2%	—	87.6%	—	91.5%	—	79.3%	91.0%	—	—	77.3%	54.7%	—	—	—	—	—	—	—	—	—	64.3%	-	2.4s	-	Apr. 2026	Anthropic
Claude Opus 4.8	52.6	🇺🇸	Closed	1M	$5.00	$25.00	12c/s	–	52.1	39.8	43.9	26.6	–	40.1	33.6	22.4	32.3	30.7	19.5	93.6%	—	88.6%	—	—	—	84.3%	89.9%	—	87.9%	82.2%	57.9%	—	—	59.9%	—	—	—	—	—	—	69.2%	-	4.0s	-	May 2026	Anthropic
GPT-5.5	49.1	🇺🇸	Closed	1.1M	$5.00	$30.00	15c/s	–	48.3	38.5	41.1	21.9	21.0	36.9	31.5	22.7	28.9	21.1	–	93.6%	—	—	85.0%	—	—	84.4%	—	83.2%	—	75.3%	52.2%	—	—	55.6%	—	—	35.4%	74.0%	—	—	58.6%	-	10.6s	Dec. 2025	Apr. 2026	OpenAI
Kimi K3	55.7	🇨🇳	Open	1.0M	$3.00	$15.00	3c/s	–	54.9	42.0	44.9	34.4	–	39.2	39.2	–	31.5	31.5	–	93.5%	—	—	—	—	—	91.2%	91.3%	81.6%	—	84.2%	56.0%	—	—	73.2%	—	—	—	—	—	37.6%	—	2800	8.3s	-	Jul. 2026	MoonshotAI
GPT-5.2 Pro	43.2	🇺🇸	Closed	-	-	-	–	–	41.6	39.1	–	16.8	–	24.4	–	–	–	–	–	93.2%	100.0%	—	54.2%	—	—	77.9%	—	—	—	—	36.6%	—	—	—	—	—	—	—	—	—	—	-	–	-	Dec. 2025	OpenAI
Grok 4.5	49.5	🇺🇸	Closed	500k	$2.00	$6.00	24c/s	–	48.2	–	40.5	–	–	–	23.0	–	27.3	27.3	–	93.0%	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	64.7%	-	5.4s	Feb. 2026	Jul. 2026	xAI
GPT-5.6 Terra	53.3	🇺🇸	Closed	1.1M	$2.50	$15.00	228c/s	–	52.1	33.4	46.0	27.0	–	30.0	34.0	28.9	31.7	28.5	32.4	92.9%	—	—	—	—	—	87.5%	—	80.7%	—	—	—	—	—	53.1%	—	—	84.9%	89.6%	—	—	63.4%	-	2.4s	Feb. 2026	Jul. 2026	OpenAI
GPT-5.4	43.7	🇺🇸	Closed	1M	$2.50	$15.00	81c/s	–	44.0	36.9	34.2	17.8	24.4	30.1	29.0	18.3	27.6	24.2	25.4	92.8%	—	—	73.3%	—	—	82.7%	—	81.2%	—	67.2%	39.8%	—	—	54.6%	—	—	47.6%	—	—	—	57.7%	-	1.7s	-	Mar. 2026	OpenAI
GPT-5.2	42.1	🇺🇸	Closed	400k	$1.75	$14.00	187c/s	–	41.5	38.3	24.6	15.3	25.3	28.4	21.7	–	–	–	31.5	92.4%	100.0%	80.0%	52.9%	89.6%	—	65.8%	82.1%	79.5%	86.3%	60.6%	34.5%	—	—	46.3%	—	—	40.3%	—	—	—	—	-	71.4s	Aug. 2025	Dec. 2025	OpenAI
Qwen3.7 Max	46.7	🇨🇳	Closed	1M	$1.25	$3.75	5c/s	–	47.3	44.0	38.9	–	–	26.0	28.7	–	37.5	38.4	45.1	92.4%	—	80.4%	—	90.3%	—	—	—	—	—	76.4%	41.4%	—	—	—	—	—	—	—	53.5%	—	60.6%	-	7.9s	-	May 2026	Qwen
GPT-5.6 Luna	46.8	🇺🇸	Closed	1.1M	$1.00	$6.00	364c/s	–	46.6	29.4	39.3	18.4	–	23.5	28.8	23.5	26.7	27.9	28.7	92.3%	—	—	—	—	—	83.3%	—	78.4%	—	—	—	—	—	53.4%	—	—	78.6%	41.3%	—	—	62.7%	-	4.2s	Feb. 2026	Jul. 2026	OpenAI
Gemini 3 Pro	39.1	🇺🇸	Closed	-	-	-	–	–	39.1	37.5	24.5	–	–	27.1	13.8	11.9	–	–	34.8	91.9%	100.0%	76.2%	31.1%	91.8%	—	—	81.4%	81.0%	72.7%	—	45.8%	72.1%	—	—	—	—	—	26.3%	—	—	—	-	–	Jan. 2025	Nov. 2025	Google
Claude Opus 4.6	46.4	🇺🇸	Closed	1M	$5.00	$25.00	55c/s	–	46.7	40.4	35.1	24.6	31.9	27.9	26.6	26.6	32.1	29.6	6.3	91.3%	99.8%	80.8%	68.8%	91.1%	—	84.0%	77.4%	77.3%	—	62.7%	53.1%	—	72.7%	—	—	—	—	76.0%	—	—	—	-	4.7s	-	Feb. 2026	Anthropic
GLM-5.2	47.1	🇨🇳	Open	1.0M	$0.95	$3.00	9c/s	–	46.3	41.9	39.0	–	–	28.4	26.1	–	–	–	–	91.2%	—	—	—	—	—	—	—	—	—	76.8%	54.7%	—	—	48.2%	—	—	—	—	—	—	62.1%	753	6.0s	-	Jun. 2026	ZAI
Kimi K2.6	44.7	🇨🇳	Open	262.1k	$0.75	$3.50	5c/s	–	44.9	37.9	35.4	26.3	–	30.4	23.5	–	24.5	19.9	–	90.5%	—	80.2%	—	—	—	86.3%	86.7%	80.1%	—	—	36.4%	—	—	50.0%	—	—	—	—	52.2%	27.9%	58.6%	1000	17.1s	-	Apr. 2026	MoonshotAI
Gemini 3 Flash	37.7	🇺🇸	Closed	1M	$0.50	$3.00	256c/s	–	38.5	37.3	22.8	–	–	26.2	18.4	9.2	19.0	12.0	31.3	90.4%	99.7%	78.0%	33.6%	91.8%	—	—	80.3%	81.2%	69.1%	57.4%	43.5%	68.7%	—	49.4%	—	—	—	22.1%	—	—	—	-	7.6s	Jan. 2025	Dec. 2025	Google
Hy3	43.8	🇨🇳	Open	-	-	-	–	–	43.8	35.7	35.7	24.8	–	–	25.6	26.8	–	–	–	90.4%	—	78.0%	—	—	—	84.2%	—	—	—	79.1%	—	—	—	48.5%	—	—	—	—	—	25.6%	57.9%	295	–	-	Jul. 2026	Tencent
Qwen3.6 Plus	40.4	🇨🇳	Closed	1M	$0.50	$3.00	126c/s	–	41.3	40.1	29.3	15.4	–	29.7	22.2	29.3	33.0	34.7	41.6	90.4%	—	78.8%	—	89.5%	86.0%	—	81.5%	78.8%	68.2%	74.1%	28.8%	—	—	39.8%	—	—	—	—	—	—	56.6%	-	2.0s	-	Mar. 2026	Qwen
Qwen3.7-Plus	43.8	🇨🇳	Closed	1M	$0.32	$1.28	130c/s	–	43.8	40.6	32.7	–	–	34.4	25.2	28.8	31.4	33.0	41.7	90.3%	—	77.7%	—	89.0%	—	—	85.9%	79.0%	79.0%	73.2%	34.7%	—	—	—	—	—	—	—	51.3%	—	57.6%	-	3.3s	-	May 2026	Qwen
DeepSeek-V4-Pro-Max	44.0	🇨🇳	Open	1.0M	$1.60	$3.20	5c/s	–	44.2	41.2	33.8	19.0	–	23.0	25.4	9.7	31.1	31.1	31.3	90.1%	—	80.6%	—	—	—	83.4%	—	—	—	73.6%	48.2%	57.9%	—	51.8%	—	—	—	—	—	—	55.4%	1600	4.4s	-	Apr. 2026	DeepSeek
Claude Sonnet 4.6	38.4	🇺🇸	Closed	200k	$3.00	$15.00	4c/s	–	39.3	34.8	27.6	14.1	27.1	25.8	22.9	13.2	32.0	27.9	15.9	89.9%	—	79.6%	58.3%	89.3%	—	74.7%	—	75.6%	—	61.3%	49.0%	—	72.5%	—	—	—	—	—	—	—	—	-	2.7s	-	Feb. 2026	Anthropic
Muse Spark	42.5	🇺🇸	Closed	-	-	-	–	–	41.4	40.0	23.8	0.8	14.4	33.3	16.8	–	17.9	17.9	36.5	89.5%	—	77.4%	42.5%	—	—	—	86.4%	80.4%	84.1%	—	58.4%	—	—	—	—	—	—	—	—	—	52.4%	-	–	-	Apr. 2026	Meta
Seed 2.0 Pro	40.3	🇨🇳	Closed	256k	$0.50	$3.00	1c/s	–	39.8	32.9	21.0	16.3	–	–	–	–	–	–	–	88.9%	98.3%	76.5%	—	—	—	77.3%	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	-	3.6s	Jan. 2024	Feb. 2026	Bytedance
Grok-4 Heavy (unreleased)	41.4	🇺🇸	Closed	-	-	-	–	–	39.7	41.7	18.4	–	–	24.9	–	–	–	–	–	88.4%	100.0%	—	—	—	—	—	—	—	—	—	50.7%	—	—	—	—	—	—	—	—	—	—	-	–	Dec. 2024	-	xAI
Qwen3.5-397B-A17B	39.6	🇨🇳	Open	-	-	-	–	–	39.3	37.6	23.1	18.9	17.3	19.6	18.2	28.2	31.8	31.8	38.7	88.4%	—	76.4%	—	88.5%	—	69.0%	—	—	—	—	28.7%	—	—	38.3%	—	—	—	—	—	—	—	397	–	-	Feb. 2026	Qwen
DeepSeek-V4-Flash-Max	39.7	🇨🇳	Open	1.0M	$0.10	$0.20	3c/s	–	40.5	40.4	29.4	13.6	–	21.2	19.5	3.9	28.3	28.3	28.1	88.1%	—	79.0%	—	—	—	73.2%	—	—	—	69.0%	45.1%	34.1%	—	47.8%	—	—	—	—	—	—	52.6%	284	4.6s	-	Apr. 2026	DeepSeek
GPT-5.1	37.1	🇺🇸	Closed	400k	$1.25	$10.00	136c/s	–	36.8	30.0	20.2	8.1	21.6	21.7	19.0	–	–	–	31.0	88.1%	94.0%	76.3%	—	—	85.4%	—	—	—	—	—	—	—	—	—	—	—	26.7%	—	—	—	—	-	3.5s	Sep. 2024	Nov. 2025	OpenAI

Showing 1–30 of 335 models

LLM Leaderboard highlights

The LLM Leaderboard ranks 300+ AI models by intelligence, output speed, latency and per-token pricing, aggregated into the LLM Stats Score. It is updated continuously from provider APIs and verified benchmarks. See the LLM Stats Score methodology for how rankings are computed.

Best on GPQA Diamond: Claude Mythos Preview (94.6%)
Best on AIME 2025: GPT-5.2 Pro (100.0%)
Best on SWE-Bench Verified: Claude Fable 5 (95.0%)
Highest throughput: DeepSeek-V3.2 (Non-thinking) (610 tok/s)
Lowest latency: Mistral Small 4 (422 ms)
Cheapest input: Nemotron 3 Nano (30B A3B) ($0.06 / 1M tok)
Largest context window: Grok-4.1 Fast Non-Reasoning (2.0M tokens)

FAQ

Common questions about the LLM Leaderboard

What is the best LLM right now?

The leaderboard treats coding-arena performance as the most discriminating signal at the frontier, so the model with the highest coding-arena score is the current overall leader. For knowledge-heavy reasoning (GPQA Diamond), Claude Mythos Preview scores highest. Choose by axis rather than by a single ranking; see the highlights above for per-metric leaders.

How does the LLM Leaderboard rank models?

Models are sorted by coding-arena score (when available), then by GPQA Diamond. Each row aggregates verified benchmark results, provider-reported pricing, and live performance metrics (output throughput and time-to-first-token) sampled across the major API providers. See the LLM Stats Score methodology for the full weighting and refresh cadence.
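The two-level ordering described above can be sketched as a sort key. This is only an illustration, not the site's actual implementation: the `Row` type and `rank` function are hypothetical names, and placing models without a coding-arena score after those that have one is an assumption. The GPQA Diamond values are taken from the table; no coding-arena scores are used because the page does not clearly identify them.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    name: str
    arena: Optional[float]  # coding-arena score; None when not available
    gpqa: Optional[float]   # GPQA Diamond in percent; None when not reported


def rank(rows: list[Row]) -> list[Row]:
    """Order by coding-arena score (descending), then GPQA Diamond (descending).

    Assumption: rows missing a score sort after rows that have one.
    """
    def key(r: Row):
        return (
            r.arena is None,                          # False sorts before True
            -(r.arena if r.arena is not None else 0.0),
            r.gpqa is None,
            -(r.gpqa if r.gpqa is not None else 0.0),
        )
    return sorted(rows, key=key)


rows = [
    Row("Gemini 3.1 Pro", None, 94.3),
    Row("Kimi K3", None, 93.5),
    Row("Claude Mythos Preview", None, 94.6),
]
ranked_names = [r.name for r in rank(rows)]
print(ranked_names)
```

With no coding-arena scores present, the order falls back entirely to GPQA Diamond, which matches the order in which these three models appear in the table.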

How many models are tracked?

This leaderboard tracks 335 canonical models across every major lab and provider. New releases typically appear within hours.

Where does pricing data come from?

Per-model input/output pricing is pulled from each provider's public API price list and verified against billing samples from the LLM Stats proxy. When a model is hosted by multiple providers, the cheapest available rate is shown by default.
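As a rough sketch of the "cheapest available rate" rule and of how per-1M-token prices translate into the cost of one request: the provider names and prices below are hypothetical, and comparing rates by input price first (with output price as the tie-breaker) is an assumption, since the page does not say how "cheapest" is decided when input and output prices disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    provider: str         # hypothetical provider label
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens


def cheapest_rate(rates: list[Rate]) -> Rate:
    """Pick the lowest rate; assumes input price decides, output price breaks ties."""
    return min(rates, key=lambda r: (r.input_per_1m, r.output_per_1m))


def request_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the given per-1M-token prices."""
    total = input_tokens * rate.input_per_1m + output_tokens * rate.output_per_1m
    return total / 1_000_000


rates = [
    Rate("provider-a", 5.00, 30.00),
    Rate("provider-b", 5.50, 28.00),
]
best = cheapest_rate(rates)
cost = request_cost(best, input_tokens=10_000, output_tokens=2_000)
print(best.provider, round(cost, 4))
```

For output-heavy workloads a blended comparison (weighting input and output prices by expected token counts) may pick a different provider than the input-first rule used here.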

How is performance measured?

Output throughput (tokens/second) and time-to-first-token are measured by routing standardized prompts through each provider's API and averaging over a 7-day rolling window. Numbers update hourly. Per-model splits live on each model detail page.
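The 7-day rolling window can be illustrated with a small helper. This is a sketch under stated assumptions, not the LLM Stats pipeline: `rolling_mean` is a hypothetical function, the samples are made-up throughput readings in tokens/second, and a plain unweighted mean is assumed.

```python
from datetime import datetime, timedelta
from typing import Optional


def rolling_mean(
    samples: list[tuple[datetime, float]],
    now: datetime,
    window: timedelta = timedelta(days=7),
) -> Optional[float]:
    """Mean of sample values timestamped within `window` before `now`.

    Returns None when no sample falls inside the window.
    """
    recent = [value for ts, value in samples if now - window < ts <= now]
    return sum(recent) / len(recent) if recent else None


now = datetime(2026, 7, 15, 12, 0)
samples = [
    (now - timedelta(days=8), 100.0),   # older than 7 days, ignored
    (now - timedelta(days=3), 200.0),
    (now - timedelta(hours=1), 240.0),
]
average = rolling_mean(samples, now)
print(average)
```

Recomputing this each hour with the current time as `now` would give a figure that updates hourly while still reflecting the last seven days of samples.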

How often does the data update?

Pricing and model metadata revalidate every hour. Live performance metrics are reported as a 7-day rolling average. Benchmark scores update when a new verified result is published or a new evaluation lands on LLM Stats.