Open LLM Leaderboard

Name: Open LLM Leaderboard - LLM Stats
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Ranking the best open LLMs by performance, price, and speed


GLM-5.1	40.9	🇨🇳	Open	200k	$1.40	$4.40	3c/s	1,755	40.6	34.5	33.7	17.7	–	27.1	21.5	–	25.1	21.5	–	86.2%	—	—	—	—	—	79.3%	—	—	—	71.8%	52.3%	—	—	40.7%	—	—	—	—	—	—	58.4%	754	2.3s	-	Apr. 2026	ZAI
GLM-5	37.8	🇨🇳	Open	200k	$1.00	$3.20	–	1,586	38.0	–	27.1	15.5	–	–	19.4	–	–	–	–	—	—	77.8%	—	—	—	75.9%	—	—	—	67.8%	—	—	—	—	—	—	—	—	—	—	—	744	–	-	Feb. 2026	ZAI
Kimi K2.6	44.9	🇨🇳	Open	262.1k	$0.75	$3.50	53c/s	1,551	45.0	38.0	35.6	26.7	–	30.9	23.5	–	25.0	20.4	–	90.5%	—	80.2%	—	—	—	86.3%	86.7%	80.1%	—	—	36.4%	—	—	50.0%	—	—	—	—	52.2%	27.9%	58.6%	1000	1.5s	-	Apr. 2026	MoonshotAI
GLM-5.2	47.6	🇨🇳	Open	1.0M	$0.95	$3.00	13c/s	1,501	46.7	42.0	39.9	–	–	28.7	26.0	–	–	–	–	91.2%	—	—	—	—	—	—	—	—	—	76.8%	54.7%	—	—	48.2%	—	—	—	—	—	—	62.1%	753	6.0s	-	Jun. 2026	ZAI
Kimi K2.5	39.8	🇨🇳	Open	-	-	-	–	1,462	39.7	37.2	25.0	22.4	–	29.1	7.5	27.5	30.6	30.6	35.4	87.6%	96.1%	76.8%	—	—	—	74.9%	77.5%	78.5%	—	—	50.2%	—	—	—	—	—	—	—	48.7%	—	50.7%	1000	–	-	Jan. 2026	MoonshotAI
DeepSeek-V4-Pro-Max	44.2	🇨🇳	Open	1.0M	$1.60	$3.20	19c/s	1,342	44.4	41.3	33.9	19.3	–	23.3	25.4	9.7	31.5	31.5	31.3	90.1%	—	80.6%	—	—	—	83.4%	—	—	—	73.6%	48.2%	57.9%	—	51.8%	—	—	—	—	—	—	55.4%	1600	5.3s	-	Apr. 2026	DeepSeek
Gemma 4 31B	33.5	🇺🇸	Open	262.1k	$0.13	$0.38	5c/s	1,301	33.9	30.7	–	–	–	20.4	13.7	18.3	19.8	19.8	27.3	84.3%	—	—	—	88.4%	—	—	—	76.9%	—	—	26.5%	—	—	—	—	—	—	66.4%	—	—	—	30.7	20.8s	Jan. 2025	Apr. 2026	Google
Qwen3.5-397B-A17B	39.6	🇨🇳	Open	-	-	-	–	1,296	39.4	37.6	23.1	19.2	17.3	19.8	18.2	28.3	32.1	32.1	38.7	88.4%	—	76.4%	—	88.5%	—	69.0%	—	—	—	—	28.7%	—	—	38.3%	—	—	—	—	—	—	—	397	–	-	Feb. 2026	Qwen
Gemma 4 26B-A4B	30.3	🇺🇸	Open	262.1k	$0.13	$0.40	33c/s	1,248	30.2	26.1	–	–	–	15.1	12.6	18.3	22.5	22.5	24.5	82.3%	—	—	—	86.3%	—	—	—	73.8%	—	—	17.2%	—	—	—	—	—	—	44.1%	—	—	—	25.2	2.5s	Jan. 2025	Apr. 2026	Google
MiniMax M2.7	36.5	🇨🇳	Open	204.8k	$0.30	$1.20	23c/s	1,234	36.5	–	30.0	–	–	–	16.8	–	13.9	18.8	–	—	—	—	—	—	—	—	—	—	—	—	—	—	—	46.3%	—	—	—	—	—	—	56.2%	-	2.2s	-	Mar. 2026	MiniMax
GLM-4.6	30.2	🇨🇳	Open	-	-	-	–	1,144	30.1	25.7	15.5	4.8	–	6.2	–	–	–	–	–	81.0%	93.9%	68.0%	—	—	—	45.1%	—	—	—	—	17.2%	—	—	—	40.5%	—	—	—	—	—	—	357	–	-	Sep. 2025	ZAI
DeepSeek-V4-Flash-Max	39.8	🇨🇳	Open	1.0M	$0.10	$0.20	4c/s	1,141	40.6	40.5	29.5	13.8	–	21.5	19.5	3.9	28.6	28.6	28.1	88.1%	—	79.0%	—	—	—	73.2%	—	—	—	69.0%	45.1%	34.1%	—	47.8%	—	—	—	—	—	—	52.6%	284	15.7s	-	Apr. 2026	DeepSeek
Qwen3.6-27B	36.7	🇨🇳	Open	262.1k	$0.60	$3.60	221c/s	1,120	37.2	35.4	27.1	–	–	26.0	15.3	18.8	29.4	29.4	36.1	87.8%	—	77.2%	—	—	82.9%	—	78.4%	75.8%	—	—	24.0%	—	—	—	—	—	—	—	—	—	53.5%	27.8	968ms	-	Apr. 2026	Qwen
GLM-4.7	35.3	🇨🇳	Open	-	-	-	–	1,064	35.2	35.1	17.9	11.9	–	20.6	8.0	–	25.3	25.3	25.3	85.7%	95.7%	73.8%	—	—	—	52.0%	—	—	—	—	42.8%	—	—	—	33.3%	—	—	—	—	—	—	358	–	-	Dec. 2025	ZAI
MiniMax M2.5	37.6	🇨🇳	Open	1M	$0.30	$1.20	68c/s	1,039	37.8	–	28.1	16.0	–	–	–	–	-0.6	–	–	—	—	80.2%	—	—	—	76.3%	—	—	—	—	—	—	—	—	—	—	—	—	—	—	55.4%	230	1.7s	-	Feb. 2026	MiniMax
LongCat-Flash-Chat	20.8	🇨🇳	Open	-	-	-	–	1,039	20.8	19.8	13.0	–	12.2	–	9.9	–	26.8	26.8	26.8	73.2%	61.3%	60.4%	—	—	—	—	—	—	—	—	—	—	—	—	39.5%	—	—	—	—	—	—	560	–	-	Aug. 2025	Meituan
MiniMax M3	43.4	🇨🇳	Open	1M	$0.30	$1.20	7c/s	1,014	43.4	25.8	36.1	19.9	–	26.5	23.6	–	29.2	26.3	28.9	—	—	80.5%	—	—	—	83.5%	—	78.1%	—	74.2%	—	—	—	—	—	—	—	—	—	27.7%	59.0%	-	6.5s	-	Jun. 2026	MiniMax
DeepSeek-V3.2 (Non-thinking)	–	🇨🇳	Open	131.1k	$0.28	$0.42	44c/s	1,011	–	–	–	–	–	–	–	–	–	–	–	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	685	1.7s	-	Dec. 2025	DeepSeek
MiniMax M2	27.5	🇨🇳	Open	1M	$0.30	$1.20	38c/s	1,005	27.8	20.5	19.3	3.2	13.6	2.3	11.1	–	21.7	21.7	21.7	78.0%	78.0%	69.4%	—	—	—	44.0%	—	—	—	—	12.5%	—	—	—	46.3%	—	—	—	36.0%	—	—	230	1.7s	-	Oct. 2025	MiniMax
Kimi K2.7 Code	38.9	🇨🇳	Open	262.1k	$0.74	$3.50	15c/s	971	32.6	19.2	31.1	–	–	–	27.1	–	–	–	–	—	—	—	—	—	—	—	—	—	—	76.0%	—	—	—	—	—	—	—	—	—	—	—	1000	3.3s	-	Jun. 2026	MoonshotAI
Kimi K2-Thinking-0905	36.9	🇨🇳	Open	-	-	-	–	941	37.1	38.6	21.7	15.1	-5.3	25.6	–	–	19.1	19.0	31.9	84.5%	100.0%	71.3%	—	—	—	60.2%	—	—	—	—	51.0%	—	—	—	47.1%	—	—	—	44.8%	—	—	1000	–	-	Sep. 2025	MoonshotAI
Mistral Large 3 (675B Instruct 2512)	9.8	🇫🇷	Open	262.1k	$0.50	$1.50	8c/s	886	10.0	18.7	2.0	–	–	–	–	–	–	–	–	43.9%	—	—	—	85.5%	—	—	—	—	—	—	—	23.8%	—	—	—	—	—	—	—	—	—	675	8.2s	-	Dec. 2025	Mistral
MiniMax M2.1	33.0	🇨🇳	Open	1M	$0.30	$1.20	12c/s	858	33.2	28.1	22.8	11.8	13.7	10.0	16.2	14.1	32.8	32.8	32.8	81.0%	81.0%	67.0%	—	—	—	62.0%	—	—	—	—	22.0%	—	—	43.5%	47.9%	—	—	—	39.0%	—	—	230	2.8s	-	Dec. 2025	MiniMax
DeepSeek-V3.2-Speciale	35.2	🇨🇳	Open	-	-	-	–	832	33.7	36.3	18.5	–	–	15.0	10.7	–	–	–	–	—	96.0%	73.1%	—	—	—	—	—	—	—	—	30.6%	—	—	35.2%	—	—	—	—	—	—	—	685	–	-	Dec. 2025	DeepSeek
GPT OSS 120B High	26.7	🇺🇸	Open	-	-	-	40c/s	811	26.4	24.5	–	–	–	–	2.6	–	18.9	18.9	18.9	80.9%	92.5%	—	—	83.8%	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	116.8	11.7s	-	Aug. 2025	OpenAI
Qwen3.5-27B	34.8	🇨🇳	Open	262.1k	$0.30	$2.40	126c/s	798	35.4	35.8	17.2	15.4	13.6	25.5	12.1	25.2	31.9	31.9	35.3	85.5%	—	72.4%	—	85.9%	82.3%	61.0%	79.5%	75.0%	70.3%	—	48.5%	—	—	—	—	—	—	—	—	—	—	27	3.6s	-	Feb. 2026	Qwen
MiMo-V2-Flash	31.6	🇨🇳	Open	-	-	-	–	793	31.7	30.3	17.8	10.2	–	10.4	5.8	15.9	26.4	26.4	26.4	83.7%	94.1%	73.4%	—	—	—	58.3%	—	—	—	—	22.1%	—	—	—	30.5%	—	—	—	—	—	—	309	–	-	Dec. 2025	Xiaomi
Qwen3.5-122B-A10B	35.9	🇨🇳	Open	-	-	-	–	793	36.3	37.0	20.7	16.4	14.5	26.9	12.2	25.9	30.5	30.5	38.0	86.6%	—	72.0%	—	86.7%	83.9%	63.8%	77.2%	76.9%	70.4%	—	47.5%	—	—	—	—	—	—	—	—	—	—	122	–	-	Feb. 2026	Qwen
LongCat-Flash-Thinking	29.7	🇨🇳	Open	-	-	-	–	791	29.8	28.8	14.8	–	18.1	8.1	17.1	–	27.6	22.7	22.7	81.5%	90.6%	59.4%	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	560	–	-	Sep. 2025	Meituan
Step-3.5-Flash	37.7	🇨🇳	Open	65.5k	$0.10	$0.40	60c/s	764	37.7	33.7	20.7	13.4	–	–	12.9	–	–	–	–	—	97.3%	74.4%	—	—	—	69.0%	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	196	1.1s	-	Feb. 2026	StepFun

Showing 1–30 of 192 models

Open LLM Leaderboard highlights

Independent ranking of open-weight large language models — Llama, Qwen, GLM, DeepSeek, Mistral, Kimi and more — by coding-arena score, GPQA Diamond, throughput, latency, and per-token pricing. Updated continuously from provider APIs and verified benchmarks. See the LLM Stats Score methodology for how rankings are computed.

Best for coding (Arena): GLM-5.1 (17.6 arena score)
Best on GPQA Diamond: GLM-5.2 (91.2%)
Best on AIME 2025: Kimi K2-Thinking-0905 (100.0%)
Best on SWE-Bench Verified: DeepSeek-V4-Pro-Max (80.6%)
Highest throughput: Qwen3.6-27B (221 tok/s)
Lowest latency: GPT OSS 120B (529.80 ms)
Cheapest input: Nemotron 3 Nano (30B A3B) ($0.06 / 1M tok)
Largest context window: GLM-5.2 (1.0M tokens)

FAQ

Common questions about the Open LLM Leaderboard

What is the best open-source LLM right now?

Based on coding-arena performance, the most discriminating signal at the frontier, GLM-5.1 currently leads. For knowledge-heavy reasoning (GPQA Diamond), GLM-5.2 scores highest. Choose by axis rather than by a single ranking; the highlights above list the per-metric leaders.

How does the Open LLM Leaderboard rank models?

Models are sorted by coding-arena score (when available), then by GPQA Diamond. Each row aggregates verified benchmark results, provider-reported pricing, and live performance metrics (output throughput and time-to-first-token) sampled across the major API providers. See the LLM Stats Score methodology for the full weighting and refresh cadence.
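The two-level ordering described above can be sketched as a sort key. This is only an illustration under stated assumptions, not the site's implementation: the `Row` type, its field names, and the sample values are hypothetical, and it assumes that rows without an arena score are placed after rows that have one and are then ordered by GPQA Diamond.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    # Hypothetical row shape; field names are assumptions for illustration.
    name: str
    arena: Optional[float]  # coding-arena score, None when unavailable
    gpqa: Optional[float]   # GPQA Diamond in percent, None when unavailable


def sort_key(row: Row):
    # Rows with an arena score sort first (False < True), highest score first.
    # Ties and rows without an arena score fall back to GPQA Diamond,
    # highest first; rows missing both values end up last.
    return (
        row.arena is None,
        -(row.arena or 0.0),
        row.gpqa is None,
        -(row.gpqa or 0.0),
    )


def rank(rows):
    return sorted(rows, key=sort_key)


# Illustrative values only, not taken from the leaderboard.
rows = [
    Row("model-a", None, 90.0),
    Row("model-b", 1500.0, None),
    Row("model-c", 1700.0, 80.0),
    Row("model-d", None, 95.0),
    Row("model-e", None, None),
]
ranked_names = [r.name for r in rank(rows)]
```

Encoding "missing" as a leading boolean in the key keeps rows without a score from being confused with rows that scored zero.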

How many models are tracked?

This leaderboard tracks 191 canonical models across every major lab and provider. New releases typically appear within hours.

Where does pricing data come from?

Per-model input/output pricing is pulled from each provider's public API price list and verified against billing samples from the LLM Stats proxy. When a model is hosted by multiple providers, the cheapest available rate is shown by default.
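Picking the default rate for a model hosted by several providers might look like the sketch below. The text does not say how "cheapest" is decided when input and output prices disagree, so this version assumes comparison by input price with output price as a tiebreaker; the `Offer` type, provider names, and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    # Hypothetical record of one provider's price for a model (USD per 1M tokens).
    provider: str
    input_usd_per_mtok: float
    output_usd_per_mtok: float
    available: bool = True


def cheapest_offer(offers: Iterable[Offer]) -> Optional[Offer]:
    # Only currently available offers are candidates.
    live = [o for o in offers if o.available]
    if not live:
        return None
    # Assumption: lowest input price wins, output price breaks ties.
    return min(live, key=lambda o: (o.input_usd_per_mtok, o.output_usd_per_mtok))


# Illustrative values only.
offers = [
    Offer("provider-a", 0.30, 1.20),
    Offer("provider-b", 0.28, 1.50),
    Offer("provider-c", 0.10, 0.40, available=False),
]
default = cheapest_offer(offers)
```

A blended price (for example, a weighted mix of input and output rates) would be an equally plausible comparison and could select a different provider.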

How is performance measured?

Output throughput (tokens/second) and time-to-first-token are measured by routing standardized prompts through each provider's API and averaging over a 7-day rolling window. Numbers update hourly. Per-model splits live on each model detail page.
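A 7-day rolling average over timestamped measurements can be sketched as follows. This is a minimal illustration and not the site's pipeline: the sample format `(timestamp, value)` and the half-open window `(now - 7 days, now]` are assumptions.

```python
from datetime import datetime, timedelta
from statistics import fmean
from typing import Iterable, Optional, Tuple

WINDOW = timedelta(days=7)


def rolling_mean(
    samples: Iterable[Tuple[datetime, float]], now: datetime
) -> Optional[float]:
    # Keep only samples inside the window (now - 7 days, now].
    cutoff = now - WINDOW
    recent = [value for ts, value in samples if cutoff < ts <= now]
    # No samples in the window means there is nothing to report.
    return fmean(recent) if recent else None


# Illustrative throughput samples in tokens/second.
now = datetime(2026, 1, 8)
samples = [
    (datetime(2025, 12, 25), 999.0),  # older than the window, ignored
    (datetime(2026, 1, 1), 500.0),    # exactly 7 days old, excluded
    (datetime(2026, 1, 2), 100.0),
    (datetime(2026, 1, 7), 200.0),
]
avg_throughput = rolling_mean(samples, now)
```

Re-running `rolling_mean` each hour with the current time as `now` would match the hourly refresh described above.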

How often does the data update?

Pricing and model metadata revalidate every hour. Live performance metrics are reported as a 7-day rolling average that is recomputed hourly. Benchmark scores update when a new verified result is published or a new evaluation lands on LLM Stats.