Open LLM Leaderboard

Ranking the best open LLMs by performance, price, and speed

🇨🇳
Open
200k$1.40$4.40229c/s1,754
54.2
46.8
43.0
29.7
39.0
28.9
86.2%
79.3%
71.8%
52.3%
40.7%
58.4%
75416.0s-
Apr. 2026
ZAI
🇨🇳
Open
200k$1.00$3.20336c/s1,595
51.5
36.1
26.3
25.6
77.8%
75.9%
67.8%
7447.4s-
Feb. 2026
ZAI
🇨🇳
Open
262.1k$0.95$4.0046c/s1,562
58.1
49.7
43.7
37.3
37.1
31.3
90.5%
80.2%
86.3%
86.7%
80.1%
36.4%
50.0%
52.2%
27.9%
58.6%
100060.7s-
Apr. 2026
MoonshotAI
🇨🇳
Open
---1,462
49.8
47.5
31.4
30.0
35.3
13.7
38.2
46.8
46.8
48.0
87.6%
96.1%
76.8%
74.9%
77.5%
78.5%
50.2%
48.7%
50.7%
1000-
Jan. 2026
MoonshotAI
🇨🇳
Open
262.1k$0.60$3.60105c/s1,294
48.9
46.0
30.0
25.7
28.7
27.8
21.9
38.7
53.1
53.2
52.6
88.4%
76.4%
88.5%
69.0%
28.7%
38.3%
39719.2s-
Feb. 2026
Qwen
🇨🇳
Open
1.0M$1.74$3.4837c/s1,280
57.0
52.9
43.5
32.9
33.5
33.1
19.8
42.0
42.5
47.8
90.1%
80.6%
83.4%
73.6%
48.2%
57.9%
51.8%
55.4%
1600149.0s-
Apr. 2026
DeepSeek
🇺🇸
Open
262.1k$0.14$0.40216c/s1,258
44.8
39.5
27.1
19.4
29.9
41.9
41.9
36.3
84.3%
88.4%
76.9%
26.5%
30.7860msJan. 2025
Apr. 2026
Google
🇺🇸
Open
262.1k$0.13$0.40118c/s1,241
35.5
32.8
20.5
17.8
15.8
32.3
32.3
30.6
82.3%
86.3%
73.8%
17.2%
25.22.7sJan. 2025
Apr. 2026
Google
🇨🇳
Open
204.8k$0.30$1.20170c/s1,156
53.1
38.9
23.1
27.8
28.4
46.3%
56.2%
-4.0s-
Mar. 2026
MiniMax
🇨🇳
Open
---1,144
37.7
34.2
19.9
6.8
12.8
81.0%
93.9%
68.0%
45.1%
17.2%
40.5%
357-
Sep. 2025
ZAI
🇨🇳
Open
1.0M$0.14$0.2872c/s1,143
51.9
51.2
37.6
23.7
31.0
26.9
7.6
35.0
35.1
44.2
88.1%
79.0%
73.2%
69.0%
45.1%
34.1%
47.8%
52.6%
28413.8s-
Apr. 2026
DeepSeek
🇨🇳
Open
262.1k$0.60$3.60161c/s1,118
45.7
42.8
34.3
30.2
23.4
29.1
46.1
46.1
45.9
87.8%
77.2%
82.9%
78.4%
75.8%
24.0%
53.5%
27.819.1s-
Apr. 2026
Qwen
🇨🇳
Open
---1,064
43.8
43.9
22.4
16.9
29.8
11.9
36.2
36.2
35.9
85.7%
95.7%
73.8%
52.0%
42.8%
33.3%
358-
Dec. 2025
ZAI
🇨🇳
Open
1M$0.30$1.20399c/s1,056
52.5
38.2
26.8
-1.0
80.2%
76.3%
55.4%
2301.7s-
Feb. 2026
MiniMax
🇨🇳
Open
128k$0.30$1.2011c/s1,039
23.4
22.7
15.7
17.6
14.0
34.7
34.7
34.2
73.2%
61.3%
60.4%
39.5%
5603.8s-
Aug. 2025
Meituan
🇨🇳
Open
1M$0.30$1.20428c/s971
33.6
25.6
23.3
4.8
19.0
6.1
13.9
29.6
29.5
29.3
78.0%
78.0%
69.4%
44.0%
12.5%
46.3%
36.0%
2301.3s-
Oct. 2025
MiniMax
🇨🇳
Open
131.1k$0.28$0.42288c/s9696852.0s-
Dec. 2025
DeepSeek
🇨🇳
Open
---941
45.2
47.0
25.9
19.8
-6.3
36.5
26.1
26.0
39.1
84.5%
100.0%
71.3%
60.2%
51.0%
47.1%
44.8%
1000-
Sep. 2025
MoonshotAI
🇫🇷
Open
262.1k$0.50$1.50226c/s867
10.2
19.7
-0.2
43.9%
85.5%
23.8%
6751.7s-
Dec. 2025
Mistral
🇨🇳
Open
1M$0.30$1.20301c/s856
41.3
36.2
26.9
19.1
18.2
17.9
20.3
19.9
49.9
49.9
49.7
81.0%
81.0%
67.0%
62.0%
22.0%
43.5%
47.9%
39.0%
2301.3s-
Dec. 2025
MiniMax
🇨🇳
Open
---832
43.2
45.8
22.6
24.5
12.6
96.0%
73.1%
30.6%
35.2%
685-
Dec. 2025
DeepSeek
🇨🇳
Open
---793
38.4
37.9
22.8
16.0
18.5
9.3
21.9
38.2
38.1
37.9
83.7%
94.1%
73.4%
58.3%
22.1%
30.5%
309-
Dec. 2025
Xiaomi
🇨🇳
Open
---791
35.9
34.0
18.9
23.7
16.1
21.2
36.3
32.1
31.9
81.5%
90.6%
59.4%
560-
Sep. 2025
Meituan
🇨🇳
Open
262.1k$0.30$2.40103c/s775
42.0
42.4
21.4
20.4
22.2
29.4
12.7
30.9
45.6
45.7
43.9
85.5%
72.4%
85.9%
82.3%
61.0%
79.5%
75.0%
70.3%
48.5%
2736.0s-
Feb. 2026
Qwen
🇨🇳
Open
---759
31.4
29.1
8.4
3.4
7.4
9.6
75.2%
91.6%
59.2%
42.8%
14.4%
30-
Jan. 2026
ZAI
🇨🇳
Open
256k$0.10$0.4064c/s756
23.0
19.9
11.7
18.6
14.9
22.0
22.1
21.7
66.8%
63.2%
54.4%
33.8%
68.53.5s-
Feb. 2026
Meituan
🇨🇳
Open
---750
35.7
33.4
21.7
1.5
15.8
38.7
38.7
38.4
79.9%
89.3%
67.8%
40.1%
19.8%
97.1%
37.7%
685-
Sep. 2025
DeepSeek
🇨🇳
Open
262.1k$0.40$3.20128c/s749
43.0
44.0
25.8
22.1
24.2
30.7
13.4
32.3
48.7
48.8
47.4
86.6%
72.0%
86.7%
83.9%
63.8%
77.2%
76.9%
70.4%
47.5%
12233.9s-
Feb. 2026
Qwen
🇨🇳
Open
---744
34.0
34.3
20.7
-4.6
25.1
8.2
25.6
41.6
37.5
37.3
79.1%
64.2%
26.4%
14.4%
37.5%
79.7%
41.7%
355-
Jul. 2025
ZAI
🇨🇳
Open
65.5k$0.10$0.40363c/s731
49.2
46.6
27.9
22.4
18.3
97.3%
74.4%
69.0%
1968.8s-
Feb. 2026
StepFun
Showing 130 of 180 models

Open LLM Leaderboard highlights

Independent ranking of open-weight large language models — Llama, Qwen, GLM, DeepSeek, Mistral, Kimi and more — by coding-arena score, GPQA Diamond, throughput, latency, and per-token pricing. Updated continuously from provider APIs and verified benchmarks. See the LLM Stats Score methodology for how rankings are computed.

FAQ

Common questions about the Open LLM Leaderboard

What is the best LLM right now?

Based on coding-arena performance — the most discriminating signal at the frontier — Claude Opus 4.6 currently leads. For knowledge-heavy reasoning (GPQA Diamond), Claude Mythos Preview scores highest. Choose by axis rather than a single ranking — see the highlights above for per-metric leaders.

How does the Open LLM Leaderboard rank models?

Models are sorted by coding-arena score (when available), then by GPQA Diamond. Each row aggregates verified benchmark results, provider-reported pricing, and live performance metrics (output throughput and time-to-first-token) sampled across the major API providers. See the LLM Stats Score methodology for the full weighting and refresh cadence.
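The ordering described above can be sketched as a two-level sort key. This is only an illustration under stated assumptions: the `Row` type, its field names, and the rule that rows without a coding-arena score are placed after rows that have one are hypothetical, not taken from the LLM Stats methodology.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Row:
    """Hypothetical leaderboard row; field names are illustrative only."""
    name: str
    arena: Optional[float]  # coding-arena score, None when unavailable
    gpqa: Optional[float]   # GPQA Diamond accuracy in percent, None when unavailable


def sort_key(row: Row) -> Tuple[bool, float, bool, float]:
    # False sorts before True, so rows that have a score come first.
    # Scores are negated so that higher values rank earlier.
    return (
        row.arena is None,
        -(row.arena or 0.0),
        row.gpqa is None,
        -(row.gpqa or 0.0),
    )


def rank(rows: List[Row]) -> List[Row]:
    """Order by coding-arena score when present, then by GPQA Diamond."""
    return sorted(rows, key=sort_key)


if __name__ == "__main__":
    # Placeholder values, not real leaderboard data.
    demo = [
        Row("model-a", None, 80.0),
        Row("model-b", 1500.0, 70.0),
        Row("model-c", 1500.0, 75.0),
    ]
    print([r.name for r in rank(demo)])
```

Under these assumptions, GPQA Diamond acts as a tiebreaker between equal arena scores and as the sole criterion among rows with no arena score.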

How many models are tracked?

This leaderboard tracks 302 canonical models across every major lab and provider. New releases typically appear within hours.

Where does pricing data come from?

Per-model input/output pricing is pulled from each provider's public API price list and verified against billing samples from the LLM Stats proxy. When a model is hosted by multiple providers, the cheapest available rate is shown by default.
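A minimal sketch of picking the default rate when several providers host the same model. The source does not say how "cheapest" is decided when input and output prices differ, so the blended price below (a weighted mix of input and output price) is an assumption, as are the `Offer` type, the per-million-token unit, and the provider names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Offer:
    """Hypothetical price quote from one provider for one model."""
    provider: str
    input_per_m: float   # USD per 1M input tokens (unit is an assumption)
    output_per_m: float  # USD per 1M output tokens (unit is an assumption)


def cheapest_offer(offers: Sequence[Offer], input_share: float = 0.5) -> Offer:
    """Return the offer with the lowest blended price.

    input_share is the assumed fraction of traffic that is input tokens;
    the blending rule is illustrative, not the site's documented method.
    """
    if not offers:
        raise ValueError("at least one offer is required")
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")

    def blended(offer: Offer) -> float:
        return input_share * offer.input_per_m + (1.0 - input_share) * offer.output_per_m

    return min(offers, key=blended)


if __name__ == "__main__":
    # Placeholder quotes, not real provider prices.
    quotes = [
        Offer("provider-1", 0.30, 1.20),
        Offer("provider-2", 0.28, 1.30),
    ]
    print(cheapest_offer(quotes).provider)
```

Changing `input_share` can change which provider wins, which is why a single "cheapest" label depends on the workload mix assumed.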

How is performance measured?

Output throughput (tokens/second) and time-to-first-token are measured by routing standardized prompts through each provider's API and averaging over a 7-day rolling window. Numbers update hourly. Per-model splits live on each model detail page.
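The 7-day rolling average can be sketched as a plain mean over the samples that fall inside the window. The function name, the `(timestamp, value)` sample shape, and the unweighted mean are assumptions for illustration; the actual sampling and weighting used by LLM Stats are not described in the source.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def rolling_mean(
    samples: Iterable[Tuple[datetime, float]],
    now: datetime,
    window: timedelta = timedelta(days=7),
) -> Optional[float]:
    """Average the values whose timestamp lies in (now - window, now].

    Returns None when no sample falls inside the window.
    """
    cutoff = now - window
    recent = [value for ts, value in samples if cutoff < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


if __name__ == "__main__":
    # Placeholder throughput measurements in tokens/second.
    now = datetime(2026, 4, 30, 12, 0)
    measurements = [
        (now - timedelta(days=8), 100.0),   # outside the window, ignored
        (now - timedelta(days=6), 200.0),
        (now - timedelta(hours=1), 300.0),
    ]
    print(rolling_mean(measurements, now))
```

Recomputing this once per hour with the current time as `now` would give an hourly-refreshed figure that still smooths over the past week.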

How often does the data update?

Pricing and model metadata revalidate every hour. Live performance metrics are reported as a 7-day rolling average that is recomputed hourly. Benchmark scores update when a new verified result is published or a new evaluation lands on LLM Stats.