Open LLM Leaderboard

Ranking the best open LLMs by performance, price, and speed

🇨🇳
Open
200k $1.00 $3.20 1,581
52.2
37.3
26.5
26.7
77.8%
75.9%
67.8%
744-
Feb. 2026
ZAI
🇨🇳
Open
200k $1.40 $4.40 1,539
55.1
47.2
45.1
30.1
39.5
30.6
86.2%
79.3%
71.8%
52.3%
40.7%
58.4%
754-
Apr. 2026
ZAI
🇨🇳
Open
262.1k $0.60 $3.00 71c/s 1,462
50.4
47.9
32.5
30.4
35.9
14.6
38.3
47.7
47.7
48.5
87.6%
96.1%
76.8%
74.9%
77.5%
78.5%
50.2%
48.7%
50.7%
100066.0s-
Jan. 2026
MoonshotAI
🇨🇳
Open
262.1k $0.95 $4.00 19c/s 1,254
59.1
50.7
45.6
38.0
38.0
33.4
90.5%
80.2%
86.3%
86.7%
80.1%
36.4%
50.0%
52.2%
27.9%
58.6%
100035.7s-
Apr. 2026
MoonshotAI
🇨🇳
Open
262.1k $0.60 $3.60 185c/s 1,207
49.3
46.7
31.0
25.8
28.7
29.0
22.8
38.8
55.4
55.4
54.3
88.4%
76.4%
88.5%
69.0%
28.7%
38.3%
39711.4s-
Feb. 2026
Qwen
🇨🇳
Open
131.1k $0.55 $2.19 54c/s 1,144
38.3
34.5
20.5
7.4
13.5
81.0%
93.9%
68.0%
45.1%
17.2%
40.5%
357743ms-
Sep. 2025
ZAI
🇺🇸
Open
262.1k $0.13 $0.40 107c/s 1,141
36.7
33.1
21.1
18.4
15.9
32.6
32.6
31.0
82.3%
86.3%
73.8%
17.2%
25.21.5sJan. 2025
Apr. 2026
Google
🇨🇳
Open
262.1k $0.30 $1.49 106c/s 1,114
26.2
28.7
19.4
23.7
4.7
23.8
32.5
34.1
32.7
74.7%
62.1%
68.1%
62.0%
51.9%
66.7%
2364.0s-
Sep. 2025
Qwen
🇨🇳
Open
1.0M $1.74 $3.48 70c/s 1,080
57.8
53.8
45.0
33.5
33.5
35.2
20.0
45.2
45.7
48.6
90.1%
80.6%
83.4%
73.6%
48.2%
57.9%
51.8%
55.4%
160095.0s-
Apr. 2026
DeepSeek
🇨🇳
Open
204.8k $0.60 $2.20 44c/s 1,066
44.3
44.1
23.2
16.8
29.7
12.5
36.6
36.5
36.3
85.7%
95.7%
73.8%
52.0%
42.8%
33.3%
3588.2s-
Dec. 2025
ZAI
🇺🇸
Open
262.1k $0.14 $0.40 70c/s 979
45.2
39.9
27.8
20.1
29.8
42.4
42.4
36.7
84.3%
88.4%
76.9%
26.5%
30.75.9sJan. 2025
Apr. 2026
Google
🇨🇳
Open
1M $0.30 $1.20 490c/s 955
53.2
38.9
27.0
-1.6
80.2%
76.3%
55.4%
2302.3s-
Feb. 2026
MiniMax
🇨🇳
Open
---941
45.7
47.3
26.6
19.9
-6.3
36.8
26.4
26.3
39.4
84.5%
100.0%
71.3%
60.2%
51.0%
47.1%
44.8%
1000-
Sep. 2025
MoonshotAI
🇨🇳
Open
1M $0.30 $1.20 639c/s 908
41.6
36.7
27.6
19.1
18.2
18.6
21.1
20.1
51.5
51.5
51.0
81.0%
81.0%
67.0%
62.0%
22.0%
43.5%
47.9%
39.0%
2302.0s-
Dec. 2025
MiniMax
🇨🇳
Open
1.0M $0.14 $0.28 122c/s 842
52.3
51.5
38.8
23.9
31.0
28.3
7.9
36.5
36.5
44.7
88.1%
79.0%
73.2%
69.0%
45.1%
34.1%
47.8%
52.6%
28418.5s-
Apr. 2026
DeepSeek
🇨🇳
Open
1M $0.30 $1.20 161c/s 834
34.3
25.9
23.9
5.3
19.1
6.7
14.5
29.9
29.9
29.6
78.0%
78.0%
69.4%
44.0%
12.5%
46.3%
36.0%
2301.1s-
Oct. 2025
MiniMax
🇨🇳
Open
---832
43.0
46.2
23.5
25.4
13.1
96.0%
73.1%
30.6%
35.2%
685-
Dec. 2025
DeepSeek
🇨🇳
Open
204.8k $0.30 $1.20 161c/s 831
53.7
40.1
24.3
31.2
31.5
46.3%
56.2%
-5.2s-
Mar. 2026
MiniMax
🇨🇳
Open
---793
38.8
38.1
23.6
15.9
19.3
9.9
22.1
38.6
38.6
38.3
83.7%
94.1%
73.4%
58.3%
22.1%
30.5%
309-
Dec. 2025
Xiaomi
🇨🇳
Open
---791
36.3
34.2
19.4
23.8
16.6
21.6
36.6
32.5
32.2
81.5%
90.6%
59.4%
560-
Sep. 2025
Meituan
🇨🇳
Open
128k $0.30 $1.20 171c/s 772
23.9
22.9
16.2
17.8
14.5
35.0
35.0
34.5
73.2%
61.3%
60.4%
39.5%
5605.9s-
Aug. 2025
Meituan
🇨🇳
Open
128k $0.07 $0.40 9c/s 759
32.0
29.3
9.0
3.9
8.7
10.2
75.2%
91.6%
59.2%
42.8%
14.4%
3024.5s-
Jan. 2026
ZAI
🇨🇳
Open
---750
36.2
33.7
22.3
1.9
16.5
39.1
39.1
38.8
79.9%
89.3%
67.8%
40.1%
19.8%
97.1%
37.7%
685-
Sep. 2025
DeepSeek
🇨🇳
Open
---744
34.4
34.5
21.3
-4.2
25.2
8.2
25.8
42.0
37.9
37.7
79.1%
64.2%
26.4%
14.4%
37.5%
79.7%
41.7%
355-
Jul. 2025
ZAI
🇨🇳
Open
256k $0.10 $0.40 272c/s 712
23.4
20.1
12.3
18.7
15.4
22.3
22.3
21.9
66.8%
63.2%
54.4%
33.8%
68.56.6s-
Feb. 2026
Meituan
🇨🇳
Open
131.1k $0.28 $0.42 136c/s 705
6851.2s-
Dec. 2025
DeepSeek
🇨🇳
Open
262.1k $0.40 $3.20 84c/s 698
43.5
44.4
26.6
22.2
24.3
31.3
14.1
32.4
49.9
49.9
48.0
86.6%
72.0%
86.7%
83.9%
63.8%
77.2%
76.9%
70.4%
47.5%
12214.5s-
Feb. 2026
Qwen
🇫🇷
Open
262.1k $0.50 $1.50 174c/s 644
9.1
19.9
0.1
43.9%
85.5%
23.8%
675391ms-
Dec. 2025
Mistral
🇨🇳
Open
65.5k $0.10 $0.40 201c/s 595
49.9
47.1
28.9
22.4
19.1
97.3%
74.4%
69.0%
1969.0s-
Feb. 2026
StepFun
🇨🇳
Open
---585480-
Jan. 2025
Qwen
Showing 130 of 179 models

Open LLM Leaderboard highlights

An independent ranking of open-weight large language models (Llama, Qwen, GLM, DeepSeek, Mistral, Kimi, and more) by coding-arena score, GPQA Diamond, throughput, latency, and per-token pricing. The data is updated continuously from provider APIs and verified benchmarks. See the LLM Stats Score methodology for how rankings are computed.

FAQ

Common questions about the Open LLM Leaderboard

What is the best LLM right now?

On coding-arena performance, the most discriminating signal at the frontier, Gemini 3.1 Pro currently leads. For knowledge-heavy reasoning (GPQA Diamond), Claude Mythos Preview scores highest. Choose by axis rather than by a single ranking; the highlights above list the per-metric leaders.

How does the Open LLM Leaderboard rank models?

Models are sorted by coding-arena score (when available), then by GPQA Diamond. Each row aggregates verified benchmark results, provider-reported pricing, and live performance metrics (output throughput and time-to-first-token) sampled across the major API providers. See the LLM Stats Score methodology for the full weighting and refresh cadence.
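The two-level sort described above could be sketched roughly as follows. This is only an illustration, not the leaderboard's actual code: the `ModelRow` type and its field names are hypothetical, and placing models without an arena score after all scored models is an assumption about what "when available" means.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRow:
    # Hypothetical row shape; field names are assumptions for illustration.
    name: str
    coding_arena: Optional[float]  # None when no arena score is available
    gpqa_diamond: Optional[float]  # fraction in [0, 1]; None when missing


def rank_models(rows: list[ModelRow]) -> list[ModelRow]:
    """Sort by coding-arena score (descending), then GPQA Diamond (descending).

    Assumption: rows without an arena score come after all scored rows and
    are ordered among themselves by GPQA Diamond.
    """

    def key(row: ModelRow):
        has_arena = row.coding_arena is not None
        arena = row.coding_arena if has_arena else 0.0
        gpqa = row.gpqa_diamond if row.gpqa_diamond is not None else float("-inf")
        # False sorts before True, so scored rows come first; the scores are
        # negated so that higher values sort earlier.
        return (not has_arena, -arena, -gpqa)

    return sorted(rows, key=key)


rows = [
    ModelRow("a", None, 0.9),
    ModelRow("b", 1500, 0.5),
    ModelRow("c", 1500, 0.7),
    ModelRow("d", 1600, None),
]
ranked_names = [r.name for r in rank_models(rows)]
```

In this sketch a tie on the arena score (models "b" and "c") is broken by GPQA Diamond, and the model with no arena score ("a") lands last even though its GPQA Diamond value is the highest.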

How many models are tracked?

This leaderboard tracks 298 canonical models across every major lab and provider. New releases typically appear within hours.

Where does pricing data come from?

Per-model input/output pricing is pulled from each provider's public API price list and verified against billing samples from the LLM Stats proxy. When a model is hosted by multiple providers, the cheapest available rate is shown by default.
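As a rough sketch of the "cheapest available rate" selection: the text does not say how input and output prices are combined when comparing providers, so the blended price below (a weighted mix controlled by `input_share`) is purely an assumption, as are the `ProviderPrice` type and the example figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPrice:
    # Hypothetical record; names and units are assumptions for illustration.
    provider: str
    input_price: float   # USD per unit of input tokens
    output_price: float  # USD per unit of output tokens


def cheapest_offer(offers: list[ProviderPrice], input_share: float = 0.5) -> ProviderPrice:
    """Return the offer with the lowest blended price.

    Assumption: "cheapest" is judged on a weighted mix of input and output
    price; the source does not specify the comparison rule.
    """
    if not offers:
        raise ValueError("at least one provider offer is required")

    def blended(offer: ProviderPrice) -> float:
        return input_share * offer.input_price + (1.0 - input_share) * offer.output_price

    return min(offers, key=blended)


offers = [
    ProviderPrice("provider-1", 0.60, 3.00),
    ProviderPrice("provider-2", 0.30, 3.60),
    ProviderPrice("provider-3", 0.55, 2.19),
]
default_pick = cheapest_offer(offers)
input_only_pick = cheapest_offer(offers, input_share=1.0)
```

The choice of weighting matters: with an even mix the third offer wins, while a comparison on input price alone would pick the second.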

How is performance measured?

Output throughput (tokens/second) and time-to-first-token are measured by routing standardized prompts through each provider's API and averaging over a 7-day rolling window. Numbers update hourly. Per-model splits live on each model detail page.
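A minimal sketch of the 7-day rolling average, assuming each measurement is stored as a `(timestamp, value)` pair; the function name and the sample values are hypothetical and not taken from LLM Stats.

```python
from datetime import datetime, timedelta
from statistics import fmean
from typing import Optional


def rolling_mean(
    samples: list[tuple[datetime, float]],
    now: datetime,
    window: timedelta = timedelta(days=7),
) -> Optional[float]:
    """Average the samples whose timestamps fall within `window` before `now`.

    Returns None when no sample falls inside the window.
    """
    cutoff = now - window
    recent = [value for ts, value in samples if cutoff < ts <= now]
    return fmean(recent) if recent else None


now = datetime(2026, 4, 30, 12, 0)
throughput_samples = [
    (now - timedelta(days=8), 10.0),   # outside the window, ignored
    (now - timedelta(days=6), 70.0),
    (now - timedelta(hours=1), 90.0),
]
average = rolling_mean(throughput_samples, now)
```

Re-running the same computation every hour with a fresh `now` would give hourly-updating numbers that still reflect the previous seven days of samples.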

How often does the data update?

Pricing and model metadata are revalidated every hour. Live performance metrics are reported as a 7-day rolling average. Benchmark scores update when a new verified result is published or a new evaluation lands on LLM Stats.