CritPT

Name: CritPT Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on CritPT over time, showing the state-of-the-art frontier for open and proprietary models.]

CritPT Leaderboard

4 models

Rank	Model	Organization	Parameters	Context	Cost (input / output, USD per 1M tokens)
1	GLM-5.2	Zhipu AI	753B	1.0M	$0.95 / $3.00
2	Qwen3.7 Max	Alibaba Cloud / Qwen Team	—	1.0M	$1.25 / $3.75
3	Qwen3.7-Plus	Alibaba Cloud / Qwen Team	—	1.0M	$0.32 / $1.28
4	Nemotron 3 Ultra (550B A55B)	NVIDIA	550B	—	—


About this benchmark

What is CritPT?

CritPT is a challenging reasoning benchmark reported by Qwen for evaluating frontier mathematical and critical problem-solving capability.

CritPT is a text benchmark evaluating models on math and reasoning tasks. LLM Stats tracks 4 models on this benchmark, scored on a 0–1 scale. The current average is 0.093, and the leader scores 0.167.

Compare leaders on the best AI for math and best AI for reasoning leaderboards.

Current leaders

GLM-5.2 from Zhipu AI currently leads the CritPT leaderboard of 4 evaluated AI models with a score of 0.167 (16.7%).

GLM-5.2 (Zhipu AI): 16.7%

Qwen3.7 Max (Alibaba Cloud / Qwen Team): 11.4%

Qwen3.7-Plus (Alibaba Cloud / Qwen Team): 6.0%

FAQ

Common questions about the CritPT benchmark and leaderboard.

What is the CritPT benchmark?

CritPT is a challenging reasoning benchmark reported by Qwen for evaluating frontier mathematical and critical problem-solving capability.

What is the CritPT leaderboard?

The CritPT leaderboard ranks 4 AI models based on their performance on this benchmark. Currently, GLM-5.2 by Zhipu AI leads with a score of 0.167. The average score across all models is 0.093.

What is the highest CritPT score?

The highest CritPT score is 0.167, achieved by GLM-5.2 from Zhipu AI.

How many models are evaluated on CritPT?

Four models have been evaluated on the CritPT benchmark. All 4 results are self-reported; none are verified.

What categories does CritPT cover?

CritPT is categorized under math and reasoning. The benchmark evaluates text models.

What is the best open-source model on CritPT?

GLM-5.2 by Zhipu AI is the top-ranked open-source model on CritPT, with a score of 0.167 (rank #1).

Which model offers the best value on CritPT?

Among models scoring within 10% of the leader, GLM-5.2 from Zhipu AI is the cheapest, at $0.95 per million input tokens with a score of 0.167.
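The selection rule in this answer can be sketched in a few lines of Python. This is only an illustration, not the site's actual method: the `Entry` class and `best_value` function are hypothetical names, and reading "within 10% of the leader" as a relative threshold (score at least 90% of the top score) is an assumption. The scores and input prices are the ones listed on this page; Nemotron 3 Ultra is left out because neither its score nor its price appears here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    model: str
    score: float                 # CritPT score on the 0-1 scale
    input_cost: Optional[float]  # USD per million input tokens, None if unlisted


# Values copied from this page. Nemotron 3 Ultra is omitted because the page
# shows neither a score nor a price for it.
ENTRIES = [
    Entry("GLM-5.2", 0.167, 0.95),
    Entry("Qwen3.7 Max", 0.114, 1.25),
    Entry("Qwen3.7-Plus", 0.060, 0.32),
]


def best_value(entries, tolerance=0.10):
    """Return the cheapest priced entry scoring within `tolerance` of the leader.

    Assumption: "within 10%" is relative, i.e. score >= top * (1 - tolerance).
    """
    top = max(e.score for e in entries)
    threshold = top * (1.0 - tolerance)
    candidates = [
        e for e in entries
        if e.score >= threshold and e.input_cost is not None
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.input_cost)


winner = best_value(ENTRIES)
print(winner.model, winner.input_cost)
```

Under this reading the threshold is about 0.150, so only GLM-5.2 qualifies and it is the cheapest by default. A looser tolerance would admit lower-scoring but cheaper models such as Qwen3.7-Plus.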

How recent are the CritPT leaderboard results?

The CritPT leaderboard was last updated in July 2026 and currently includes 4 evaluated models.