PostTrainBench

Name: PostTrainBench Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on PostTrainBench over time, showing the state-of-the-art frontier for open and proprietary models.]

PostTrainBench Leaderboard

5 models

Rank	Model	Organization	Parameters	Context	Cost (input / output per 1M tokens)
1	MiniMax M3	MiniMax	—	1.0M	$0.30 / $1.20
2	Kimi K3	Moonshot AI	2.8T	1.0M	$3.00 / $15.00
3	GLM-5.2	Zhipu AI	753B	1.0M	$0.95 / $3.00
4	Seed 2.1 Turbo	ByteDance	—	—	—
5	Seed 2.1 Pro	ByteDance	—	—	—

About this benchmark

What is PostTrainBench?

PostTrainBench evaluates a model's ability to autonomously post-train base models. Given pretrain-only base models, the agent must complete the full pipeline of data synthesis, training, evaluation, and iteration within a time budget, scored across downstream benchmarks such as AIME2025, BFCL, GPQA Main, GSM8K, and HumanEval.

PostTrainBench is a text benchmark evaluating models on reasoning, agents, code, and systems tasks. LLM Stats tracks 5 models on this benchmark, scored on a 0–1 scale. The current average is 0.286, with the leader at 0.371.

Current leaders

MiniMax M3 from MiniMax currently leads the PostTrainBench leaderboard with a score of 0.371 across 5 evaluated AI models.

MiniMax M3 (MiniMax): 37.1%

Kimi K3 (Moonshot AI): 36.6%

GLM-5.2 (Zhipu AI): 34.3%

FAQ

Common questions about the PostTrainBench benchmark and leaderboard.

What is the PostTrainBench leaderboard?

The PostTrainBench leaderboard ranks 5 AI models based on their performance on this benchmark. Currently, MiniMax M3 by MiniMax leads with a score of 0.371. The average score across all models is 0.286.
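Only the top three scores (0.371, 0.366, 0.343) appear on this page, so the reported average of 0.286 constrains the two unlisted scores. The sketch below works through that arithmetic. It assumes the average is a plain arithmetic mean over all 5 models, and because 0.286 is itself rounded, the result is only approximate. It says nothing about how the two remaining scores split between the individual models.

```python
# Published top-three scores from the leaderboard page.
known_scores = {"MiniMax M3": 0.371, "Kimi K3": 0.366, "GLM-5.2": 0.343}

reported_mean = 0.286  # rounded average across all models, as stated on the page
n_models = 5

# Assumption: the reported average is an unweighted mean over all 5 models.
total = reported_mean * n_models
remaining_total = total - sum(known_scores.values())
remaining_count = n_models - len(known_scores)

# Approximate mean of the two scores not shown on the page.
remaining_mean = remaining_total / remaining_count
print(f"Implied mean of the {remaining_count} unlisted scores: {remaining_mean:.3f}")
```

Under these assumptions the two unlisted models average roughly 0.175, well below the top three.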

What is the highest PostTrainBench score?

The highest PostTrainBench score is 0.371, achieved by MiniMax M3 from MiniMax.

How many models are evaluated on PostTrainBench?

Five models have been evaluated on the PostTrainBench benchmark. All 5 results are self-reported; none have been verified.

What categories does PostTrainBench cover?

PostTrainBench is categorized under reasoning, agents, code, and systems. The benchmark evaluates text models.

What is the best open-source model on PostTrainBench?

MiniMax M3 by MiniMax is the top-ranked open-source model on PostTrainBench, with a score of 0.371 (rank #1).

Which model offers the best value on PostTrainBench?

Among models scoring within 10% of the leader, MiniMax M3 from MiniMax is the cheapest, at $0.30 per million input tokens with a score of 0.371.
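The selection rule above can be sketched in a few lines. This is a minimal illustration, not LLM Stats' actual implementation: it assumes "within 10% of the leader" means a score of at least 90% of the top score, and it compares input-token prices only. The `Entry` type and `best_value` function are hypothetical names introduced for this example. Only models with both a score and a price on this page are included.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    score: float       # PostTrainBench score on a 0-1 scale
    input_cost: float  # USD per million input tokens


# Scores and input prices as listed on this page.
ENTRIES = [
    Entry("MiniMax M3", 0.371, 0.30),
    Entry("Kimi K3", 0.366, 3.00),
    Entry("GLM-5.2", 0.343, 0.95),
]


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry whose score is within `tolerance` of the leader.

    Assumption: "within 10%" is relative, i.e. score >= leader * (1 - tolerance).
    """
    leader_score = max(e.score for e in entries)
    threshold = leader_score * (1 - tolerance)
    candidates = [e for e in entries if e.score >= threshold]
    return min(candidates, key=lambda e: e.input_cost)


winner = best_value(ENTRIES)
print(f"{winner.model}: score {winner.score}, ${winner.input_cost:.2f} per 1M input tokens")
```

With a leader score of 0.371 the threshold is about 0.334, so all three listed models qualify and the lowest input price decides the result.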

How recent are the PostTrainBench leaderboard results?

The PostTrainBench leaderboard was last updated in July 2026 and currently includes 5 evaluated models.