MLS-Bench Lite

Name: MLS-Bench Lite Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on MLS-Bench Lite over time, showing the state-of-the-art frontier for open and proprietary models.]

MLS-Bench Lite Leaderboard

2 models

Rank	Model	Organization	Size	Context	Cost
1	Kimi K3	Moonshot AI	2.8T	1.0M	$3.00 / $15.00
2	Kimi K2.7 Code	Moonshot AI	1.0T	262K	$0.74 / $3.50


About this benchmark

What is MLS-Bench Lite?

MLS-Bench Lite is the official 30-task subset of MLS-Bench for evaluating whether AI systems can invent generalizable and scalable machine learning methods across LLM pretraining and post-training, robotics, world models, computer vision, reinforcement learning, optimization, ML systems, and AI for Science.

MLS-Bench Lite is a text benchmark evaluating models on reasoning, agents, and code tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.417, with the leader at 0.483.


Current leaders

Of the 2 evaluated AI models, Kimi K3 from Moonshot AI currently leads the MLS-Bench Lite leaderboard with a score of 0.483.

Kimi K3 (Moonshot AI): 48.3%

Kimi K2.7 Code (Moonshot AI): 35.1%

FAQ

Common questions about the MLS-Bench Lite benchmark and leaderboard.

What is the MLS-Bench Lite benchmark?

What is the MLS-Bench Lite leaderboard?

The MLS-Bench Lite leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Kimi K3 by Moonshot AI leads with a score of 0.483. The average score across all models is 0.417.
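The leader and average quoted above can be reproduced from the two scores listed on this page. The following is a minimal sketch, not LLM Stats' actual code; the function name `summarize` and the dictionary layout are assumptions made for illustration.

```python
# Scores as listed on this page (0-1 scale).
SCORES = {
    "Kimi K3": 0.483,
    "Kimi K2.7 Code": 0.351,
}


def summarize(scores):
    """Return (leader name, leader score, mean score rounded to 3 places).

    Hypothetical helper for illustration only.
    """
    leader = max(scores, key=scores.get)
    mean = sum(scores.values()) / len(scores)
    return leader, scores[leader], round(mean, 3)


leader, top, mean = summarize(SCORES)
print(f"{leader} leads with {top}; average is {mean}")
```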

What is the highest MLS-Bench Lite score?

The highest MLS-Bench Lite score is 0.483, achieved by Kimi K3 from Moonshot AI.

How many models are evaluated on MLS-Bench Lite?

Two models have been evaluated on the MLS-Bench Lite benchmark. Neither result is verified; both are self-reported.

What categories does MLS-Bench Lite cover?

MLS-Bench Lite is categorized under reasoning, agents, and code. The benchmark evaluates text models.

What is the best open-source model on MLS-Bench Lite?

Kimi K3 by Moonshot AI is the top-ranked open-source model on MLS-Bench Lite, with a score of 0.483 (rank #1).

Which model offers the best value on MLS-Bench Lite?

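Among models scoring within 10% of the leader, Kimi K3 from Moonshot AI is the cheapest, at $3.00 per million input tokens with a score of 0.483. With only 2 models listed, Kimi K3 is also the only one in that range: Kimi K2.7 Code is cheaper at $0.74 per million input tokens, but its score of 0.351 falls outside the 10% band.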
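The "best value" rule can be sketched as a two-step filter: keep the models whose score is close enough to the leader's, then pick the one with the lowest input price. This is a minimal sketch under stated assumptions, not the site's implementation. It assumes "within 10%" means a relative threshold (score at least 90% of the leader's) and that the first number in each Cost entry is the input price. The names `Entry` and `best_value` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float        # 0-1 scale, as listed on this page
    input_price: float  # USD per million input tokens (assumed: first Cost figure)


ENTRIES = [
    Entry("Kimi K3", 0.483, 3.00),
    Entry("Kimi K2.7 Code", 0.351, 0.74),
]


def best_value(entries, tolerance=0.10):
    """Cheapest entry among those scoring within `tolerance` of the leader.

    Assumption: the tolerance is relative, i.e. score >= leader * (1 - tolerance).
    """
    leader_score = max(e.score for e in entries)
    threshold = leader_score * (1 - tolerance)
    eligible = [e for e in entries if e.score >= threshold]
    return min(eligible, key=lambda e: e.input_price)


print(best_value(ENTRIES).model)
```

Under this reading the threshold is about 0.435, so Kimi K2.7 Code (0.351) is filtered out before prices are compared. An absolute reading (within 0.10 points of the leader) gives the same result here, since the gap between the two scores is 0.132.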

How recent are the MLS-Bench Lite leaderboard results?

The MLS-Bench Lite leaderboard was last updated in July 2026 and currently includes 2 evaluated models.