MBPP pass@1 Leaderboard

Progress Over Time

Interactive timeline showing how model performance on MBPP pass@1 has evolved. Legend: state-of-the-art frontier, open models, proprietary models.

MBPP pass@1 Leaderboard

1 model

Rank	Model	Organization	Score	Parameters	Context	Cost	License
1	Ministral 8B Instruct	Mistral AI	0.700	8B	128K	$0.10 / $0.10	not listed

FAQ

Common questions about MBPP pass@1

What is the MBPP pass@1 benchmark?

MBPP (Mostly Basic Python Problems) is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. Each problem consists of a task description, a reference code solution, and 3 automated test cases. This variant uses the pass@1 evaluation metric, which measures the percentage of problems solved correctly on the first attempt.
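The pass@1 metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's official harness. The two problems, the field names `text` and `tests`, and the helper names `passes_all_tests` and `pass_at_1` are all hypothetical. It assumes one generated attempt per problem and assert-style tests.

```python
from typing import Dict, List


def passes_all_tests(candidate_code: str, test_cases: List[str]) -> bool:
    """Return True if the candidate code runs and every assert-style test passes."""
    namespace: Dict[str, object] = {}
    try:
        # NOTE: exec on untrusted code is unsafe; a real harness would
        # sandbox execution and enforce timeouts. This is only a sketch.
        exec(candidate_code, namespace)      # define the candidate function(s)
        for test in test_cases:
            exec(test, namespace)            # each test is an `assert ...` statement
    except Exception:
        return False                         # syntax error, crash, or failed assert
    return True


def pass_at_1(problems: List[dict], first_attempts: List[str]) -> float:
    """Fraction of problems whose first generated attempt passes all its tests."""
    solved = sum(
        passes_all_tests(attempt, problem["tests"])
        for problem, attempt in zip(problems, first_attempts)
    )
    return solved / len(problems)


# Two hypothetical problems in the MBPP shape: a description plus 3 tests each.
problems = [
    {"text": "Write a function to add two numbers.",
     "tests": ["assert add(1, 2) == 3",
               "assert add(0, 0) == 0",
               "assert add(-1, 1) == 0"]},
    {"text": "Write a function to reverse a string.",
     "tests": ["assert rev('ab') == 'ba'",
               "assert rev('') == ''",
               "assert rev('abc') == 'cba'"]},
]
first_attempts = [
    "def add(a, b):\n    return a + b",
    "def rev(s):\n    return s",             # deliberately wrong attempt
]

score = pass_at_1(problems, first_attempts)
print(score)  # 0.5: one of the two first attempts passes all of its tests
```

A problem counts as solved only if the attempt passes every one of its tests, so a score such as 0.700 means 70% of problems were solved on the first try.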

Where can I find the MBPP pass@1 paper?

The paper behind MBPP pass@1 is available at https://arxiv.org/abs/2108.07732. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.

What is the MBPP pass@1 leaderboard?

The MBPP pass@1 leaderboard ranks AI models by their performance on this benchmark. It currently lists 1 model: Ministral 8B Instruct by Mistral AI, which leads with a score of 0.700. With a single entry, the average score across all models is also 0.700.
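As a small sketch of how the leader and the average follow from the listed scores, the snippet below ranks entries by score and takes their mean. The `entries` structure and its field names are hypothetical. The single record uses the values reported on this page.

```python
from statistics import mean

# Hypothetical record layout; the values come from the leaderboard above.
entries = [
    {"model": "Ministral 8B Instruct", "organization": "Mistral AI", "score": 0.700},
]

# Rank from highest to lowest score; the first entry is the current leader.
ranked = sorted(entries, key=lambda e: e["score"], reverse=True)
leader = ranked[0]

# With one entry, the average equals the leader's score.
average = mean(e["score"] for e in entries)

print(f"{leader['model']} ({leader['organization']}): {leader['score']:.3f}")
print(f"average: {average:.3f}")
```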

What is the highest MBPP pass@1 score?

The highest MBPP pass@1 score is 0.700, achieved by Ministral 8B Instruct from Mistral AI.

How many models are evaluated on MBPP pass@1?

1 model has been evaluated on the MBPP pass@1 benchmark, with 0 verified results and 1 self-reported result.

What categories does MBPP pass@1 cover?

MBPP pass@1 is categorized under "general" and "reasoning". The benchmark evaluates text models.
