MBPP pass@1
MBPP (Mostly Basic Python Problems) is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. Each problem consists of a task description, a reference code solution, and 3 automated test cases. This variant uses the pass@1 evaluation metric, which measures the percentage of problems solved correctly on the first attempt.
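As a rough sketch of how execution-based pass@1 scoring works: a single sampled completion per problem is executed against the problem's test assertions, and the score is the fraction of problems where all assertions pass. The sample problem and its three assertions below are illustrative stand-ins, not the actual MBPP dataset, and the harness is a simplification (real harnesses sandbox and time-limit execution).

```python
# Minimal sketch of execution-based pass@1 scoring for MBPP-style problems.
# The problem below is an illustrative stand-in, not a real MBPP entry.

problems = [
    {
        "completion": (
            "def first_repeated_char(s):\n"
            "    seen = set()\n"
            "    for ch in s:\n"
            "        if ch in seen:\n"
            "            return ch\n"
            "        seen.add(ch)\n"
            "    return None\n"
        ),
        "tests": [
            "assert first_repeated_char('abcabc') == 'a'",
            "assert first_repeated_char('abba') == 'b'",
            "assert first_repeated_char('abc') is None",
        ],
    },
]

def passes_all_tests(completion: str, tests: list) -> bool:
    """Run the model's completion, then every assertion; any failure counts as a fail."""
    namespace = {}
    try:
        exec(completion, namespace)   # define the candidate function
        for test in tests:
            exec(test, namespace)     # raises AssertionError if the test fails
    except Exception:
        return False
    return True

# pass@1: fraction of problems whose single sampled completion passes all tests.
pass_at_1 = sum(
    passes_all_tests(p["completion"], p["tests"]) for p in problems
) / len(problems)
print(f"pass@1 = {pass_at_1:.3f}")
```

In this toy run the single completion passes all three assertions, so pass@1 is 1.000; on the real benchmark the score is averaged over all 974 problems (or the commonly used test split).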
Progress Over Time
[Interactive timeline showing model performance evolution on MBPP pass@1, with a state-of-the-art frontier line and markers distinguishing open from proprietary models.]
MBPP pass@1 Leaderboard
1 model
| Rank | Model | Organization | Params | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | Ministral 8B Instruct | Mistral AI | 8B | 128K | $0.10 / $0.10 | — |
FAQ
Common questions about MBPP pass@1
**What is MBPP pass@1?** MBPP (Mostly Basic Python Problems) is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. Each problem consists of a task description, a reference code solution, and 3 automated test cases. This variant uses the pass@1 evaluation metric, which measures the percentage of problems solved correctly on the first attempt.
**Where can I find the MBPP paper?** The MBPP paper is available at https://arxiv.org/abs/2108.07732. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
**Which model leads the MBPP pass@1 leaderboard?** The MBPP pass@1 leaderboard currently ranks 1 AI model on this benchmark. Ministral 8B Instruct by Mistral AI leads with a score of 0.700; since it is the only listed model, the average score is also 0.700.
**What is the highest MBPP pass@1 score?** The highest MBPP pass@1 score is 0.700, achieved by Ministral 8B Instruct from Mistral AI.
**How many models have been evaluated?** 1 model has been evaluated on the MBPP pass@1 benchmark, with 0 verified results and 1 self-reported result.
**How is MBPP pass@1 categorized?** MBPP pass@1 is categorized under general and reasoning, and the benchmark evaluates text models.