MLE-Bench Lite
MLE-Bench Lite evaluates AI agents on machine learning engineering tasks, testing their ability to build, train, and optimize ML models for Kaggle-style competitions in a lightweight evaluation format.
Progress Over Time
[Interactive timeline: model performance evolution on MLE-Bench Lite; legend: state-of-the-art frontier, open, proprietary]
MLE-Bench Lite Leaderboard
1 model
| Rank | Model | Context | Cost | License |
|---|---|---|---|---|
| 1 | MiniMax M2.7 (MiniMax) | — | — | — |
FAQ
Common questions about MLE-Bench Lite
The MLE-Bench Lite leaderboard currently ranks 1 AI model based on its performance on this benchmark. MiniMax M2.7 by MiniMax leads with a score of 0.666, which is also the average score, as it is the only model evaluated so far.
The highest MLE-Bench Lite score is 0.666, achieved by MiniMax M2.7 from MiniMax.
1 model has been evaluated on the MLE-Bench Lite benchmark, with 0 verified results and 1 self-reported result.
MLE-Bench Lite is categorized under agents and coding. The benchmark evaluates text models.