SWE-Perf

Name: SWE-Perf Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Software Engineering Performance benchmark measuring code optimization capabilities

MiniMax M2.1 from MiniMax currently leads the SWE-Perf leaderboard with a score of 0.031. It is the only AI model evaluated so far.

About this benchmark

What SWE-Perf measures

SWE-Perf is a text benchmark that evaluates large language models on code tasks. LLM Stats tracks 1 model on this benchmark, with a maximum possible score of 1. With a single reported model, the current average and the leader's score are both 0.031.

Compare leaders on the best AI for code leaderboards.

MiniMax M2.1 leads with 3.1%.
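
The relationship between the 0-to-1 score and the percentage quoted above can be illustrated with a minimal sketch. The `Entry` class and variable names are hypothetical, chosen only for this example; the single data point is the one reported on this page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float  # on a 0-to-1 scale, where 1 is the maximum possible score


# The single entry currently reported on the leaderboard.
entries = [Entry("MiniMax M2.1", 0.031)]

# The leader is the entry with the highest score.
leader = max(entries, key=lambda e: e.score)

# With one model, the average equals the leader's score.
average = mean(e.score for e in entries)

# The ".1%" format multiplies by 100 and keeps one decimal place.
summary = (
    f"{leader.model} leads with {leader.score:.1%}; "
    f"average {average:.3f} over {len(entries)} model(s)"
)
print(summary)
```

Because only one model is listed, the leader's score and the average coincide; the two would diverge as soon as a second result is reported.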

Progress Over Time

Interactive timeline showing model performance evolution on SWE-Perf

SWE-Perf Leaderboard

1 model

Rank	Model	Organization	Score	Parameters	Context	Cost (input / output)	License
1	MiniMax M2.1	MiniMax	0.031	230B	1.0M	$0.30 / $1.20	not listed

FAQ

Common questions about SWE-Perf.

What is the SWE-Perf benchmark?

SWE-Perf is a Software Engineering Performance benchmark that measures code optimization capabilities.

What is the SWE-Perf leaderboard?

The SWE-Perf leaderboard ranks AI models by their performance on this benchmark and currently lists 1 model. MiniMax M2.1 by MiniMax leads with a score of 0.031. The average score across all models is 0.031.

What is the highest SWE-Perf score?

The highest SWE-Perf score is 0.031, achieved by MiniMax M2.1 from MiniMax.

How many models are evaluated on SWE-Perf?

1 model has been evaluated on the SWE-Perf benchmark, with 0 verified results and 1 self-reported result.

What categories does SWE-Perf cover?

SWE-Perf is categorized under code. The benchmark evaluates text models.

What is the best open-source model on SWE-Perf?

MiniMax M2.1 by MiniMax is the top-ranked open-source model on SWE-Perf, with a score of 0.031 (rank #1).

Which model offers the best value on SWE-Perf?

Among models scoring within 10% of the leader, MiniMax M2.1 from MiniMax is the cheapest, at $0.30 per million input tokens with a score of 0.031.
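
The selection rule in this answer can be sketched as follows. This is an illustration under stated assumptions, not the site's actual implementation: "within 10% of the leader" is read here as a relative threshold (score at least 90% of the top score), and the `Entry` class and `best_value` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float        # 0-to-1 scale
    input_price: float  # USD per million input tokens


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry whose score is close to the top score.

    Assumption: "within 10%" is relative, i.e. score >= top * (1 - tolerance).
    """
    top = max(e.score for e in entries)
    eligible = [e for e in entries if e.score >= top * (1 - tolerance)]
    # The leader is always eligible, so the list is never empty.
    return min(eligible, key=lambda e: e.input_price)


# The single entry currently reported on the leaderboard.
entries = [Entry("MiniMax M2.1", 0.031, 0.30)]
print(best_value(entries).model)
```

With only one model on the leaderboard, the rule trivially returns that model; the threshold only matters once several entries with different prices are listed.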

How recent are the SWE-Perf leaderboard results?

The SWE-Perf leaderboard was last updated in June 2026 and currently includes 1 evaluated model.

More evaluations to explore

Related benchmarks in the same category

SWE-Bench Verified

A verified subset of 500 software engineering problems from real GitHub issues, validated by human annotators for evaluating language models' ability to resolve real-world coding issues by generating patches for Python codebases.

code

100 models

LiveCodeBench

LiveCodeBench is a holistic and contamination-free evaluation benchmark for large language models for code. It continuously collects new problems from programming contests (LeetCode, AtCoder, CodeForces) and evaluates four different scenarios: code generation, self-repair, code execution, and test output prediction. Problems are annotated with release dates to enable evaluation on unseen problems released after a model's training cutoff.

code

74 models

HumanEval

A benchmark that measures functional correctness for synthesizing programs from docstrings, consisting of 164 original programming problems that assess language comprehension, algorithms, and simple mathematics.

code

66 models

Terminal-Bench 2.0

Terminal-Bench 2.0 is an updated benchmark for testing AI agents' tool use ability to operate a computer via terminal. It evaluates how well models can handle real-world, end-to-end tasks autonomously, including compiling code, training models, setting up servers, system administration, security tasks, data science workflows, and cybersecurity vulnerabilities.

code

47 models

SWE-bench Multilingual

A multilingual benchmark for issue resolving in software engineering that covers Java, TypeScript, JavaScript, Go, Rust, C, and C++. Contains 1,632 high-quality instances carefully annotated from 2,456 candidates by 68 expert annotators, designed to evaluate Large Language Models across diverse software ecosystems beyond Python.

code

30 models

SWE-Bench Pro

SWE-Bench Pro is an advanced version of SWE-Bench that evaluates language models on complex, real-world software engineering tasks requiring extended reasoning and multi-step problem solving.

code

29 models