ExploitBench

Name: ExploitBench Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

ExploitBench Leaderboard

4 models

Rank	Model	Provider		Context (tokens)	Cost (input / output, per 1M tokens)
1	Claude Fable 5	Anthropic	—	1.0M	$10.00 / $50.00
2	GPT-5.6 Sol	OpenAI	—	1.1M	$5.00 / $30.00
3	GPT-5.6 Terra	OpenAI	—	1.1M	$2.50 / $15.00
4	GPT-5.6 Luna	OpenAI	—	1.1M	$1.00 / $6.00

About this benchmark

What is ExploitBench?

ExploitBench is a cybersecurity benchmark that evaluates a model's ability to discover and exploit software vulnerabilities, reported as the fraction of challenges where the model captures the target (Cap%).

ExploitBench is a text benchmark covering safety, agents, and code tasks. LLM Stats tracks 4 models on it, scored on a 0–1 scale (shown as a percentage in the leader list below). The current average is 0.594, and the leader scores 0.780.
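As an illustration of how a Cap% score relates to the 0–1 scale, here is a minimal sketch that assumes each challenge yields a single pass/fail capture outcome. The function name `cap_rate` and the sample results are hypothetical and do not reflect ExploitBench's actual evaluation harness.

```python
def cap_rate(captured: list[bool]) -> float:
    """Fraction of challenges where the target was captured (0-1 scale).

    Assumes one boolean outcome per challenge (hypothetical format).
    """
    if not captured:
        raise ValueError("at least one challenge result is required")
    # True counts as 1 and False as 0, so the sum is the number of captures.
    return sum(captured) / len(captured)


# Hypothetical run: 4 challenges, 3 captured.
results = [True, True, False, True]
print(f"{cap_rate(results):.3f} ({cap_rate(results):.1%})")  # 0.750 (75.0%)
```

The same value can be displayed either way: a score of 0.780 on the 0–1 scale corresponds to the 78.0% shown for the leader.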

Current leaders

Claude Fable 5 from Anthropic currently leads the ExploitBench leaderboard with a score of 0.780 among the 4 evaluated models.

Claude Fable 5 (Anthropic): 78.0%

GPT-5.6 Sol (OpenAI): 73.5%

GPT-5.6 Terra (OpenAI): 52.9%

FAQ

Common questions about the ExploitBench benchmark and leaderboard.

What is the ExploitBench leaderboard?

The ExploitBench leaderboard ranks 4 AI models based on their performance on this benchmark. Currently, Claude Fable 5 by Anthropic leads with a score of 0.780. The average score across all models is 0.594.

What is the highest ExploitBench score?

The highest ExploitBench score is 0.780, achieved by Claude Fable 5 from Anthropic.

How many models are evaluated on ExploitBench?

Four models have been evaluated on ExploitBench. All 4 results are self-reported; none have been independently verified.

What categories does ExploitBench cover?

ExploitBench is categorized under safety, agents, and code. The benchmark evaluates text models.

Which model offers the best value on ExploitBench?

Among models scoring within 10% of the leader, GPT-5.6 Sol from OpenAI is the cheapest, at $5.00 per million input tokens (half the $10.00 charged for Claude Fable 5), with a score of 0.735.
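The best-value rule can be sketched as a filter followed by a minimum. This sketch assumes "within 10%" means a relative margin (score at least 0.9 times the leader's); an absolute margin of 0.10 points selects the same model here. The function name `best_value` is hypothetical, and GPT-5.6 Luna is omitted because its score is not listed on this page.

```python
# Scores (0-1 scale) and input prices (USD per 1M input tokens) as listed on this page.
models = {
    "Claude Fable 5": (0.780, 10.00),
    "GPT-5.6 Sol": (0.735, 5.00),
    "GPT-5.6 Terra": (0.529, 2.50),
}


def best_value(models: dict[str, tuple[float, float]], tolerance: float = 0.10) -> str:
    """Return the cheapest model scoring within `tolerance` (relative) of the leader.

    The relative interpretation of the margin is an assumption.
    """
    leader_score = max(score for score, _ in models.values())
    cutoff = leader_score * (1 - tolerance)
    # Keep only models at or above the cutoff, mapped to their input price.
    eligible = {name: price for name, (score, price) in models.items() if score >= cutoff}
    # Pick the eligible model with the lowest input price.
    return min(eligible, key=eligible.get)


print(best_value(models))  # GPT-5.6 Sol
```

With these numbers the cutoff is about 0.702, so Claude Fable 5 and GPT-5.6 Sol qualify and GPT-5.6 Terra does not; GPT-5.6 Sol wins on input price.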

How recent are the ExploitBench leaderboard results?

The ExploitBench leaderboard was last updated in July 2026 and currently includes 4 evaluated models.