Legal Agent Benchmark

Name: Legal Agent Benchmark Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Legal Agent Benchmark Leaderboard

13 models

Rank	Model (provider)	Score	Context	Cost (input / output, USD per 1M tokens)
1	Claude Fable 5 Anthropic	—	1.0M	$10.00 / $50.00
2	Claude Opus 5 Anthropic	—	1.0M	$5.00 / $25.00
3	Claude Opus 4.7 Anthropic	—	1.0M	$5.00 / $25.00
4	Claude Sonnet 5 Anthropic	—	1.0M	$3.00 / $15.00
5	Claude Sonnet 4.6 Anthropic	—	200K	$3.00 / $15.00
6	Claude Opus 4.6 Anthropic	—	1.0M	$5.00 / $25.00
7	GPT-5.5 OpenAI	—	1.1M	$5.00 / $30.00
8	Gemini 3.5 Flash Google	—	1.0M	$1.50 / $9.00
9	GPT-5.4 OpenAI	—	1.0M	$2.50 / $15.00
10	Gemini 3.1 Pro Google	—	1.0M	$2.50 / $15.00
10	Gemini 3 Flash Google	—	1.0M	$0.50 / $3.00
10	GPT-5.4 mini OpenAI	—	400K	$0.75 / $4.50
10	Gemini 3.1 Flash-Lite Google	—	1.0M	$0.25 / $1.50

About this benchmark

What is Legal Agent Benchmark?

The Legal Agent Benchmark (LAB) is Harvey's open-source benchmark for evaluating AI agents on complex, long-horizon legal work. Tasks are scored under an all-pass standard against expert-curated rubrics, where a task passes only if every required rubric criterion (facts, conclusions, citations, structure, and analytical moves) passes.
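The all-pass rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Harvey's implementation: the `Criterion` class, the `required` flag, and the idea that the leaderboard score is the fraction of passing tasks are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric criterion (hypothetical structure, for illustration only)."""
    name: str
    required: bool
    passed: bool


def task_passes(criteria):
    """All-pass rule: a task passes only if every required criterion passes."""
    return all(c.passed for c in criteria if c.required)


def pass_rate(tasks):
    """Fraction of tasks that pass, on a 0-1 scale (assumed aggregation)."""
    if not tasks:
        return 0.0
    return sum(task_passes(t) for t in tasks) / len(tasks)


# Two made-up tasks: the second fails because one required criterion fails.
task_a = [
    Criterion("facts", required=True, passed=True),
    Criterion("citations", required=True, passed=True),
]
task_b = [
    Criterion("facts", required=True, passed=True),
    Criterion("conclusions", required=True, passed=False),
    Criterion("structure", required=True, passed=True),
]

tasks = [task_a, task_b]
print(pass_rate(tasks))
```

Under this rule a single failed required criterion fails the whole task, which is one plausible reason the scores on this leaderboard are low in absolute terms.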

Legal Agent Benchmark is a text benchmark that evaluates models on reasoning, legal, and agent tasks. LLM Stats tracks 13 models on this benchmark, scored on a 0–1 scale. The current average is 0.039, with the leader at 0.133.

Current leaders

Claude Fable 5 from Anthropic currently leads the Legal Agent Benchmark leaderboard with a score of 0.133 across 13 evaluated AI models.

Claude Fable 5 (Anthropic): 13.3%

Claude Opus 5 (Anthropic): 11.7%

Claude Opus 4.7 (Anthropic): 7.1%

FAQ

Common questions about the Legal Agent Benchmark and its leaderboard.

What is the Legal Agent Benchmark?

What is the Legal Agent Benchmark leaderboard?

The Legal Agent Benchmark leaderboard ranks 13 AI models based on their performance on this benchmark. Currently, Claude Fable 5 by Anthropic leads with a score of 0.133. The average score across all models is 0.039.

What is the highest Legal Agent Benchmark score?

The highest Legal Agent Benchmark score is 0.133, achieved by Claude Fable 5 from Anthropic.

How many models are evaluated on Legal Agent Benchmark?

13 models have been evaluated on the Legal Agent Benchmark, with 0 verified results and 3 self-reported results.

Where can I find the Legal Agent Benchmark paper?

The Legal Agent Benchmark write-up, published as a Harvey blog post, is available at https://www.harvey.ai/blog/legal-agent-benchmark-initial-results. The post details the methodology, dataset construction, and evaluation criteria.

Where can I find the Legal Agent Benchmark dataset?

The Legal Agent Benchmark dataset is available at https://www.harvey.ai/blog/introducing-harveys-legal-agent-benchmark.

What categories does Legal Agent Benchmark cover?

Legal Agent Benchmark is categorized under reasoning, legal, and agents. The benchmark evaluates text models.

Which model offers the best value on Legal Agent Benchmark?

Among models scoring within 10% of the leader, Claude Fable 5 from Anthropic is the cheapest, at $10.00 per million input tokens with a score of 0.133.
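The value comparison above can be reproduced roughly as follows. This is a sketch under stated assumptions: "within 10% of the leader" is read as a score of at least 90% of the top score, only the three scores published on this page are included, and the helper `best_value` is a hypothetical name rather than anything LLM Stats documents.

```python
# (model, score, input price in USD per 1M tokens), taken from this page.
# Only the three models with published scores are listed.
entries = [
    ("Claude Fable 5", 0.133, 10.00),
    ("Claude Opus 5", 0.117, 5.00),
    ("Claude Opus 4.7", 0.071, 5.00),
]


def best_value(entries, tolerance=0.10):
    """Cheapest model whose score is within `tolerance` of the leader.

    Assumption: "within 10%" is relative, i.e. score >= leader * (1 - tolerance).
    """
    leader_score = max(score for _, score, _ in entries)
    threshold = leader_score * (1 - tolerance)
    eligible = [e for e in entries if e[1] >= threshold]
    # Among eligible models, pick the lowest input price.
    return min(eligible, key=lambda e: e[2])


name, score, price = best_value(entries)
print(name, score, price)
```

With these figures the threshold is about 0.1197, and the runner-up's 0.117 falls just below it, so the leader is the only eligible model and is the cheapest by default.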

How recent are the Legal Agent Benchmark leaderboard results?

The Legal Agent Benchmark leaderboard was last updated in July 2026 and currently includes 13 evaluated models.