AMO Bench

Name: AMO Bench Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

AMO Bench Leaderboard

1 model

Rank	Model	Organization	Score	Context	Cost	License
1	MAI-Code-1-Flash	Microsoft	0.400	—	—	—

About this benchmark

What is AMO Bench?

AMO Bench is an olympiad-level mathematics benchmark that evaluates advanced mathematical problem-solving and multi-step reasoning on competition-style problems.

AMO Bench is a text benchmark covering math and reasoning tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. With a single entry, the current average and the leading score are both 0.400.
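
As a rough illustration of how these summary figures relate to the underlying scores, the sketch below recomputes the leader, the average, and the percentage form from a list of results. The record layout and the `summarize` helper are hypothetical and are not part of any LLM Stats API; the single row mirrors the values reported on this page.

```python
from statistics import mean

# Hypothetical in-memory copy of the leaderboard; field names are assumptions.
results = [
    {
        "model": "MAI-Code-1-Flash",
        "organization": "Microsoft",
        "score": 0.400,      # reported on a 0-1 scale
        "verified": False,   # the page lists this result as self-reported
    },
]


def summarize(rows):
    """Return the leader, average score, and result counts for the given rows."""
    leader = max(rows, key=lambda r: r["score"])
    return {
        "leader": leader["model"],
        "leader_pct": f"{leader['score']:.1%}",  # 0-1 scale shown as a percentage
        "average": round(mean(r["score"] for r in rows), 3),
        "verified": sum(r["verified"] for r in rows),
        "self_reported": sum(not r["verified"] for r in rows),
    }


summary = summarize(results)
print(summary)
```

With only one model listed, the average necessarily equals the leader's score, so the two figures will diverge only once more results are added.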

Current leaders

MAI-Code-1-Flash from Microsoft currently leads the AMO Bench leaderboard with a score of 0.400. It is the only model evaluated so far.

MAI-Code-1-Flash (Microsoft): 40.0%

FAQ

Common questions about the AMO Bench benchmark and leaderboard.

What is the AMO Bench benchmark?

AMO Bench is an olympiad-level mathematics benchmark that evaluates advanced mathematical problem-solving and multi-step reasoning on competition-style problems.

What is the AMO Bench leaderboard?

The AMO Bench leaderboard ranks AI models by their performance on this benchmark. It currently lists 1 model, MAI-Code-1-Flash by Microsoft, which leads with a score of 0.400. Because it is the only entry, the average score across all models is also 0.400.

What is the highest AMO Bench score?

The highest AMO Bench score is 0.400, achieved by MAI-Code-1-Flash from Microsoft.

How many models are evaluated on AMO Bench?

1 model has been evaluated on the AMO Bench benchmark, with 0 verified results and 1 self-reported result.

What categories does AMO Bench cover?

AMO Bench is categorized under math and reasoning. It evaluates text-only models.

How recent are the AMO Bench leaderboard results?

The AMO Bench leaderboard was last updated in July 2026 and currently includes 1 evaluated model.