What is the LiveBench leaderboard?

The LiveBench leaderboard ranks 13 AI models based on their performance on this benchmark. Currently, o3-mini by OpenAI leads with a score of 0.846. The average score across all models is 0.632.

What is the highest LiveBench score?

The highest LiveBench score is 0.846, achieved by o3-mini from OpenAI.

How many models are evaluated on LiveBench?

13 models have been evaluated on the LiveBench benchmark. All 13 results are self-reported; none have been independently verified.

Where can I find the LiveBench paper?

The LiveBench paper is available at https://arxiv.org/abs/2406.19314. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does LiveBench cover?

The leaderboard files LiveBench under the general, math, and reasoning categories. The benchmark evaluates text models.

LiveBench

LiveBench is a challenging LLM benchmark designed to limit test set contamination: it releases new questions monthly, drawn from recently released datasets, arXiv papers, news articles, and IMDb movie synopses. Its tasks span math, coding, reasoning, language, instruction following, and data analysis, each with verifiable, objective ground-truth answers.
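To illustrate what scoring against objective ground-truth answers can look like, here is a minimal Python sketch. It is not LiveBench's actual scoring code, which is task-specific; the function names, the normalization rule, and the sample question IDs and answers are all hypothetical and chosen only for illustration.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences are not counted as errors (an assumed rule)."""
    return " ".join(answer.lower().split())


def score_exact_match(predictions: dict[str, str],
                      ground_truth: dict[str, str]) -> float:
    """Return the fraction of questions whose predicted answer
    matches the reference answer after normalization.

    A question with no prediction is counted as incorrect.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = [
        normalize(predictions.get(question_id, "")) == normalize(reference)
        for question_id, reference in ground_truth.items()
    ]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Hypothetical data, not taken from LiveBench.
    ground_truth = {"q1": "42", "q2": "paris", "q3": "y"}
    predictions = {"q1": " 42 ", "q2": "Paris", "q3": "x"}
    print(f"{score_exact_match(predictions, ground_truth):.3f}")  # prints 0.667
```

Because every answer is checked against a fixed reference, a score in the 0 to 1 range can be computed without a human or LLM judge, which is the property the description above emphasizes.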

o3-mini from OpenAI currently leads the LiveBench leaderboard with a score of 0.846 across 13 evaluated AI models.
