LLM Leaderboard

Analyze and compare API models across benchmarks, pricing, and capabilities.

Updated daily with the latest models
Tracking official benchmark and pricing data from leading organizations
Notice missing or incorrect data? Let us know on GitHub.

LLM Rankings

Best models and API providers in each category

Developer Platform

One API. Every model.

Test in our playground. Deploy with our unified API. Access 100+ models through a single OpenAI-compatible endpoint.

OpenAI · Anthropic · Google · Meta · Mistral · DeepSeek · xAI · Qwen
99.9% uptime
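
As a rough illustration of what "a single OpenAI-compatible endpoint" means in practice, the sketch below points the official openai Python client at a stand-in base URL. The base_url, API key, and model ID are placeholders, not this platform's actual values.

```python
# Minimal sketch: any OpenAI-compatible endpoint can be called with the
# standard openai client by overriding base_url. All values below are
# placeholders (assumptions), not real endpoint or model identifiers.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                 # placeholder key
)

response = client.chat.completions.create(
    model="provider/model-name",  # placeholder model ID from the catalog
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the endpoint is OpenAI-compatible, swapping models is a one-line change to the model parameter rather than a new integration.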

Context Window

Maximum input context length for each model

While tokenization varies between models, on average 1 token ≈ 3.5 characters in English. Note: each model uses its own tokenizer, so actual token counts may vary significantly.

As a rough guide, 1 million tokens is approximately equivalent to the following (a quick estimation sketch follows the list):

30 hours of a podcast (~150 words per minute)
1,000 pages of a book (~500 words per page)
60,000 lines of code [1] (~60 characters per line)

[1] Based on average characters per line. See Wikipedia.
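
A quick back-of-the-envelope check of these figures, assuming the ~3.5 characters per token heuristic above (actual counts depend on each model's tokenizer):

```python
# Rough estimation only: real token counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 3.5  # heuristic from above

def estimate_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return round(len(text) / CHARS_PER_TOKEN)

tokens = 1_000_000
chars = tokens * CHARS_PER_TOKEN   # ~3.5M characters
pages = chars / (500 * 7)          # ~500 words/page, ~7 chars/word incl. spaces
lines = chars / 60                 # ~60 characters per line of code
print(f"{chars:,.0f} characters ~ {pages:,.0f} book pages ~ {lines:,.0f} lines of code")
```

With these assumptions the arithmetic reproduces the figures above: roughly 1,000 book pages or about 58,000 lines of code per million tokens.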

Comparisons

Capability versus price across models (GPQA score vs. price per 1M input tokens), plus other comparisons
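
As a sketch of what this comparison captures, the snippet below derives a simple "benchmark points per dollar" ratio from a GPQA score and an input-token price. The model names and numbers are placeholders, not real benchmark results or prices.

```python
# Placeholder data only -- not real benchmark scores or prices.
def quality_per_dollar(gpqa: float, usd_per_1m_input: float) -> float:
    """Benchmark points per dollar of input tokens; higher means better value."""
    return gpqa / usd_per_1m_input

models = {
    "model_a": (72.0, 3.00),   # (GPQA %, $ per 1M input tokens) -- placeholders
    "model_b": (55.0, 0.25),
}
ranked = sorted(models.items(), key=lambda kv: quality_per_dollar(*kv[1]), reverse=True)
for name, (gpqa, price) in ranked:
    print(f"{name}: {quality_per_dollar(gpqa, price):.1f} GPQA points per $")
```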

API Providers: Open LLM Providers

Price and performance across providers for Llama 4 Maverick

Provider performance varies significantly. Some providers run full-precision models on specialized hardware accelerators (such as Groq's LPU or Cerebras' CS-3), while others may use quantization (8-bit or 4-bit) to reach higher throughput on commodity hardware. Check provider documentation for specific hardware and quantization details, as both can affect speed and output quality.

Chart: model quantization trade-off, with quality (FP16/BF16) at one end and speed (8-bit/4-bit) at the other
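
To make the trade-off concrete, here is a small sketch of weight-memory footprints at the precisions mentioned above. Weights only; KV cache and activations are excluded, and the 70B parameter count is just an example.

```python
# Fewer bytes per weight -> smaller footprint and less memory traffic,
# which generally means higher throughput but potentially lower quality.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed to hold the weights alone, in gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:  # example: a 70B-parameter model
    print(f"{precision:>10}: ~{weight_memory_gb(70e9, precision):.0f} GB of weights")
```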

Tracking AI progress across nations, model types, and organizations

AI Progress: US vs China

SOTA comparison over time between the United States and China.


Open vs Closed Models

Comparing SOTA progression of open-source and proprietary models over time.


Organization Progress

SOTA progression by organization over time.

Showing the top 10 organizations by model count; each line represents SOTA progression

Observe how different processing speeds affect real-time token generation. Try adjusting the speeds using the number inputs above each panel; a minimal simulation sketch follows below.


Values reset every 5 seconds to demonstrate different speeds
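
For readers without access to the interactive panels, the following minimal sketch simulates what different generation speeds feel like by emitting whitespace-delimited "tokens" at a fixed rate. Real tokens are subword units, so this is only an approximation.

```python
import time

def simulate_stream(text: str, tokens_per_second: float) -> None:
    """Print whitespace-delimited 'tokens' at a fixed rate to mimic streaming."""
    delay = 1.0 / tokens_per_second
    for token in text.split():
        print(token, end=" ", flush=True)
        time.sleep(delay)
    print()

# Compare how the same output feels at different generation speeds.
for speed in (10, 50, 200):  # tokens per second
    print(f"\n--- {speed} t/s ---")
    simulate_stream("The quick brown fox jumps over the lazy dog.", speed)
```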