LLM Leaderboard

Analyze and compare AI models across benchmarks, pricing, and capabilities.

Live Benchmarks

Updated daily

Join our DiscordNEW

Tracking official benchmarks & pricing data from leading organizations

Loading timeline...

LLM Rankings

Best models and API providers in each category

Follow on X

Real-time model updates & benchmark alerts

NEW

Join Discord

Find insights, ask questions, and get help

Benchmarks

Leaderboards about code, reasoning and general knowledge

Context Window

Maximum input context length for each model

While tokenization varies between models, on average, 1 token ≈ 3.5 characters in English.Note: Each model uses its own tokenizer, so actual token counts may vary significantly.

As a rough guide, 1 million tokens is approximately equivalent to:

30 hours

of a podcast

~150 words per minute

1,000 pages

of a book

~500 words per page

60,000 lines¹

of code

~60 characters per line

[1] Based on average characters per line. See Wikipedia.

Comparisons

LLM comparisons across benchmark scores, prices, and model sizes

API Providers - Open LLM Providers

Price and performance across providers for Llama 4 Maverick

Provider performance varies significantly. Some providers run full-precision models on specialized hardware accelerators (like Groq's LPU or Cerebras' CS-3), while others may use quantization (4-bit, 8-bit) to simulate faster speeds on commodity hardware. Check provider documentation for specific hardware and quantization details, as this can impact both speed and model quality.

QualityFP16/BF16

8-bit/4-bitSpeed

Model Quantization Trade-off

QualityFP16/BF16

Model Quantization Trade-off

8-bit/4-bitSpeed

Observe how different processing speeds affect real-time token generation.Try adjusting the speeds using the number inputs above each panel ↑

t/s

Values reset every 5 seconds to demonstrate different speeds