LLM Leaderboard
Analyze and compare AI models across benchmarks, pricing, and capabilities.




AI Ranking
Best models and API providers in each category
Most Popular Benchmark Categories
Explore AI model performance across specialized domains
Best LLM for Research
Compare LLMs for research, factual accuracy, truthfulness, and reliability in providing accurate information.
Best LLM for Reasoning
Compare reasoning LLMs with benchmarks for logical thinking, problem-solving, and complex analytical tasks.
Best LLM for Coding
Compare coding LLMs with real benchmarks: code generation, debugging, tests, and software engineering tasks.
Best LLM for Math
Compare math-focused LLMs with benchmarks for problem solving, equation solving, and mathematical reasoning.
Best Multimodal LLMs
Compare multimodal AI models that process text, images, audio, and video for comprehensive understanding.
Best LLM for Long Context
Compare LLMs with extended context windows for processing long documents, books, and large-scale text.
LLM Benchmark Leaderboards
Best 15 models across popular benchmarks
Open LLM Leaderboard
Best performing open source models ranked by GPQA reasoning benchmark
Context Window
Maximum input context length for each model
While tokenization varies between models, on average 1 token ≈ 3.5 characters in English. Note: each model uses its own tokenizer, so actual token counts may vary significantly.
As a rough guide, 1 million tokens is approximately equivalent to:
30 hours of a podcast (~150 words per minute)
1,000 pages of a book (~500 words per page)
60,000 lines of code[1] (~60 characters per line)
[1] Based on average characters per line. See Wikipedia.
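The character-based heuristic above can be sketched as a small helper. This is a minimal illustration, not any model's real tokenizer: the function name and the 3.5 characters-per-token default are assumptions taken from the rough guide, and actual counts depend on each model's own tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Rough token estimate using the ~3.5 characters/token heuristic
    for English text. Real tokenizers (BPE, SentencePiece, etc.) differ
    per model, so treat this as a ballpark figure only."""
    return round(len(text) / chars_per_token)

# A 60-character line of code comes out to about 17 tokens
# under this heuristic (60 / 3.5 ≈ 17.1).
line_of_code = "x" * 60
print(estimate_tokens(line_of_code))
```

By the same arithmetic, 60,000 lines × 60 characters ≈ 3.6 million characters, which divided by 3.5 gives roughly the 1 million tokens quoted above.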
LLM Comparisons
Top models ranked by capabilities with pricing visualization
API Providers - Open LLM Providers
Price and performance across providers for Llama 4 Maverick
Provider performance varies significantly. Some providers run full-precision models on specialized hardware accelerators (such as Groq's LPU or Cerebras' CS-3), while others use quantization (4-bit or 8-bit) to achieve faster speeds on commodity hardware. Check each provider's documentation for specific hardware and quantization details, as both can affect speed and model quality.
Trends
Tracking AI progress across nations, model types, and organizations