Grok-4.1 Thinking vs Kimi-k1.5

The leading AI leaderboard featuring LLM benchmarks and LLM arenas for objective model evaluation.

© 2025 llm-stats
