DeepSeek-V3.2 (Non-thinking) vs DeepSeek R1 Distill Qwen 1.5B





© 2025 llm-stats
