LMArena Text Leaderboard

LMArena Text Leaderboard is a blind human preference evaluation benchmark that ranks models based on pairwise comparisons in real-world conversations. The leaderboard uses Elo ratings computed from user preferences in head-to-head model battles, producing a single score per model that reflects both overall capability and response style.
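To show how pairwise votes can turn into Elo scores, here is a minimal sketch of a standard sequential Elo update. It is not LMArena's actual pipeline, which may fit ratings statistically rather than updating them one battle at a time. The K-factor of 32, the starting rating of 1000, the `(model_a, model_b, winner)` record format, and the function names `expected_score` and `compute_elo` are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical constants for illustration only; not LMArena's published settings.
K_FACTOR = 32.0
INITIAL_RATING = 1000.0
SCALE = 400.0  # conventional Elo logistic scale: a 400-point gap means 10:1 odds


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / SCALE))


def compute_elo(battles, k=K_FACTOR, initial=INITIAL_RATING):
    """Update ratings sequentially from (model_a, model_b, winner) tuples.

    `winner` is "model_a", "model_b", or "tie" (a hypothetical record format).
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        ea = expected_score(ra, rb)  # A's expected score before this battle
        if winner == "model_a":
            sa = 1.0
        elif winner == "model_b":
            sa = 0.0
        elif winner == "tie":
            sa = 0.5
        else:
            raise ValueError(f"unknown winner label: {winner!r}")
        # Each side moves by k * (actual - expected); the two updates cancel,
        # so the total rating pool stays constant.
        ratings[model_a] = ra + k * (sa - ea)
        ratings[model_b] = rb + k * ((1.0 - sa) - (1.0 - ea))
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical battle log: two wins for model-x, then a tie.
    battles = [
        ("model-x", "model-y", "model_a"),
        ("model-x", "model-y", "model_a"),
        ("model-y", "model-x", "tie"),
    ]
    for name, rating in sorted(compute_elo(battles).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

A sequential update like this depends on the order of battles. That is one reason large leaderboards often prefer fitting all votes at once, for example with a Bradley-Terry model, which uses the same logistic win-probability curve.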

Grok-4.1 Thinking from xAI currently leads the LMArena Text Leaderboard with a score of 1483 among the 2 AI models evaluated.
