LMArena Text Leaderboard
LMArena Text Leaderboard is a blind human preference evaluation benchmark that ranks models based on pairwise comparisons in real-world conversations. The leaderboard uses Elo ratings computed from user preferences in head-to-head model battles, providing a comprehensive measure of overall model capability and style.
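The Elo mechanics behind such pairwise rankings can be sketched as follows. This is a minimal illustration of the standard Elo update, not LMArena's actual implementation; the K-factor of 32 and the 400-point scale are conventional defaults assumed here.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Probability that model A is preferred over model B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    # outcome: 1.0 if A wins the battle, 0.0 if B wins, 0.5 for a tie.
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (outcome - e_a)
    new_b = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return new_a, new_b

# Two equally rated models; A wins the head-to-head battle.
print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

After enough battles, ratings converge so that rating gaps reflect observed win probabilities, which is why a leaderboard can rank models of very different styles on one scale.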
Progress Over Time
[Interactive timeline of model performance on the LMArena Text Leaderboard, with a state-of-the-art frontier overlay and filters for open vs. proprietary models.]
Leaderboard
2 models • 0 verified
| Rank | Organization | Context | Cost (input / output, per 1M tokens) | License |
|---|---|---|---|---|
| 1 | — | 256K | $3.00 / $15.00 | — |
| 2 | xAI | 256K | $3.00 / $15.00 | — |
FAQ
Common questions about LMArena Text Leaderboard
The methodology behind the LMArena Text Leaderboard is documented at https://arena.lmsys.org/, which covers how battles are collected, how the dataset is built, and the evaluation criteria used to compute ratings.
The LMArena Text Leaderboard dataset is available at https://arena.lmsys.org/.
The LMArena Text Leaderboard currently ranks 2 AI models by their performance on this benchmark. Grok-4.1 Thinking by xAI leads with an Elo rating of 1483; the average rating across all ranked models is 1474.
The highest LMArena Text Leaderboard rating is 1483, achieved by Grok-4.1 Thinking from xAI.
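With only two models ranked, the stated leader and average pin down the second model's rating. A quick check, using nothing but arithmetic on the figures quoted above:

```python
leader = 1483.0    # Grok-4.1 Thinking's Elo rating
average = 1474.0   # mean rating across the 2 ranked models
n_models = 2

# The mean determines the remaining rating: sum of ratings = n * mean.
second = n_models * average - leader
print(second)  # 1465.0
```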
2 models have been evaluated on the LMArena Text Leaderboard benchmark, with 0 verified results and 2 self-reported results.
LMArena Text Leaderboard is categorized under general and reasoning. The benchmark evaluates text models.