Benchmarks/general/LMArena Text Leaderboard

LMArena Text Leaderboard

LMArena Text Leaderboard is a blind human preference evaluation benchmark that ranks models based on pairwise comparisons in real-world conversations. The leaderboard uses Elo ratings computed from user preferences in head-to-head model battles, providing a comprehensive measure of overall model capability and style.

PaperImplementation