Tau3 Retail
Progress Over Time
Interactive timeline showing model performance evolution on Tau3 Retail
Tau3 Retail Leaderboard
| Rank | Model | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Mistral Medium 3.5 | Mistral AI | 128B | 256K | $1.50 / $7.50 | |
What is Tau3 Retail?
The τ³-Bench retail domain evaluates agentic models on multi-turn, tool-using customer-support scenarios in a simulated online retail environment.
Tau3 Retail is a text benchmark that evaluates models on reasoning, agentic, and tool-calling tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. With a single entry, the average and the leading score are the same: 0.761, or 0.8 when rounded to one decimal place.
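The summary figures quoted for this benchmark (leader and average on a 0–1 scale) can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the `Entry` class and `summarize` function are hypothetical names, not part of LLM Stats or τ³-Bench, and the single entry is the one listed on this page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    organization: str
    score: float  # benchmark score on a 0-1 scale


def summarize(entries):
    """Return (leader, average score) for a non-empty list of entries."""
    if not entries:
        raise ValueError("leaderboard is empty")
    for entry in entries:
        if not 0.0 <= entry.score <= 1.0:
            raise ValueError(f"score out of range for {entry.model}: {entry.score}")
    leader = max(entries, key=lambda entry: entry.score)
    average = mean(entry.score for entry in entries)
    return leader, average


# The only entry currently listed for Tau3 Retail.
entries = [Entry("Mistral Medium 3.5", "Mistral AI", 0.761)]
leader, average = summarize(entries)

# With one entry, the average equals the leader's score.
print(leader.model, leader.score, round(average, 1))
```

Rounding to one decimal place is what turns 0.761 into the 0.8 shown in the page summary; once more models are added, the average and the leading score will diverge.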
Compare leaders on the best AI for reasoning, best AI for agents and best AI for tool calling leaderboards.
Current leaders
Mistral Medium 3.5 from Mistral AI currently leads the Tau3 Retail leaderboard with a score of 0.761. It is the only model evaluated on this benchmark so far.
FAQ
Common questions about the Tau3 Retail benchmark and leaderboard.