Tau3 Retail

Progress Over Time

[Interactive timeline: model performance on Tau3 Retail over time, showing the state-of-the-art frontier for open and proprietary models.]

Tau3 Retail Leaderboard

1 model
Context | Cost | License
1 | 128B | 256K | $1.50 / $7.50
About this benchmark

What is Tau3 Retail?

The τ³-Bench retail domain evaluates agentic models on multi-turn, tool-using customer-support scenarios in a simulated online retail environment.

Tau3 Retail is a text benchmark that evaluates models on reasoning, agent, and tool-calling tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. The current average is 0.761, with the leader at 0.761.

Compare leaders on the best AI for reasoning, best AI for agents and best AI for tool calling leaderboards.

Current leaders

Mistral Medium 3.5 from Mistral AI currently leads the Tau3 Retail leaderboard with a score of 0.761; it is the only model evaluated so far.

1. Mistral Medium 3.5 (Mistral AI): 76.1%

FAQ

Common questions about the Tau3 Retail benchmark and leaderboard.

What is the Tau3 Retail benchmark?

The τ³-Bench retail domain evaluates agentic models on multi-turn, tool-using customer-support scenarios in a simulated online retail environment.

What is the Tau3 Retail leaderboard?

The Tau3 Retail leaderboard ranks AI models by their performance on this benchmark. It currently lists 1 model: Mistral Medium 3.5 by Mistral AI, which leads with a score of 0.761. With a single entry, the average score across all models is also 0.761.
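As a rough illustration of how the leader and average figures relate, the sketch below derives both from a score table. The `summarize` helper and the dictionary layout are hypothetical, not LLM Stats' actual pipeline; the only data point used is the one listed on this page.

```python
from statistics import mean


def summarize(scores):
    """Return (leader name, leader score, mean score) for a non-empty score table."""
    leader = max(scores, key=scores.get)  # model with the highest score
    average = mean(list(scores.values()))  # arithmetic mean over all entries
    return leader, scores[leader], average


# Scores on the 0-1 scale, as listed on this page.
scores = {"Mistral Medium 3.5": 0.761}

leader, top, average = summarize(scores)
# The :.1% format converts the 0-1 score to the percentage shown in the ranking.
print(f"{leader}: {top:.3f} ({top:.1%}), average {average:.3f}")
```

With one entry the leader's score and the average coincide, which is why both figures read 0.761.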

What is the highest Tau3 Retail score?

The highest Tau3 Retail score is 0.761, achieved by Mistral Medium 3.5 from Mistral AI.

How many models are evaluated on Tau3 Retail?

1 model has been evaluated on the Tau3 Retail benchmark, with 0 verified results and 1 self-reported result.

What categories does Tau3 Retail cover?

Tau3 Retail is categorized under reasoning, agents, and tool calling. The benchmark evaluates text models.

What is the best open-source model on Tau3 Retail?

Mistral Medium 3.5 by Mistral AI is the top-ranked open-source model on Tau3 Retail, with a score of 0.761 (rank #1).

Which model offers the best value on Tau3 Retail?

Among models scoring within 10% of the leader, Mistral Medium 3.5 from Mistral AI is the cheapest, at $1.50 per million input tokens with a score of 0.761.
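The best-value rule described above can be sketched as a small filter-then-minimize step. This is a minimal sketch under stated assumptions: the `Entry` type and `best_value` function are hypothetical names, and "within 10% of the leader" is read here as relative to the leader's score (at least 90% of it), which the page does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    score: float        # benchmark score on the 0-1 scale
    input_cost: float   # USD per million input tokens


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry scoring within `tolerance` of the leader, or None."""
    if not entries:
        return None
    top = max(e.score for e in entries)
    # Assumption: the 10% band is relative to the leader's score.
    eligible = [e for e in entries if e.score >= top * (1 - tolerance)]
    # Cheapest first; ties on cost go to the higher score.
    return min(eligible, key=lambda e: (e.input_cost, -e.score))


entries = [Entry("Mistral Medium 3.5", 0.761, 1.50)]
print(best_value(entries).model)
```

With a single evaluated model the filter is trivial: the leader is the only eligible entry and therefore also the cheapest.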

How recent are the Tau3 Retail leaderboard results?

The Tau3 Retail leaderboard was last updated in July 2026 and currently includes 1 evaluated model.