Tau3 Airline
Tau3 Airline Leaderboard
| Rank | Model | Organization | Parameters | Context | Cost | License | Score |
|---|---|---|---|---|---|---|---|
| 1 | Mistral Medium 3.5 | Mistral AI | 128B | 256K | $1.50 / $7.50 | | 0.720 |
What is Tau3 Airline?
The τ³-Bench airline domain evaluates agentic models on multi-turn, tool-using customer-support scenarios in a simulated airline booking and reservations environment.
Tau3 Airline is a text benchmark that evaluates models on reasoning, agentic, and tool-calling tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. With a single model evaluated, the current average and the leading score are both 0.720.
Compare leaders on the best AI for reasoning, best AI for agents and best AI for tool calling leaderboards.
Current leaders
Mistral Medium 3.5 from Mistral AI currently leads the Tau3 Airline leaderboard with a score of 0.720. It is the only model evaluated so far.