Tau3 Airline

Progress Over Time

[Interactive timeline: model performance evolution on Tau3 Airline, showing the state-of-the-art frontier with open and proprietary models distinguished.]

Tau3 Airline Leaderboard

1 model

Rank   Size   Context   Cost             License
1      128B   256K      $1.50 / $7.50
About this benchmark

What is Tau3 Airline?

The τ³-Bench airline domain evaluates agentic models on multi-turn, tool-using customer-support scenarios in a simulated airline booking and reservations environment.

Tau3 Airline is a text benchmark that evaluates models on reasoning, agent, and tool-calling tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. With a single entry, the current average and the leading score are both 0.720.

Compare leaders on the best AI for reasoning, best AI for agents and best AI for tool calling leaderboards.

Current leaders

Mistral Medium 3.5 from Mistral AI currently leads the Tau3 Airline leaderboard with a score of 0.720. It is the only AI model evaluated so far.

1   Mistral Medium 3.5   Mistral AI   72.0%

FAQ

Common questions about the Tau3 Airline benchmark and leaderboard.

What is the Tau3 Airline benchmark?

The τ³-Bench airline domain evaluates agentic models on multi-turn, tool-using customer-support scenarios in a simulated airline booking and reservations environment.

What is the Tau3 Airline leaderboard?

The Tau3 Airline leaderboard ranks AI models by their performance on this benchmark and currently lists 1 model. Mistral Medium 3.5 by Mistral AI leads with a score of 0.720. Because it is the only entry, the average score across all models is also 0.720.

What is the highest Tau3 Airline score?

The highest Tau3 Airline score is 0.720, achieved by Mistral Medium 3.5 from Mistral AI.

How many models are evaluated on Tau3 Airline?

1 model has been evaluated on the Tau3 Airline benchmark, with 0 verified results and 1 self-reported result.

What categories does Tau3 Airline cover?

Tau3 Airline is categorized under reasoning, agents, and tool calling. It evaluates text models.

What is the best open-source model on Tau3 Airline?

Mistral Medium 3.5 by Mistral AI is the top-ranked open-source model on Tau3 Airline, with a score of 0.720 (rank #1).

Which model offers the best value on Tau3 Airline?

Among models scoring within 10% of the leader, Mistral Medium 3.5 from Mistral AI is the cheapest, at $1.50 per million input tokens with a score of 0.720.
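The selection rule described in this answer can be sketched in a few lines of Python. This is only an illustration, not the site's actual implementation: the `Entry` class and `best_value` function are hypothetical names, and the sketch assumes "within 10% of the leader" means a score of at least 90% of the leader's score. The only data used is the single entry reported on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical leaderboard row (illustrative, not the site's schema)."""
    model: str
    score: float        # benchmark score on the 0-1 scale
    input_price: float  # USD per million input tokens


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry scoring within `tolerance` of the leader."""
    if not entries:
        return None
    leader_score = max(e.score for e in entries)
    # Assumption: "within 10%" is measured relative to the leader's score.
    threshold = leader_score * (1 - tolerance)
    candidates = [e for e in entries if e.score >= threshold]
    # Lowest input price wins; ties go to the higher score.
    return min(candidates, key=lambda e: (e.input_price, -e.score))


# The one entry listed on this page.
leaderboard = [Entry("Mistral Medium 3.5", 0.720, 1.50)]
winner = best_value(leaderboard)
print(winner.model, winner.score, winner.input_price)
```

With a single evaluated model, the leader is trivially also the best-value pick. The rule only becomes informative once more models are added.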

How recent are the Tau3 Airline leaderboard results?

The Tau3 Airline leaderboard was last updated in July 2026 and currently includes 1 evaluated model.