BFCL_v3_MultiTurn
BFCL_v3_MultiTurn Leaderboard
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | MiniMax | 230B | 1.0M | $0.30 / $1.20 | |
| 2 | NVIDIA | 9B | — | — | |
What is BFCL_v3_MultiTurn?
The Berkeley Function Calling Leaderboard (BFCL) V3 MultiTurn benchmark evaluates large language models' ability to handle multi-turn and multi-step function calling scenarios. It introduces complex interactions that require models to manage sequential function calls, carry conversational context across multiple turns, and decide dynamically when and how to use the available functions. BFCL V3 uses state-based evaluation: it verifies the actual state of the API systems after function execution, which gives a more realistic assessment of function calling capabilities in agentic applications.
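To make the idea of state-based evaluation concrete, here is a minimal sketch in Python. It is not BFCL's actual harness: `ToyFileSystem`, `run_turns`, and `state_match` are hypothetical names invented for illustration, and the sketch assumes that only the final API state is compared. The point it shows is that a model's call sequence can differ from the reference sequence and still pass, as long as the backend ends up in the same state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileSystem:
    """Hypothetical stateful API used for illustration; not a real BFCL backend."""
    files: dict = field(default_factory=dict)

    def create(self, name: str) -> None:
        # Creating an existing file leaves its contents untouched.
        self.files.setdefault(name, "")

    def write(self, name: str, text: str) -> None:
        if name not in self.files:
            raise KeyError(name)
        self.files[name] = text

    def delete(self, name: str) -> None:
        self.files.pop(name, None)


def run_turns(turns):
    """Execute every call of every turn on a fresh API and return its final state."""
    api = ToyFileSystem()
    for calls in turns:
        for func_name, kwargs in calls:
            getattr(api, func_name)(**kwargs)
    return api.files


def state_match(model_turns, reference_turns) -> bool:
    """Pass if the model's calls leave the API in the same state as the reference calls."""
    return run_turns(model_turns) == run_turns(reference_turns)


# Each inner list is one conversational turn; each item is (function name, arguments).
reference = [
    [("create", {"name": "notes.txt"})],
    [("write", {"name": "notes.txt", "text": "hello"})],
]

# Different call sequence (extra temporary file), same final state.
model_ok = [
    [("create", {"name": "tmp.txt"}), ("create", {"name": "notes.txt"})],
    [("write", {"name": "notes.txt", "text": "hello"}), ("delete", {"name": "tmp.txt"})],
]

# Same call sequence as the reference, but a wrong argument changes the final state.
model_bad = [
    [("create", {"name": "notes.txt"})],
    [("write", {"name": "notes.txt", "text": "helo"})],
]

print(state_match(model_ok, reference))
print(state_match(model_bad, reference))
```

Comparing states rather than call strings is what lets an evaluator accept alternative but equivalent call sequences while still rejecting calls that look plausible but leave the system in the wrong condition.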
BFCL_v3_MultiTurn is a text benchmark categorized under reasoning, general, and tool calling tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. Rounded to one decimal place, the current average is 0.7 and the leading score is 0.8 (0.768 unrounded).
Current leaders
MiniMax M2.5 from MiniMax currently leads the BFCL_v3_MultiTurn leaderboard with a score of 0.768, the highest among the 2 evaluated AI models.