QMSum
QMSum is a benchmark for query-based, multi-domain meeting summarization, consisting of 1,808 query-summary pairs over 232 meetings from academic, product, and committee domains. Each query requires a model to locate and summarize the relevant spans of a meeting transcript. Published at NAACL 2021, QMSum poses a demanding long-meeting summarization challenge: models must identify and summarize the content relevant to each user query.
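To make the task shape concrete, here is a sketch of a QMSum-style record and a helper that pulls out a query's gold span. The field names (`transcript`, `queries`, `relevant_span`, etc.) and the `span_text` helper are illustrative assumptions, not the official dataset schema:

```python
# Illustrative QMSum-style record (field names are assumptions, not the
# official schema): a meeting pairs a transcript with query-summary pairs,
# and each specific query points at the transcript spans it covers.
meeting = {
    "transcript": [
        {"speaker": "PM", "turn": "Let's review the remote control design."},
        {"speaker": "ID", "turn": "The rubber casing tested well with users."},
        {"speaker": "PM", "turn": "Then we lock in rubber for the final spec."},
    ],
    "queries": [
        {
            "query": "Summarize the discussion about the casing material.",
            "relevant_span": [1, 2],  # inclusive indices into the transcript
            "summary": "The team chose a rubber casing based on user tests.",
        }
    ],
}

def span_text(meeting, query_idx):
    """Concatenate the transcript turns that a query's gold span points at."""
    start, end = meeting["queries"][query_idx]["relevant_span"]
    turns = meeting["transcript"][start : end + 1]
    return " ".join(t["turn"] for t in turns)
```

A model is then judged on how well its summary of `span_text(meeting, 0)` (or of the full transcript, for general queries) matches the reference summary.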
Progress Over Time
[Interactive timeline: model performance on QMSum over time, marking the state-of-the-art frontier and distinguishing open from proprietary models.]
QMSum Leaderboard
2 models
| Rank | Model | Organization | Parameters | Context | Cost (in / out) | License | Score |
|---|---|---|---|---|---|---|---|
| 1 | Phi-3.5-mini-instruct | Microsoft | 4B | 128K | $0.10 / $0.10 | — | 0.213 |
| 2 | — | Microsoft | 60B | — | — | — | — |
FAQ
Common questions about QMSum
What is QMSum?
QMSum is a benchmark for query-based, multi-domain meeting summarization, consisting of 1,808 query-summary pairs over 232 meetings from academic, product, and committee domains. Each query requires a model to locate and summarize the relevant spans of a meeting transcript. Published at NAACL 2021, QMSum poses a demanding long-meeting summarization challenge: models must identify and summarize the content relevant to each user query.

Where can I find the QMSum paper?
The QMSum paper is available at https://arxiv.org/abs/2104.05938. It details the benchmark methodology, dataset construction, and evaluation criteria.

How are models ranked on the QMSum leaderboard?
The QMSum leaderboard ranks 2 AI models by their performance on the benchmark. Currently, Phi-3.5-mini-instruct by Microsoft leads with a score of 0.213; the average score across all models is 0.206.

What is the highest QMSum score?
The highest QMSum score is 0.213, achieved by Microsoft's Phi-3.5-mini-instruct.

How many models have been evaluated on QMSum?
2 models have been evaluated on the QMSum benchmark, with 0 verified results and 2 self-reported results.

What categories does QMSum fall under?
QMSum is categorized under long context and summarization, and it evaluates text models.
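QMSum results are conventionally reported with ROUGE, and scores around 0.2 are consistent with ROUGE-L F1. Assuming the leaderboard score is ROUGE-based, here is a minimal ROUGE-L F1 sketch; it uses plain whitespace tokenization and no stemming, so it will not exactly reproduce the official ROUGE toolkit's numbers:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists (DP table)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: LCS length relative to candidate (precision) and reference (recall)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```

For example, an exact match scores 1.0, and summaries sharing no tokens score 0.0; leaderboard-style scores would be this value averaged over all query-summary pairs.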