QMSum

QMSum Leaderboard

2 models
About this benchmark

What is QMSum?

QMSum is a benchmark for query-based, multi-domain meeting summarization. It consists of 1,808 query-summary pairs over 232 meetings across academic, product, and committee domains. Given a query, a model must select the relevant spans of a meeting and summarize them. Published at NAACL 2021, QMSum is challenging because meetings are long and the content relevant to each query must be identified before it can be summarized.

QMSum is a text benchmark that evaluates models on long-context and summarization tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.206, with the leader at 0.213.

Current leaders

Phi-3.5-mini-instruct from Microsoft currently leads the QMSum leaderboard with a score of 0.213, the higher of the 2 evaluated models.

Rank  Model                  Organization  Score
1     Phi-3.5-mini-instruct  Microsoft     21.3%
2     Phi-3.5-MoE-instruct   Microsoft     19.9%

Source paper

Title: QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization
Authors: Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, and 7 others
Published: NAACL 2021
Abstract

Meetings are a key component of human collaboration. As increasing numbers of meetings are recorded and transcribed, meeting summaries have become essential to remind those who may or may not have attended the meetings about the key decisions made and the tasks to be completed. However, it is hard to create a single short summary that covers all the content of a long meeting involving multiple people and topics. In order to satisfy the needs of different types of users, we define a new query-based multi-domain meeting summarization task, where models have to select and summarize relevant spans of meetings in response to a query, and we introduce QMSum, a new benchmark for this task. QMSum consists of 1,808 query-summary pairs over 232 meetings in multiple domains. Besides, we investigate a locate-then-summarize method and evaluate a set of strong summarization baselines on the task. Experimental results and manual analysis reveal that QMSum presents significant challenges in long meeting summarization for future research. Dataset is available at https://github.com/Yale-LILY/QMSum.
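
The locate-then-summarize idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's method: the locator here is a simple word-overlap heuristic chosen for illustration, the summarizer is a placeholder that just joins the located turns, and the meeting transcript, query, and function names are all invented for this example.

```python
import re
from typing import List, Set, Tuple

Turn = Tuple[str, str]  # (speaker, utterance); a hypothetical transcript format


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def locate(query: str, turns: List[Turn], top_k: int = 2) -> List[Turn]:
    """Toy locator: keep the top_k turns sharing the most words with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(utt)), i)
              for i, (_, utt) in enumerate(turns)]
    # Highest overlap first; ties broken by earlier position in the meeting.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:top_k]
    # Drop turns with no overlap and restore the original meeting order.
    keep = sorted(i for score, i in best if score > 0)
    return [turns[i] for i in keep]


def summarize(spans: List[Turn]) -> str:
    """Placeholder: a real system would run an abstractive summarizer here."""
    return " ".join(f"{speaker}: {utt}" for speaker, utt in spans)


# Invented example data, not taken from the QMSum dataset.
meeting = [
    ("PM", "Let's start with the remote control budget."),
    ("Designer", "The case could be made of rubber."),
    ("PM", "The budget limit is 12.50 euros per remote control."),
    ("Engineer", "I will check the battery options."),
]
query = "What was decided about the remote control budget?"

located = locate(query, meeting)
summary = summarize(located)
print(summary)
```

Splitting the task this way keeps the summarizer's input short even when the meeting is long, which is the motivation for locating relevant spans first.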

FAQ

Common questions about the QMSum benchmark and leaderboard.

What is the QMSum benchmark?

QMSum is a query-based, multi-domain meeting summarization benchmark with 1,808 query-summary pairs over 232 meetings from academic, product, and committee domains, published at NAACL 2021. Given a query, a model must identify the relevant spans of a long meeting and summarize them.

What is the QMSum leaderboard?

The QMSum leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Phi-3.5-mini-instruct by Microsoft leads with a score of 0.213. The average score across all models is 0.206.
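
The reported average and leader follow directly from the two listed scores. A minimal sketch of that arithmetic, assuming the leaderboard values 21.3% and 19.9% converted to the 0–1 scale; the variable names are illustrative:

```python
# Scores as listed on the leaderboard, on a 0-1 scale.
scores = {
    "Phi-3.5-mini-instruct": 0.213,
    "Phi-3.5-MoE-instruct": 0.199,
}

# Arithmetic mean across all evaluated models.
average = sum(scores.values()) / len(scores)

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

print(f"average={average:.3f}, leader={leader} ({scores[leader]:.3f})")
```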

What is the highest QMSum score?

The highest QMSum score is 0.213, achieved by Phi-3.5-mini-instruct from Microsoft.

How many models are evaluated on QMSum?

Two models have been evaluated on the QMSum benchmark. Both results are self-reported; none is verified.

Where can I find the QMSum paper?

The QMSum paper is available at https://arxiv.org/abs/2104.05938. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does QMSum cover?

QMSum is categorized under long context and summarization. The benchmark evaluates text models.

What is the best open-source model on QMSum?

Phi-3.5-mini-instruct by Microsoft is the top-ranked open-source model on QMSum, with a score of 0.213 (rank #1).

How recent are the QMSum leaderboard results?

The QMSum leaderboard was last updated in July 2026 and currently includes 2 evaluated models.