VocalSound Leaderboard


VocalSound Leaderboard

1 model

Rank	Model	Organization	Parameters	Score	Context	Cost	License
1	Qwen2.5-Omni-7B	Alibaba Cloud / Qwen Team	7B	0.939	—	—	—
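The summary figures this page reports (leading model, top score, average score, and verified vs. self-reported counts) follow from a simple sort and a mean over the leaderboard rows. The sketch below illustrates that arithmetic. The `Entry` record, its field names, and the `summarize` helper are hypothetical. Only the model name, the organization, and the self-reported 0.939 score come from this page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Entry:
    # Hypothetical record for one leaderboard row.
    model: str
    organization: str
    score: float
    verified: bool  # False means the result is self-reported


def summarize(entries):
    """Rank entries by score (highest first) and compute page-level stats.

    Assumes at least one entry; higher scores rank higher.
    """
    ranked = sorted(entries, key=lambda e: e.score, reverse=True)
    return {
        "ranked": [e.model for e in ranked],
        "top_model": ranked[0].model,
        "top_score": ranked[0].score,
        "average": mean(e.score for e in entries),
        "verified": sum(e.verified for e in entries),
        "self_reported": sum(not e.verified for e in entries),
    }


entries = [Entry("Qwen2.5-Omni-7B", "Alibaba Cloud / Qwen Team", 0.939, verified=False)]
stats = summarize(entries)
print(stats["top_model"], stats["top_score"], stats["average"])
```

With a single row, the average equals the top score. That is why this page reports 0.939 for both.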

FAQ

Common questions about VocalSound

What is the VocalSound benchmark?

VocalSound is a dataset for improving the recognition of human vocal sounds. It contains over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. It is used for audio event classification and for the recognition of human non-speech vocalizations.

Where can I find the VocalSound paper?

The VocalSound paper is available at https://arxiv.org/abs/2205.03433. It describes the benchmark methodology, how the dataset was created, and the evaluation criteria.

What is the VocalSound leaderboard?

The VocalSound leaderboard ranks AI models by their performance on this benchmark. It currently lists 1 model. Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team leads with a score of 0.939. Only one model has been evaluated, so the average score across all models is also 0.939.

What is the highest VocalSound score?

The highest VocalSound score is 0.939, achieved by Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team.

How many models are evaluated on VocalSound?

One model has been evaluated on VocalSound so far. Its result is self-reported, and there are no verified results yet.

What categories does VocalSound cover?

VocalSound belongs to the audio category and evaluates audio models.
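VocalSound frames recognition of non-speech vocalizations as classification over six classes: laughter, sigh, cough, throat clearing, sneeze, and sniff. The sketch below shows how a set of predictions might be scored. It assumes the leaderboard score is plain top-1 accuracy on a 0 to 1 scale; this page does not state that. The label strings and the `accuracy` and `per_class_accuracy` helpers are hypothetical and are not the official evaluation script.

```python
from collections import Counter

# The six VocalSound classes; these label strings are illustrative only.
LABELS = ("laughter", "sigh", "cough", "throat_clearing", "sneeze", "sniff")


def accuracy(references, predictions):
    """Fraction of clips whose predicted label matches the reference label."""
    if len(references) != len(predictions) or not references:
        raise ValueError("need equally sized, non-empty label lists")
    unknown = (set(references) | set(predictions)) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    correct = sum(r == p for r, p in zip(references, predictions))
    return correct / len(references)


def per_class_accuracy(references, predictions):
    """Accuracy over the clips of each reference class present in the data."""
    totals = Counter(references)
    hits = Counter(r for r, p in zip(references, predictions) if r == p)
    return {label: hits[label] / totals[label] for label in LABELS if totals[label]}


refs = ["cough", "sneeze", "laughter", "sigh"]
preds = ["cough", "sniff", "laughter", "sigh"]
print(accuracy(refs, preds))  # 0.75
print(per_class_accuracy(refs, preds))
```

The per-class breakdown is included because a single overall accuracy can hide a class the model consistently confuses with another. In the usage example, a sneeze is mislabeled as a sniff.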