VocalSound
A dataset for improving recognition of human vocal sounds, containing over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. It is used for audio event classification and recognition of human non-speech vocalizations.
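Since the task is six-way classification of non-speech vocalizations, evaluation typically reduces to comparing predicted labels against reference labels per clip. Below is a minimal, hypothetical sketch of that scoring step; the class names come from the dataset description above, but the label spelling, file layout, and model interface are assumptions, not part of the official release.

```python
# Hypothetical sketch of scoring a classifier on VocalSound's six classes.
# Class names follow the dataset description; exact label strings are assumed.

VOCALSOUND_CLASSES = [
    "laughter", "sigh", "cough", "throat_clearing", "sneeze", "sniff",
]

def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted class matches the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("prediction/reference length mismatch")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy example: 5 of 6 clips classified correctly.
refs  = ["laughter", "cough", "sniff", "sigh", "sneeze", "throat_clearing"]
preds = ["laughter", "cough", "sniff", "sigh", "cough",  "throat_clearing"]
print(round(accuracy(refs, preds), 3))
```

A leaderboard score such as 0.939 would correspond to this ratio computed over the full evaluation split rather than a toy list.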
Progress Over Time
[Interactive timeline of model performance evolution on VocalSound, showing the state-of-the-art frontier for open and proprietary models; omitted here.]
VocalSound Leaderboard
1 model

| Rank | Model | Organization | Params | Score | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Qwen2.5-Omni-7B | Alibaba Cloud / Qwen Team | 7B | 0.939 | — | — |
FAQ
Common questions about VocalSound
What is VocalSound?
A dataset for improving recognition of human vocal sounds, containing over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. It is used for audio event classification and recognition of human non-speech vocalizations.
Where can I read the VocalSound paper?
The VocalSound paper is available at https://arxiv.org/abs/2205.03433. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
How are models ranked on the VocalSound leaderboard?
The VocalSound leaderboard ranks 1 AI model by its performance on this benchmark. Currently, Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team leads with a score of 0.939; since only one model is listed, this is also the average score.
What is the highest VocalSound score?
The highest VocalSound score is 0.939, achieved by Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team.
How many models have been evaluated on VocalSound?
One model has been evaluated on the VocalSound benchmark, with 0 verified results and 1 self-reported result.
What category does VocalSound belong to?
VocalSound is categorized under audio; the benchmark evaluates audio models.