GiantSteps Tempo
A dataset for tempo estimation in electronic dance music containing 664 2-minute audio previews from Beatport, annotated from user corrections for evaluating automatic tempo estimation algorithms.
Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team currently leads the GiantSteps Tempo leaderboard with a score of 0.880. It is the only AI model evaluated so far.
What GiantSteps Tempo measures
GiantSteps Tempo is an audio benchmark that evaluates large language models on tempo estimation for electronic dance music. LLM Stats tracks 1 model on this benchmark, with a maximum possible score of 1. Because only one model has been reported, the current average and the top score are both 0.880.
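The page does not say which metric produces the 0-to-1 score. As a minimal sketch, assuming the tolerance-based accuracy commonly used in tempo estimation (an estimate counts as correct when it falls within 4% of the annotated BPM, optionally forgiving octave errors such as half or double tempo), a score of this kind could be computed as follows. The function name `tempo_accuracy`, its parameters, and the sample BPM values are hypothetical and are not taken from the dataset or from LLM Stats.

```python
def tempo_accuracy(estimates, references, tolerance=0.04, factors=(1.0,)):
    """Return the fraction of tracks whose estimated BPM is close to the reference.

    An estimate is a hit when it lies within `tolerance` (relative) of the
    reference BPM multiplied by any value in `factors`. With factors=(1.0,)
    only the annotated tempo itself is accepted.
    """
    if len(estimates) != len(references):
        raise ValueError("estimates and references must have the same length")
    if not references:
        raise ValueError("at least one track is required")

    hits = 0
    for est, ref in zip(estimates, references):
        # Accept the estimate if it matches any allowed multiple of the reference.
        if any(abs(est - f * ref) <= tolerance * f * ref for f in factors):
            hits += 1
    return hits / len(references)


# Hypothetical BPM values for illustration only (not GiantSteps annotations).
reference_bpm = [128.0, 140.0, 174.0, 120.0]
estimated_bpm = [127.5, 70.0, 174.9, 131.0]

# Strict variant: only the annotated tempo counts.
strict = tempo_accuracy(estimated_bpm, reference_bpm)

# Relaxed variant: half, double, third, and triple tempo are also accepted.
relaxed = tempo_accuracy(
    estimated_bpm, reference_bpm, factors=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)
)

print(strict, relaxed)
```

In this toy data the second track is estimated at exactly half the annotated tempo, so it fails the strict variant but passes the relaxed one. Which variant, if either, matches the reported 0.880 is not stated on the page.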
Qwen2.5-Omni-7B leads with 88.0%.
GiantSteps Tempo Leaderboard
| Rank | Model | Organization | Parameters | Score | Context | Cost | License |
|---|---|---|---|---|---|---|---|
| 1 | Qwen2.5-Omni-7B | Alibaba Cloud / Qwen Team | 7B | 0.880 | — | — | |
More evaluations to explore
Related benchmarks in the same category
CoVoST 2 is a large-scale multilingual speech translation corpus derived from Common Voice, covering translations from 21 languages into English and from English into 15 languages. The dataset contains 2,880 hours of speech with 78K speakers for speech translation research.
A massive multi-task audio understanding and reasoning benchmark comprising 10,000 carefully curated audio clips paired with human-annotated natural language questions spanning speech, environmental sounds, and music. Requires expert-level knowledge and complex reasoning across 27 distinct skills.
Big Bench Audio is an audio reasoning benchmark adapted from a subset of Big Bench Hard, with text questions converted to spoken audio. It evaluates the reasoning ability of speech-to-speech and audio language models on tasks delivered as audio input, with accuracy scored by an independent evaluation (Artificial Analysis).
Common Voice is a massively-multilingual collection of transcribed speech intended for speech technology research and development. Version 15.0 contains 28,750 recorded hours across 114 languages, consisting of crowdsourced voice recordings with corresponding transcriptions.
CoVoST 2 English-to-Chinese subset is part of the large-scale multilingual speech translation corpus derived from Common Voice. This subset focuses specifically on English to Chinese speech translation tasks within the broader CoVoST 2 dataset.
MAVERIX (Multimodal Audio-Visual Evaluation Reasoning Index) evaluates multimodal models on tasks that demand tight integration of video and audio information. It features challenges like situational awareness and social sentiment analysis where the answer cannot be reliably determined from a single modality, rigorously testing joint audio-visual understanding.