FLEURS

FLEURS Leaderboard

6 models
About this benchmark

What is FLEURS?

Few-shot Learning Evaluation of Universal Representations of Speech - a parallel speech dataset in 102 languages built on FLoRes-101 with approximately 12 hours of speech supervision per language for tasks including ASR, speech language identification, translation and retrieval. Scores are shown as speech recognition accuracy (1 - word error rate), so higher is better.
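The "1 - word error rate" scoring described above can be illustrated with a minimal sketch. It assumes plain whitespace tokenization and a single reference/hypothesis pair; the text normalization and cross-language aggregation used for the leaderboard numbers are not stated here, and the function names `word_error_rate` and `recognition_accuracy` are hypothetical, chosen only for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def recognition_accuracy(reference: str, hypothesis: str) -> float:
    """Score in the 'higher is better' form: 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(recognition_accuracy("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, word error rate can exceed 1 when the hypothesis is much longer than the reference, so a score computed this way can fall below 0 for a single utterance.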

FLEURS is an audio benchmark that evaluates models on language and speech-to-text tasks. LLM Stats tracks 6 models on this benchmark, scored on a 0–1 scale. The current average is 0.921, with the leader at 0.959.

Compare leaders on the best AI for language and best AI for speech to text leaderboards.

Current leaders

Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team currently leads the FLEURS leaderboard with a score of 0.959 across 6 evaluated AI models.

1. Qwen2.5-Omni-7B (Alibaba Cloud / Qwen Team): 95.9%
2. Gemini 1.0 Pro (Google): 93.6%
3. Gemini 1.5 Pro (Google): 93.3%

Source paper

Title
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Authors
Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, and 5 others
Published
Abstract

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.

FAQ

Common questions about the FLEURS benchmark and leaderboard.

What is the FLEURS benchmark?

Few-shot Learning Evaluation of Universal Representations of Speech - a parallel speech dataset in 102 languages built on FLoRes-101 with approximately 12 hours of speech supervision per language for tasks including ASR, speech language identification, translation and retrieval. Scores are shown as speech recognition accuracy (1 - word error rate), so higher is better.

What is the FLEURS leaderboard?

The FLEURS leaderboard ranks 6 AI models based on their performance on this benchmark. Currently, Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team leads with a score of 0.959. The average score across all models is 0.921.

What is the highest FLEURS score?

The highest FLEURS score is 0.959, achieved by Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team.

How many models are evaluated on FLEURS?

6 models have been evaluated on the FLEURS benchmark, with 0 verified results and 5 self-reported results.

Where can I find the FLEURS paper?

The FLEURS paper is available at https://arxiv.org/abs/2205.12446. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does FLEURS cover?

FLEURS is categorized under language and speech to text. The benchmark evaluates audio models with multilingual support.

What is the best open-source model on FLEURS?

Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team is the top-ranked open-source model on FLEURS, with a score of 0.959 (rank #1).

How recent are the FLEURS leaderboard results?

The FLEURS leaderboard was last updated in July 2026 and currently includes 6 evaluated models.