CommonSenseQA
CommonSenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict correct answers. It contains 12,102 questions with one correct answer and four distractors, designed to test semantic reasoning and conceptual relationships. Questions are created based on ConceptNet concepts and require prior world knowledge for accurate reasoning.
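To illustrate the task format, here is a minimal sketch of how a CommonSenseQA-style item is structured and scored. The record below is invented in the dataset's style to show the schema (question, five labeled choices, one `answerKey`); it is not an actual item from the dataset.

```python
# Illustrative CommonSenseQA-style record: one question, five choices
# labeled A-E, and exactly one correct answerKey. The question text is
# invented here to demonstrate the schema, not copied from the dataset.
example = {
    "question": "Where would you put a plate after washing it?",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["cupboard", "restaurant", "ocean", "garage", "freezer"],
    },
    "answerKey": "A",
}

def score(predictions, records):
    """Multiple-choice accuracy: fraction of predicted labels
    that match each record's answerKey."""
    correct = sum(p == r["answerKey"] for p, r in zip(predictions, records))
    return correct / len(records)

records = [example]
print(score(["A"], records))  # 1.0 for a correct prediction
```

Leaderboard scores on this benchmark are exactly this kind of accuracy, averaged over the evaluation split.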
Progress Over Time
[Interactive timeline omitted: model performance evolution on CommonSenseQA, showing the state-of-the-art frontier with open and proprietary models distinguished.]
CommonSenseQA Leaderboard
1 model

| # | Model | Organization | Params | Context | Cost (input / output) | Score |
|---|---|---|---|---|---|---|
| 1 | Mistral NeMo Instruct | Mistral AI | 12B | 128K | $0.15 / $0.15 | 0.704 |
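The cost column lists the provider's input / output price per million tokens. As a rough sketch of what a full evaluation run would cost, assuming illustrative (not measured) average token counts per question:

```python
# Back-of-the-envelope cost for evaluating all 12,102 questions.
# Assumed averages (illustrative, not measured): 120 input tokens for
# the question-plus-choices prompt, 5 output tokens for the answer.
QUESTIONS = 12_102
IN_TOKENS, OUT_TOKENS = 120, 5      # assumed per-question averages
IN_PRICE = OUT_PRICE = 0.15         # $ per 1M tokens, from the table

cost = (QUESTIONS * IN_TOKENS * IN_PRICE
        + QUESTIONS * OUT_TOKENS * OUT_PRICE) / 1_000_000
print(f"${cost:.2f}")  # roughly $0.23 for the whole benchmark
```

Under these assumptions a complete pass over the benchmark costs well under a dollar, which is why self-reported results on small models are common.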
FAQ
Common questions about CommonSenseQA
**What is CommonSenseQA?**
CommonSenseQA is a multiple-choice benchmark of 12,102 questions, each with one correct answer and four distractors, built from ConceptNet concepts to test commonsense knowledge and conceptual reasoning.

**Where can I find the CommonSenseQA paper?**
The paper is available at https://arxiv.org/abs/1811.00937 and describes the benchmark methodology, dataset creation, and evaluation criteria.

**How are models ranked on the leaderboard?**
The leaderboard currently ranks 1 model. Mistral NeMo Instruct by Mistral AI leads with a score of 0.704, which is also the average score, since it is the only entry.

**What is the highest CommonSenseQA score?**
The highest score is 0.704, achieved by Mistral NeMo Instruct from Mistral AI.

**How many models have been evaluated?**
1 model has been evaluated on CommonSenseQA, with 0 verified results and 1 self-reported result.

**What categories does CommonSenseQA fall under?**
CommonSenseQA is categorized under language and reasoning, and evaluates text models.