MedXpertQA
A comprehensive benchmark for evaluating expert-level medical knowledge and advanced reasoning, featuring 4,460 questions spanning 17 specialties and 11 body systems. It includes a text-only subset and a multimodal subset; the multimodal subset contains expert-level exam questions that incorporate diverse medical images and rich clinical information.
Of the 12 AI models evaluated, Muse Spark from Meta currently leads the MedXpertQA leaderboard with a score of 0.784.