PathMCQA

PathMMU is a massive, expert-level multimodal benchmark for understanding and reasoning in pathology. It contains 33,428 multimodal multiple-choice questions and 24,067 images, validated by seven pathologists. It evaluates the performance of Large Multimodal Models (LMMs) on pathology tasks: the top-performing model, GPT-4V, achieves only 49.8% zero-shot performance, compared with 71.8% for human pathologists.

MedGemma 4B IT from Google currently leads the PathMCQA leaderboard with a score of 0.698, as the only AI model evaluated so far.
