AI2D
AI2D is a dataset of 4,903 illustrative diagrams from grade-school natural sciences (such as food webs, human physiology, and life cycles) with over 15,000 multiple-choice questions and answers. The benchmark evaluates diagram understanding and visual reasoning: to answer a question about the scientific concept a diagram depicts, a model must interpret the diagram's elements, the relationships between them, and its overall structure.
Of the 32 AI models evaluated, Claude 3.5 Sonnet from Anthropic currently leads the AI2D leaderboard with a score of 0.947.
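
The score is reported on a 0 to 1 scale. The text does not say how it is computed; assuming it is plain accuracy (the fraction of multiple-choice questions answered correctly), a minimal sketch of the calculation might look like the following. The `Question` type, its field names, and the toy data are hypothetical illustrations, not the AI2D file format or any official evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical record for one multiple-choice question about a diagram."""
    diagram_id: str      # which diagram the question refers to (made-up IDs below)
    text: str            # the question text
    choices: list[str]   # candidate answers
    answer_index: int    # index of the correct choice


def accuracy(questions: list[Question], predictions: list[int]) -> float:
    """Return the fraction of questions whose predicted index matches the answer.

    Assumption: the benchmark score is unweighted accuracy over all questions.
    """
    if len(questions) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    if not questions:
        return 0.0
    correct = sum(
        1 for q, p in zip(questions, predictions) if p == q.answer_index
    )
    return correct / len(questions)


# Toy data invented for illustration; not taken from the dataset.
toy_questions = [
    Question("d1", "What does the hawk eat?", ["grass", "snake", "sun", "soil"], 1),
    Question("d1", "Which organism is a producer?", ["grass", "snake", "hawk", "frog"], 0),
    Question("d2", "Which stage follows the larva?", ["egg", "adult", "pupa", "larva"], 2),
    Question("d2", "Which stage comes first?", ["adult", "pupa", "larva", "egg"], 3),
]

# Hypothetical model outputs: the third prediction is wrong.
toy_predictions = [1, 0, 1, 3]

print(accuracy(toy_questions, toy_predictions))  # prints 0.75
```

Under this assumption, a score of 0.947 would correspond to roughly 95 of every 100 questions answered correctly.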