SciCode

SciCode is a research coding benchmark, curated by scientists, that challenges language models to write code that solves scientific problems. It contains 338 subproblems decomposed from 80 main problems across 16 natural science sub-fields, spanning mathematics, physics, chemistry, biology, and materials science. Solving the problems requires knowledge recall, reasoning, and code synthesis.
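To make the main-problem/subproblem structure concrete, here is a minimal Python sketch of how results over such a decomposition could be tallied. Everything in it is an assumption for illustration: the class names, the `passed` flag, the toy problems, and both aggregation functions are hypothetical and are not taken from the SciCode harness or from the leaderboard's scoring method.

```python
from dataclasses import dataclass, field


@dataclass
class SubProblem:
    description: str
    passed: bool = False  # hypothetical flag: did the generated code pass this step's tests?


@dataclass
class MainProblem:
    title: str
    subfield: str
    subproblems: list[SubProblem] = field(default_factory=list)

    def solved(self) -> bool:
        # Assumption: a main problem counts as solved only if every subproblem passes.
        return bool(self.subproblems) and all(s.passed for s in self.subproblems)


def subproblem_rate(problems: list[MainProblem]) -> float:
    """Fraction of all subproblems that passed."""
    subs = [s for p in problems for s in p.subproblems]
    return sum(s.passed for s in subs) / len(subs) if subs else 0.0


def main_problem_rate(problems: list[MainProblem]) -> float:
    """Fraction of main problems with every subproblem passed."""
    return sum(p.solved() for p in problems) / len(problems) if problems else 0.0


# Toy data, invented for this example only.
problems = [
    MainProblem("Toy problem A", "physics",
                [SubProblem("step 1", True), SubProblem("step 2", True)]),
    MainProblem("Toy problem B", "chemistry",
                [SubProblem("step 1", True), SubProblem("step 2", False)]),
]

print(subproblem_rate(problems), main_problem_rate(problems))
```

The two rates can differ substantially: a model may pass most individual steps while fully solving few main problems, so it matters which one a reported score refers to.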

Of the 12 AI models evaluated, Google's Gemini 3.1 Pro currently leads the SciCode leaderboard with a score of 0.590.

Paper