SciCode

SciCode is a research coding benchmark curated by scientists that challenges language models to code solutions for scientific problems. It contains 338 subproblems decomposed from 80 challenging main problems across 16 natural science sub-fields including mathematics, physics, chemistry, biology, and materials science. Problems require knowledge recall, reasoning, and code synthesis skills.
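Each main problem is decomposed into subproblems. With 338 subproblems across 80 main problems, that is about 4.2 subproblems per main problem on average. This means results can be aggregated at two granularities. The sketch below is a minimal illustration under one stated assumption: a main problem counts as solved only when every one of its subproblems passes its tests, which is how we read the paper's main-problem metric. The `MainProblem` class and the two accuracy functions are hypothetical names chosen for this example. They are not part of any official SciCode harness or data format.

```python
from dataclasses import dataclass, field


@dataclass
class MainProblem:
    """Hypothetical record: one main problem and the pass/fail result of each subproblem."""
    problem_id: str
    subproblem_passed: list[bool] = field(default_factory=list)

    def solved(self) -> bool:
        # Assumption: a main problem is solved only if it has subproblems and all of them pass.
        return bool(self.subproblem_passed) and all(self.subproblem_passed)


def subproblem_accuracy(problems: list[MainProblem]) -> float:
    """Fraction of all subproblems, pooled across main problems, that pass."""
    total = sum(len(p.subproblem_passed) for p in problems)
    passed = sum(sum(p.subproblem_passed) for p in problems)  # True counts as 1
    return passed / total if total else 0.0


def main_problem_accuracy(problems: list[MainProblem]) -> float:
    """Fraction of main problems whose subproblems all pass."""
    if not problems:
        return 0.0
    return sum(p.solved() for p in problems) / len(problems)


if __name__ == "__main__":
    # Toy outcomes for two hypothetical main problems (not real SciCode results).
    results = [
        MainProblem("A", [True, True, True]),
        MainProblem("B", [True, False, True, True]),
    ]
    print(subproblem_accuracy(results))    # 6 of 7 subproblems pass
    print(main_problem_accuracy(results))  # 1 of 2 main problems solved
```

The toy data shows why the two numbers can differ a lot. A single failed subproblem in problem B leaves the pooled subproblem accuracy high, but it sinks that entire main problem.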

Paper: https://arxiv.org/abs/2407.13168

Progress Over Time

[Figure: timeline of model performance on SciCode; series: state-of-the-art frontier, open models, proprietary models]

SciCode Leaderboard

10 models • 0 verified
Rank  Organization   Parameters  Context  Cost
1     Google         -           1.0M     $2.50 / $15.00
2     Moonshot AI    1.0T        262K     $0.60 / $2.50
3     -              1.0T        -        -
4     -              120B        262K     $0.10 / $0.50
5     Zhipu AI       355B        131K     $0.40 / $1.60
6     -              230B        1.0M     $0.30 / $1.20
7     Inception      -           128K     $0.25 / $0.75
8     Zhipu AI       106B        -        -
9     MiniMax        230B        1.0M     $0.30 / $1.20
10    -              32B         262K     $0.06 / $0.24

FAQ

Common questions about SciCode

The SciCode paper is available at https://arxiv.org/abs/2407.13168. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The SciCode leaderboard ranks 10 AI models based on their performance on this benchmark. Currently, Gemini 3.1 Pro by Google leads with a score of 0.590. The average score across all models is 0.420.
Ten models have been evaluated on the SciCode benchmark: 0 results are verified and all 10 are self-reported.
SciCode is categorized under biology, chemistry, code, math, physics, and reasoning. The benchmark evaluates text models.