SciCode
SciCode is a research coding benchmark curated by scientists that challenges language models to write code solving scientific problems. It comprises 338 subproblems decomposed from 80 challenging main problems spanning 16 natural-science subdomains, including mathematics, physics, chemistry, biology, and materials science. Problems require knowledge recall, reasoning, and code-synthesis skills.
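The two-level structure (main problems decomposed into subproblems) suggests two natural metrics: a subproblem pass rate, and a main-problem pass rate in which a main problem counts as solved only if every one of its subproblems passes. A minimal sketch of that aggregation, with hypothetical names (the benchmark's actual harness may differ):

```python
def scicode_scores(results):
    """Compute (subproblem pass rate, main-problem pass rate).

    results: dict mapping a main-problem id to a list of bools,
    one per subproblem (True = that subproblem's tests passed).
    """
    subs = [passed for subproblems in results.values() for passed in subproblems]
    sub_rate = sum(subs) / len(subs)
    # A main problem counts as solved only if ALL its subproblems pass.
    main_rate = sum(all(v) for v in results.values()) / len(results)
    return sub_rate, main_rate

# Toy example: 2 main problems, 5 subproblems total.
demo = {"p1": [True, True, True], "p2": [True, False]}
print(scicode_scores(demo))  # (0.8, 0.5)
```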
Progress Over Time
[Interactive timeline: evolution of model performance on SciCode over time, with the state-of-the-art frontier marked for open and proprietary models.]
SciCode Leaderboard
10 models • 0 verified
| # | Model | Organization | Params | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | Google | — | 1.0M | $2.50 / $15.00 | — |
| 2 | — | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 | — |
| 3 | — | Moonshot AI | 1.0T | — | — | — |
| 4 | — | — | 120B | 262K | $0.10 / $0.50 | — |
| 5 | — | Zhipu AI | 355B | 131K | $0.40 / $1.60 | — |
| 6 | — | MiniMax | 230B | 1.0M | $0.30 / $1.20 | — |
| 7 | — | Inception | — | 128K | $0.25 / $0.75 | — |
| 8 | — | Zhipu AI | 106B | — | — | — |
| 9 | — | MiniMax | 230B | 1.0M | $0.30 / $1.20 | — |
| 10 | — | — | 32B | 262K | $0.06 / $0.24 | — |
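The Cost column lists separate input and output prices; a common convention, assumed here, is USD per million tokens. Under that assumption, estimating what an evaluation run would cost for a given model is a one-liner:

```python
def run_cost(input_price, output_price, input_tokens, output_tokens):
    """Estimate run cost in USD, assuming prices are quoted per 1M tokens."""
    return (input_price * input_tokens + output_price * output_tokens) / 1_000_000

# E.g. the rank-1 row ($2.50 in / $15.00 out) on a hypothetical run
# consuming 2M input tokens and producing 0.5M output tokens:
print(run_cost(2.50, 15.00, 2_000_000, 500_000))  # 12.5
```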
FAQ
Common questions about SciCode
**What is SciCode?**
SciCode is a research coding benchmark curated by scientists that challenges language models to write code solving scientific problems. It comprises 338 subproblems decomposed from 80 challenging main problems spanning 16 natural-science subdomains, including mathematics, physics, chemistry, biology, and materials science. Problems require knowledge recall, reasoning, and code-synthesis skills.

**Where can I read the SciCode paper?**
The SciCode paper is available at https://arxiv.org/abs/2407.13168. It details the benchmark methodology, dataset creation, and evaluation criteria.

**Which model leads the SciCode leaderboard?**
The SciCode leaderboard ranks 10 AI models by their performance on this benchmark. Currently, Gemini 3.1 Pro by Google leads with a score of 0.590. The average score across all models is 0.420.

**What is the highest SciCode score?**
The highest SciCode score is 0.590, achieved by Gemini 3.1 Pro from Google.

**How many models have been evaluated on SciCode?**
10 models have been evaluated on the SciCode benchmark, with 0 verified results and 10 self-reported results.

**What categories does SciCode cover?**
SciCode is categorized under biology, chemistry, code, math, physics, and reasoning. The benchmark evaluates text models.