Analyzing LLM Contamination in the Wild
Some LLMs are scoring suspiciously high on benchmarks. A data-driven analysis of which models likely saw test data during training and how to spot it.

Frequently Asked Questions
What is benchmark contamination?
Benchmark contamination occurs when test data from an evaluation benchmark leaks into a model's training data. Contaminated scores can be inflated by 5-15 percentage points, because the model reproduces memorized answers rather than demonstrating genuine capability.

Why does contamination matter?
Contamination makes fair model comparisons difficult: a model with contaminated scores appears more capable than it actually is. This is why independent evaluation and arena-based testing are increasingly important.

How is contamination detected?
Common detection methods include testing for exact memorization, comparing performance on original versus rephrased questions, and analyzing performance patterns across benchmark subsets. The second method is sketched below.
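As a rough illustration of the original-versus-rephrased comparison, here is a minimal Python sketch. The function and item names (contamination_gap, ask, original, paraphrase) are hypothetical, not from any specific tool: you supply a model query function, and the sketch reports how many percentage points accuracy drops when the same questions are reworded.

```python
# Hypothetical sketch: detect likely memorization by comparing accuracy
# on original vs. paraphrased benchmark questions. A large drop on the
# paraphrased set suggests the model memorized the original wording.

from typing import Callable

def contamination_gap(
    items: list[dict],           # each: {"original", "paraphrase", "answer"}
    ask: Callable[[str], str],   # caller-supplied model query function
) -> float:
    """Return accuracy(original) - accuracy(paraphrase) in percentage points."""
    def accuracy(key: str) -> float:
        correct = sum(ask(item[key]).strip() == item["answer"] for item in items)
        return 100.0 * correct / len(items)
    return accuracy("original") - accuracy("paraphrase")

if __name__ == "__main__":
    # Stub model that only "knows" the original wording, mimicking
    # a contaminated model that memorized the benchmark verbatim.
    memorized = {"What is 7 * 6?": "42"}
    stub = lambda q: memorized.get(q, "unknown")

    items = [{"original": "What is 7 * 6?",
              "paraphrase": "Compute the product of 7 and 6.",
              "answer": "42"}]
    print(f"gap: {contamination_gap(items, stub):.1f} points")  # 100.0 here
```

In practice, a gap of a few points on a small item set is noise; a consistent double-digit gap across many rephrased items is the signal worth investigating, in line with the 5-15 point inflation described above.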
