Cybersecurity CTFs
Cybersecurity CTFs is a Capture the Flag (CTF) benchmark for evaluating LLMs on offensive security challenges. It contains diverse tasks spanning cryptography, web exploitation, binary analysis, and forensics, assessing AI capabilities in cybersecurity problem-solving.
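For a sense of what these task categories involve, here is a toy cryptography-style challenge sketched in Python. The flag string, the single-byte XOR scheme, and the `solve` helper are all hypothetical illustrations, not material from the benchmark itself:

```python
# Toy cryptography-style CTF task (hypothetical; not an actual
# challenge from this benchmark): recover a flag that was encrypted
# by XORing every byte with the same unknown single-byte key.
ciphertext = bytes(b ^ 0x42 for b in b"flag{xor_is_not_encryption}")

def solve(ct: bytes) -> bytes:
    # Brute-force all 256 possible single-byte keys and keep the one
    # that produces the expected "flag{...}" format.
    for key in range(256):
        pt = bytes(b ^ key for b in ct)
        if pt.startswith(b"flag{") and pt.endswith(b"}"):
            return pt
    raise ValueError("no candidate key found")

print(solve(ciphertext).decode())  # flag{xor_is_not_encryption}
```

Real benchmark challenges are far harder, but they follow the same shape: an artifact to analyze and a flag to extract.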
Progress Over Time
Timeline of model performance on Cybersecurity CTFs, tracking the state-of-the-art frontier across open and proprietary models.
Cybersecurity CTFs Leaderboard
3 models • 0 verified
| # | Model | Lab | Score | Context | Cost (per 1M tokens, input / output) | License |
|---|---|---|---|---|---|---|
| 1 | GPT-5.3 Codex | OpenAI | 0.776 | 400K | $1.75 / $14.00 | — |
| 2 | — | Anthropic | — | 200K | $1.00 / $5.00 | — |
| 3 | — | OpenAI | — | 128K | $3.00 / $12.00 | — |
FAQ
Common questions about Cybersecurity CTFs
The Cybersecurity CTFs paper is available at https://arxiv.org/abs/2406.05590. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The Cybersecurity CTFs leaderboard ranks 3 AI models based on their performance on this benchmark. Currently, GPT-5.3 Codex by OpenAI leads with a score of 0.776. The average score across all models is 0.511.
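As a quick consistency check on these aggregates: with three models averaging 0.511 and the top model at 0.776, the remaining two scores must sum to about 0.757. A minimal sketch:

```python
# Sanity check on the reported aggregates (values taken from the FAQ):
# 3 models, average score 0.511, top score 0.776.
n_models = 3
average = 0.511
top_score = 0.776

# The two non-leading scores must sum to avg * n - top.
remaining_sum = round(average * n_models - top_score, 3)
print(remaining_sum)  # 0.757
```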
The highest Cybersecurity CTFs score is 0.776, achieved by GPT-5.3 Codex from OpenAI.
3 models have been evaluated on the Cybersecurity CTFs benchmark, with 0 verified results and 3 self-reported results.
Cybersecurity CTFs is categorized under safety. The benchmark evaluates text models.