First AI Red-Teaming Competition Created Benchmark Dataset
by Sander Schulhof on December 21, 2025
Situation
In the early days of generative AI security research, there was a significant gap in understanding how vulnerable these systems were to attacks like prompt injection and jailbreaking. While individual researchers were exploring these vulnerabilities, there was no coordinated effort to systematically document, categorize, and share these attack vectors. This lack of structured knowledge made it difficult for AI labs to benchmark their security measures and improve defenses.
Actions
- Pioneered competitive red teaming: Sander Schulhof organized the first-ever generative AI red teaming competition, bringing together security researchers to find vulnerabilities
- Secured industry participation: Convinced major AI companies including OpenAI, Scale, Hugging Face, and approximately 10 other AI companies to sponsor the competition
- Created a collaborative framework: Established a structure for ethical hackers to document their attack methods and successes
- Open-sourced the findings: Rather than keeping the vulnerabilities private, made the decision to publish the dataset for broader industry use
- Formalized the research: Turned the competition results into an academic paper that was submitted to a major natural language processing conference
Results
- Academic recognition: The paper documenting the dataset won "best theme paper" at EMNLP 2023, one of the top natural language processing conferences, out of approximately 20,000 submissions
- Industry adoption: The dataset became the first and largest collection of prompt injections, now used by "every single frontier lab and most Fortune 500 companies"
- Benchmarking standard: The dataset established a common reference point for measuring model security and vulnerability
- Security transparency: Made previously obscure attack vectors visible and measurable across the industry
- Ongoing impact: The competition format proved valuable enough to continue running, evolving into a regular security assessment event
Key Lessons
- Collaborative security works better than secrecy: By open-sourcing attack vectors rather than keeping them private, the entire field could advance its defenses more rapidly
- Competition accelerates discovery: The competitive format uncovered more vulnerabilities more quickly than individual research efforts
- Standardized benchmarks drive improvement: Creating a common dataset allowed companies to measure their progress against a consistent standard
- Academic-industry partnerships are powerful: Bridging the gap between academic research and industry practice created more impact than either could achieve alone
- Security research requires structure: Converting ad-hoc security testing into a formal, documented process made the findings more actionable
- Transparency builds trust: Rather than hiding vulnerabilities, documenting them openly created more trust in the field's ability to address them