Safety Prioritization Led to Anthropic's Formation

by Benjamin Mann on July 20, 2025

In 2020, Benjamin Mann and seven colleagues left OpenAI to found Anthropic with a singular mission: make AI safety the top priority while remaining at the technological frontier. Their journey demonstrates how values-driven leadership can create both technological advancement and responsible innovation.

Situation

  • Organizational tension: At OpenAI, there were "three tribes" (safety, research, and startup) that needed to be "kept in check with each other," creating internal conflict
  • Mission misalignment: Despite OpenAI's stated mission to make AGI "safe and beneficial for humanity," the founding team felt safety wasn't the top priority in practice
  • Industry context: Only about 1,000 people worldwide were working on AI safety, even as the industry spent roughly $300B annually in capital expenditure
  • Timing: The departure came after the GPT-3 project, where Mann was a lead author and helped secure Microsoft's billion-dollar investment

Actions

Creating a Safety-First Organization

  • Mission clarity: Established Anthropic with safety as the explicit #1 priority, rather than one objective competing among several
  • Constitutional AI: Developed an approach in which models critique and revise their own outputs against explicit principles drawn from sources such as the UN Declaration of Human Rights (see the sketch after this list)
  • Transparency approach: Chose to publicly share model failures and risks (like the blackmail experiment) rather than hiding them
  • Talent strategy: Built a culture where mission alignment trumped financial incentives, helping resist competitor poaching
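
To make the Constitutional AI mechanism concrete, below is a minimal sketch of a critique-and-revise loop. The `generate` callable stands in for any language-model API, and the principles, prompt wording, and function names are illustrative assumptions, not Anthropic's implementation; the published Constitutional AI method additionally uses such critiques and revisions as training data rather than applying them only at inference time.

```python
# Minimal sketch of a constitutional-AI-style critique-and-revise loop.
# `generate` stands in for any language model API; the principles, prompts,
# and names here are illustrative, not Anthropic's actual implementation.

from typing import Callable

PRINCIPLES = [
    "Choose the response that is least likely to encourage harmful activity.",
    "Choose the response that most respects human dignity and rights.",
]


def constitutional_revision(
    generate: Callable[[str], str],
    prompt: str,
    principles: list[str] = PRINCIPLES,
    rounds: int = 1,
) -> str:
    """Draft a response, then repeatedly critique and revise it against each principle."""
    response = generate(prompt)
    for _ in range(rounds):
        for principle in principles:
            # Ask the model to critique its own response against the principle.
            critique = generate(
                f"Principle: {principle}\n"
                f"Prompt: {prompt}\n"
                f"Response: {response}\n"
                "Point out any way the response violates the principle."
            )
            # Ask the model to rewrite the response in light of its critique.
            response = generate(
                f"Principle: {principle}\n"
                f"Prompt: {prompt}\n"
                f"Response: {response}\n"
                f"Critique: {critique}\n"
                "Rewrite the response so it satisfies the principle."
            )
    return response


if __name__ == "__main__":
    # Toy stand-in model so the sketch runs without an API key or network call.
    def stub_model(text: str) -> str:
        return "stub model output"

    print(constitutional_revision(stub_model, "Explain why transparency matters."))
```

The loop above only illustrates the critique-and-revise idea at inference time; in the published method the principles are applied during training, so the deployed model behaves this way without extra inference steps.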

Balancing Safety and Progress

  • Labs/Frontiers team: Created a dedicated innovation team to bridge research and product, focusing on "skating to where the puck is going"
  • Responsible scaling policy: Defined "AI Safety Levels" that tie risk assessment to increasing model capability
  • Empirical approach: Emphasized testing and evidence over theory, bringing scientific rigor to safety work
  • Congressional testimony: Proactively engaged with policymakers about risks like biological uplift

Results

Short-term Outcomes

  • Product innovation: Developed Claude, a commercially successful AI assistant with distinctive personality traits directly tied to safety values
  • Technical breakthroughs: Shipped products such as Claude Code and the Model Context Protocol from the Labs team
  • Talent retention: Successfully retained talent despite competitors offering massive compensation packages ($100M+)
  • Policy influence: Built trust with policymakers through transparency about risks

Long-term Impact

  • Proof of concept: Demonstrated that safety and frontier research can coexist and even reinforce each other
  • Industry shift: Helped move safety from a peripheral concern to a central consideration in AI development
  • Scaling laws validation: Continued to prove that scaling laws hold, while incorporating safety considerations
  • Cultural impact: Created an "egoless" culture where "people just want the right thing to happen"

Key Lessons

  • Values alignment matters: When mission and values are genuinely prioritized, they can attract and retain talent even against massive financial incentives
  • Safety can be a competitive advantage: Rather than slowing progress, the safety focus enabled distinctive products that depend on user trust
  • Transparency builds trust: Being open about risks and failures with policymakers and the public created more credibility than hiding them
  • Empiricism over ideology: Taking a scientific, evidence-based approach to safety produced better results than theoretical debates
  • Organizational design shapes outcomes: The "three tribes" structure at OpenAI created tension, while Anthropic's unified priority created clarity
  • Exponential thinking required: Success came from understanding the exponential nature of AI progress and "building for 6-12 months from now"