Safety Prioritization Led to Anthropic's Formation
by Benjamin Mann on July 20, 2025
In 2020, Benjamin Mann and seven colleagues left OpenAI to found Anthropic with a singular mission: make AI safety the top priority while remaining at the technological frontier. Their journey demonstrates how values-driven leadership can create both technological advancement and responsible innovation.
Situation
- Organizational tension: At OpenAI, there were "three tribes" (safety, research, and startup) that needed to be "kept in check with each other," creating internal conflict
- Mission misalignment: Despite OpenAI's stated mission to make AGI "safe and beneficial for humanity," the founding team felt safety wasn't the top priority in practice
- Industry context: Only about 1,000 people worldwide were working on AI safety despite billions in industry investment ($300B annually in capital expenditure)
- Timing: The departure came after the GPT-3 project, on which Mann was a lead author and through which he helped secure Microsoft's billion-dollar investment in OpenAI
Actions
Creating a Safety-First Organization
- Mission clarity: Established Anthropic with safety as the explicit #1 priority, rather than one of several competing objectives
- Constitutional AI: Developed a method in which models critique and revise their own outputs against explicit principles drawn from sources such as the UN's Universal Declaration of Human Rights (a minimal sketch follows this list)
- Transparency approach: Chose to publicly share model failures and risks (like the blackmail experiment) rather than hiding them
- Talent strategy: Built a culture where mission alignment trumped financial incentives, helping the company resist poaching by competitors
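The critique-and-revision loop at the heart of Constitutional AI can be summarized in a few lines. Below is a minimal sketch, assuming a hypothetical call_model function standing in for any LLM client; the two principles are illustrative examples, not Anthropic's published constitution.

```python
# Minimal sketch of a constitutional-AI-style critique-and-revision loop.
# `call_model` is a hypothetical stand-in for any LLM client, and the two
# principles are illustrative examples, not Anthropic's actual constitution.

PRINCIPLES = [
    "Choose the response that most supports human rights and dignity.",
    "Choose the response least likely to assist with harmful activity.",
]

def call_model(prompt: str) -> str:
    # Placeholder: swap in a real model call (API client, local model, etc.).
    return f"[model output for: {prompt[:60]}...]"

def constitutional_revision(user_prompt: str) -> str:
    """Draft an answer, then critique and revise it once per principle."""
    draft = call_model(user_prompt)
    for principle in PRINCIPLES:
        critique = call_model(
            "Critique the following response according to this principle.\n"
            f"Principle: {principle}\nResponse: {draft}"
        )
        draft = call_model(
            "Rewrite the response so it addresses the critique.\n"
            f"Critique: {critique}\nOriginal response: {draft}"
        )
    return draft

if __name__ == "__main__":
    print(constitutional_revision("Explain how to secure a home Wi-Fi network."))
```

The key design choice is that the model supervises itself against written principles, so the values being enforced are explicit and inspectable rather than implicit in human feedback labels.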
Balancing Safety and Progress
- Labs/Frontiers team: Created a dedicated innovation team to bridge research and product, focusing on "skating to where the puck is going"
- Responsible scaling policy: Defined "AI Safety Levels" that tie required safeguards to assessed risk at increasing capability thresholds (a sketch follows this list)
- Empirical approach: Emphasized testing and evidence over theory, bringing scientific rigor to safety work
- Congressional testimony: Proactively engaged with policymakers about risks like biological uplift
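To make the capability-level-to-safeguard idea concrete, here is a small sketch. The level names echo Anthropic's "AI Safety Levels," but the trigger evaluations and safeguard lists are hypothetical placeholders, not the contents of the actual Responsible Scaling Policy.

```python
# Illustrative sketch of capability-gated safety levels. The level names echo
# Anthropic's "AI Safety Levels," but the triggers and safeguards here are
# hypothetical placeholders, not the contents of the actual policy.

from dataclasses import dataclass

@dataclass(frozen=True)
class SafetyLevel:
    name: str
    trigger_eval: str            # evaluation that activates this level
    required_safeguards: tuple

SAFETY_LEVELS = (
    SafetyLevel(
        "ASL-2",
        "baseline_capabilities",
        ("security baseline", "model card", "misuse monitoring"),
    ),
    SafetyLevel(
        "ASL-3",
        "dangerous_capability_uplift",
        ("hardened security", "deployment restrictions", "expert red-teaming"),
    ),
)

def required_safeguards(eval_results: dict) -> tuple:
    """Return safeguards for the highest level whose trigger evaluation fired."""
    active = SAFETY_LEVELS[0]
    for level in SAFETY_LEVELS:
        if eval_results.get(level.trigger_eval, False):
            active = level
    return active.required_safeguards

if __name__ == "__main__":
    # Example: a model that trips the dangerous-capability evaluation.
    print(required_safeguards({"baseline_capabilities": True,
                               "dangerous_capability_uplift": True}))
```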
Results
Short-term Outcomes
- Product innovation: Developed Claude, a commercially successful AI assistant whose distinctive personality is directly tied to safety values
- Technical breakthroughs: Shipped Claude Code and the Model Context Protocol out of the Labs team
- Talent retention: Successfully retained talent despite competitors offering massive compensation packages ($100M+)
- Policy influence: Built trust with policymakers through transparency about risks
Long-term Impact
- Proof of concept: Demonstrated that safety and frontier research can coexist and even reinforce each other
- Industry shift: Helped move safety from a peripheral concern to a central consideration in AI development
- Scaling laws validation: Continued to demonstrate that scaling laws hold while incorporating safety considerations (see the note after this list)
- Cultural impact: Created an "egoless" culture where "people just want the right thing to happen"
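For readers unfamiliar with the term, "scaling laws" refers to the empirical finding that model loss improves smoothly and predictably as compute, data, and parameters grow. A common generic form, written here for compute only (the constants vary by study and are not stated in this summary), is:

```latex
% Generic power-law form of a neural scaling law; the constants are fit
% empirically per study and are not given in this summary.
L(C) \;\approx\; L_{\infty} + \left(\frac{C_0}{C}\right)^{\alpha}
% C: training compute, L_infty: irreducible loss,
% C_0 and alpha > 0: empirically fitted constants
```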
Key Lessons
- Values alignment matters: When mission and values are genuinely prioritized, they can attract and retain talent even against massive financial incentives
- Safety can be a competitive advantage: Rather than slowing progress, a focus on safety enabled distinctive products that depend on user trust
- Transparency builds trust: Being open about risks and failures with policymakers and the public created more credibility than hiding them
- Empiricism over ideology: Taking a scientific, evidence-based approach to safety produced better results than theoretical debates
- Organizational design shapes outcomes: The "three tribes" structure at OpenAI created tension, while Anthropic's unified priority created clarity
- Exponential thinking required: Success came from understanding the exponential nature of AI progress and "building for 6-12 months from now"