Safety First Drove Anthropic's Founders Out of OpenAI
by Benjamin Mann on July 20, 2025
Benjamin Mann sees AI safety as the most critical priority in the development of superintelligence, believing it must be addressed proactively rather than reactively.
At Anthropic, safety isn't just a feature but the foundation of the company's entire approach. Mann left OpenAI because he felt safety wasn't the top priority there, describing how OpenAI had "three tribes" (safety, research, and startup) in tension with one another. Anthropic, by contrast, was founded to put safety first while still pushing the frontier of AI capabilities.
This safety-first philosophy manifests in practical ways. Anthropic's Constitutional AI approach embeds values and principles directly into the models rather than adding guardrails afterward. The company has published examples of model failures that other companies might hide, believing that transparency builds trust with policymakers and advances the field. Product decisions are guided by safety considerations: Anthropic has held back consumer applications that couldn't meet its safety standards.
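To make the Constitutional AI idea concrete, here is a minimal sketch of its critique-and-revision loop, under stated assumptions: the `generate` function is a hypothetical stand-in for any language-model call (not a real Anthropic API), and the two principles are illustrative placeholders, not the actual constitution.

```python
# Minimal sketch of Constitutional AI's critique-and-revision loop
# (the supervised phase of the approach). `generate` is a hypothetical
# stand-in for a model call; replace it with a real LLM client.

CONSTITUTION = [
    "Choose the response that is least likely to cause harm.",
    "Choose the response that is most honest and transparent.",
]

def generate(prompt: str) -> str:
    """Placeholder for a text-generation call; returns a canned string here."""
    return f"<model output for: {prompt[:40]}...>"

def constitutional_revision(user_prompt: str) -> str:
    """Draft a response, then critique and revise it against each principle.
    The values are applied during generation of training data, not bolted
    on as a post-hoc output filter."""
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle "
            f"'{principle}':\n{response}"
        )
        response = generate(
            f"Rewrite the response to address this critique:\n{critique}"
        )
    return response

if __name__ == "__main__":
    print(constitutional_revision("How do I pick a strong password?"))
```

In the real training pipeline, revised responses like these become fine-tuning targets, which is how the principles end up inside the model rather than in an external guardrail.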
Mann's perspective on AI risk is nuanced. He estimates a 0-10% chance of catastrophic outcomes, a range whose low end alone is significant given the stakes: "If I told you that there is a one percent chance that the next time you got in an airplane you would die, you would probably think twice." This drives his belief that alignment research is crucial even if things are "overwhelmingly likely to go well."
For leaders and teams, this philosophy translates into several practical implications. First, values should be embedded in products from the beginning, not added later. Second, transparency about failures builds more trust than projecting perfection. Third, when facing existential challenges, surrounding yourself with mission-aligned people creates resilience against external pressures (like competitors offering massive compensation packages). Finally, when building potentially transformative technology, considering worst-case scenarios isn't "dooming"; it's responsible stewardship.
Mann's approach suggests that when the stakes are highest, leading with your core values isn't just morally right; it becomes your competitive advantage. As he puts it, people stay at Anthropic because "my best case scenario at Meta is that we make money and my best case scenario at Anthropic is we affect the future of humanity."