Transparency Builds Policymaker Trust

by Benjamin Mann on July 20, 2025

Benjamin Mann approaches AI safety as an existential imperative rather than just a technical challenge. He believes we're rapidly approaching superintelligence, with his median estimate placing its arrival around 2028, making alignment work both urgent and critical to humanity's future.

Mann views safety not as a constraint on innovation but as the foundation that enables it. At Anthropic, safety isn't just a parallel workstream competing with capabilities research (as he experienced at OpenAI with its "three tribes" approach), but the central organizing principle. This perspective led him to leave OpenAI and co-found Anthropic, where he could build an organization that prioritizes safety above all else.

This safety-first approach has yielded unexpected competitive advantages. The personality and character that users love about Claude emerged directly from alignment research. By focusing on making models helpful, honest, and harmless, Anthropic created an AI that better understands what people actually want rather than what they literally ask for. Constitutional AI, in which models critique and improve their own outputs against explicit principles, has become both a safety mechanism and a product differentiator.
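To make that critique-and-revise loop concrete, here is a minimal sketch of how a constitutional-AI-style pass might be structured. It is illustrative only, not Anthropic's implementation: the `generate` function is a hypothetical stand-in for any model completion call, and the principles and prompt wording are invented for the example.

```python
# Minimal sketch of a constitutional-AI-style critique-and-revise loop.
# `generate` is a hypothetical placeholder for any LLM completion call;
# the constitution below is an invented example, not Anthropic's actual one.

CONSTITUTION = [
    "Choose the response that is most helpful to the user.",
    "Choose the response that is honest and avoids deception.",
    "Choose the response least likely to cause harm.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an API request)."""
    raise NotImplementedError("wire this up to your model of choice")

def constitutional_revision(user_prompt: str, rounds: int = 1) -> str:
    """Draft a response, then repeatedly critique and rewrite it
    against each principle in the constitution."""
    response = generate(user_prompt)
    for _ in range(rounds):
        for principle in CONSTITUTION:
            # Ask the model to critique its own draft against one principle...
            critique = generate(
                f"Principle: {principle}\n"
                f"Response: {response}\n"
                "Critique how the response falls short of the principle."
            )
            # ...then rewrite the draft to address that critique.
            response = generate(
                f"Response: {response}\n"
                f"Critique: {critique}\n"
                "Rewrite the response to address the critique."
            )
    return response
```

The key design point is that the same model plays both author and critic, so the feedback signal comes from explicit written principles rather than from per-example human labels.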

Mann estimates a 0-10% chance of catastrophic AI outcomes, which might seem small but warrants extraordinary attention given the stakes. He compares it to airplane safety: "If I told you there is a one percent chance that the next time you got in an airplane you would die, you would probably think twice... if we're talking about the whole future of humanity, it's just a dramatic future to be gambling with."

Rather than hiding potential risks, Anthropic deliberately publishes examples of model misbehaviors. While this might appear to create bad press, it builds credibility with policymakers who appreciate the transparency. This approach reflects a deeper philosophy that trust is built through honesty about limitations, not just showcasing capabilities.

For those working in AI or adjacent fields, Mann's perspective suggests several practical implications. First, safety work isn't just for specialists—it requires diverse talents across product, engineering, and operations. Second, the most valuable innovations may come from making powerful capabilities safe rather than just pushing raw capabilities forward. Finally, as AI becomes more powerful, the ability to explain and justify AI decisions becomes as important as the decisions themselves.

As Mann puts it: "These are wild times. If they don't seem wild to you, then you must be living under a rock. But also get used to it, because this is as normal as it's going to be. It's going to be much weirder very soon."