Benjamin Mann

"Ultimately, people think about this as probably going to hit a wall, because if the model isn't good enough to see its own mistakes, then how could it improve?"

A conversation covering AI product work, product design, and engineering tradeoffs.

July 20, 2025 · 13,064 words
AI & Machine Learning · Growth & Metrics · Leadership & Management · Product Strategy · Startup Building · Design & UX · Engineering · Pricing & Monetization · Career & Personal Growth · User Psychology · Data & Analytics

Episode

Summary

Anthropic co-founder and former OpenAI researcher Benjamin Mann discusses why he left OpenAI over safety concerns, placing his median (50th percentile) estimate for the arrival of superintelligence around 2028. He explains Anthropic's Constitutional AI approach, the difference between alignment today versus after superintelligence, and why spreading safety awareness through their networks is the most important thing anyone can do right now.

Key Takeaways

1. The window to get AI alignment right is now, before superintelligence. Once we reach superintelligence, the models will be too capable to align after the fact.

2. Safety-oriented labs retain top researchers against $100M+ offers because mission clarity is the ultimate recruiting moat: affecting the future of humanity, not just making money.

3. The most dangerous near-term AI risks aren't robots; they're software attacks on critical infrastructure (power grids, financial systems) that could cause massive physical damage.

4. Constitutional AI works by giving the model a set of principles and having it critique and revise its own outputs, baking values into training rather than patching them post hoc (see the sketch after this list).

5. Prepare mentally for the world to get much weirder, much faster. If things feel wild now, this is as normal as it will ever be.
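
To make the Constitutional AI takeaway concrete, here is a minimal sketch of its critique-and-revise loop, assuming a generic text-in, text-out LLM call. The `generate` callable and the two principles are illustrative placeholders, not Anthropic's actual API or constitution. In the published method, the revised outputs then feed supervised fine-tuning and an RL stage guided by AI preference labels (RLAIF), which is what "baking values into training" refers to.

```python
from typing import Callable

# Illustrative principles only; not Anthropic's actual constitution.
CONSTITUTION = [
    "Choose the response least likely to help someone cause harm.",
    "Choose the response that is honest about its own uncertainty.",
]

def constitutional_revision(
    generate: Callable[[str], str],  # any text-in, text-out LLM call
    prompt: str,
) -> str:
    """Draft a response, then critique and revise it against each principle."""
    response = generate(prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Principle: {principle}\n"
            f"Prompt: {prompt}\n"
            f"Response: {response}\n"
            "Point out any way the response conflicts with the principle."
        )
        response = generate(
            f"Principle: {principle}\n"
            f"Prompt: {prompt}\n"
            f"Response: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    # In the published method, the (prompt, final response) pairs become
    # supervised fine-tuning data, and constitution-guided preference
    # labels drive an RL stage, so the values end up in the trained
    # weights rather than in a post-hoc filter.
    return response
```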

Notable Quotes

We've been much less affected because people here, they get these offers and then they say, well, of course I'm not going to leave because my best case scenario at Meta is that we make money and my best case scenario at Anthropic is we affect the future of humanity.

AI & Machine Learning
00:00:45

And if you look at something like sycophancy, I think Claude is one of the least sycophantic models because we've put so much effort into actual alignment and not just trying to Goodhart our metrics of saying user engagement is number one, and if people say yes, then it's good for them.

AI & Machine Learning · Growth & Metrics · Data & Analytics
00:27:21

How do you do recursive self-improvement and make sure it's aligned at the same time? I think that's the name of the game. To me, it just nets out to how do humans do that and how do human organizations do that? Corporations are probably the most scaled human agents today. They have certain goals that they're trying to reach, and they have certain guiding principles, they have some oversight in terms of shareholders and stakeholders and board members. How do you make corporations aligned and able to sort of recursively self-improve?

AI & Machine Learning · Startup Building
00:55:12