Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Dwarkesh Podcast

2025-05-23

2 hours 24 minutes

Episode description

New episode with my good friends Sholto Douglas & Trenton Bricken. Sholto focuses on scaling RL and Trenton researches mechanistic interpretability, both at Anthropic. We talk through what’s changed in the last year of AI research; the new RL regime and how far it can scale; how to trace a model’s thoughts; and how countries, workers, and students should prepare for AGI. See you next year for v3. Here’s last year’s episode, btw. Enjoy!

Watch on YouTube; listen on Apple Podcasts or Spotify.

SPONSORS

* WorkOS ensures that AI companies like OpenAI and Anthropic don't have to spend engineering time building enterprise features like access controls or SSO. It’s not that they don't need these features; it's just that WorkOS gives them battle-tested APIs that they can use for auth, provisioning, and more. Start building today at workos.com.

* Scale is building the infrastructure for safer, smarter AI. Scale’s Data Foundry gives major AI labs access to high-quality data to fuel post-training, while their public leaderboards help assess model capabilities. They also just released Scale Evaluation, a new tool that diagnoses model limitations. If you’re an AI researcher or engineer, learn how Scale can help you push the frontier at scale.com/dwarkesh.

* Lighthouse is THE fastest immigration solution for the technology industry. They specialize in expert visas like the O-1A and EB-1A, and they’ve already helped companies like Cursor, Notion, and Replit navigate U.S. immigration. Explore which visa is right for you at lighthousehq.com/ref/Dwarkesh.

To sponsor a future episode, visit dwarkesh.com/advertise.

TIMESTAMPS

(00:00:00) – How far can RL scale?
(00:16:27) – Is continual learning a key bottleneck?
(00:31:59) – Model self-awareness
(00:50:32) – Taste and slop
(01:00:51) – How soon to fully autonomous agents?
(01:15:17) – Neuralese
(01:18:55) – Inference compute will bottleneck AGI
(01:23:01) – DeepSeek algorithmic improvements
(01:37:42) – Why are LLMs ‘baby AGI’ but not AlphaZero?
(01:45:38) – Mech interp
(01:56:15) – How countries should prepare for AGI
(02:10:26) – Automating white collar work
(02:15:35) – Advice for students

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcript

  • Okay, I'm joined again by my friends, Sholto Bricken.

  • Wait, fine.

  • You named us differently, but we didn't have Sholto Bricken and Trenton Douglas.

  • Sholto Douglas and Trenton Bricken, who are now both at Anthropic.

  • Sholto scaling RL, Trenton still working on mechanistic interpretability.

  • Welcome back.

  • Happy to be here.

  • Yeah, it's fun.

  • What's changed since last year?

  • We last talked basically this same month in 2024. Now, in 2025, what's happened?

  • Okay.

  • So I think the biggest thing that's changed is that RL on language models has finally worked.

  • And the way this has manifested is that we finally have proof of an algorithm that can give us expert human reliability and performance, given the right feedback loop.

  • And so far, I think this has only really been conclusively demonstrated in competitive programming and math, basically.

  • And so if you think of these two axes, one is the intellectual complexity of the task and the other is the time horizon over which the task is completed.

  • And I think we have proof that we can reach the peaks of intellectual complexity along many dimensions.

  • But we haven't yet demonstrated long-running, agentic performance.