Some thoughts on the Sutton interview

Dwarkesh Podcast

2025-10-05

Episode summary

I have a much better understanding of Sutton’s perspective now. I wanted to reflect on it a bit.

(00:00:00) - The steelman
(00:02:42) - TLDR of my current thoughts
(00:03:22) - Imitation learning is continuous with and complementary to RL
(00:08:26) - Continual learning
(00:10:31) - Concluding thoughts

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Episode transcript

  • Boy, do you guys have a lot of thoughts about this interview.

  • I've been thinking about it myself,

  • and I think I have a much better understanding now of Sutton's perspective than I did during the interview itself.

  • So I want to reflect on how I understand his worldview now.

  • And Richard, apologies if there are still any errors or misunderstandings.

  • It's been very productive to learn from your thoughts.

  • Okay, so here's my understanding of the steelman of Richard's position.

  • Famously, he wrote the essay The Bitter Lesson.

  • And what is this essay about?

  • Well, it's not saying that you just want to throw as much compute as you possibly can at the problem.

  • The Bitter Lesson says that you want to come up with techniques which most effectively and scalably leverage compute.

  • Most of the compute that's spent on an LLM is used in running it during deployment.

  • And yet it's not learning anything during this entire period.

  • It's only learning during this special phase that we call training.

  • And so this is obviously not an effective use of compute.

  • And what's even worse is that this training period by itself is highly inefficient

  • because these models are usually trained on the equivalent of tens of thousands of years of human experience.
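To put rough numbers on that claim, here is a back-of-envelope sketch; the corpus size, words-per-year rate, and tokens-per-word ratio are all illustrative assumptions, not figures from the episode.

```python
# Back-of-envelope: how many human-years of language exposure does an
# LLM pretraining corpus represent? All constants are rough assumptions.

TRAINING_TOKENS = 15e12   # assumed pretraining corpus: ~15 trillion tokens
WORDS_PER_YEAR = 100e6    # assumed human intake: ~100 million words/year
TOKENS_PER_WORD = 1.3     # typical BPE tokens-per-word ratio (assumption)

human_tokens_per_year = WORDS_PER_YEAR * TOKENS_PER_WORD
years = TRAINING_TOKENS / human_tokens_per_year
print(f"~{years:,.0f} human-years of language exposure")
# With these assumptions: ~115,000 years, i.e. the order of magnitude
# behind "tens of thousands of years of human experience."
```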

  • And what's more, during this training phase,

  • all of their learning is coming straight from human data.

  • Now, this is an obvious point in the case of pre-training data,