Boy, do you guys have a lot of thoughts about this interview.
I've been thinking about it myself,
and I think I have a much better understanding now of Sutton's perspective than I did during the interview itself.
So I want to reflect on how I understand his worldview now.
And Richard, apologies if there are still any errors or misunderstandings.
It's been very productive to learn from your thoughts.
Okay, so here's my understanding of the steel man of Richard's position.
Obviously, he wrote the famous essay, The Bitter Lesson.
And what is this essay about?
Well, it's not saying that you just want to throw away as much compute as you possibly can.
The Bitter Lesson says that you want to come up with techniques which most effectively and scalably leverage compute.
Most of the compute that's spent on an LLM is used in running it during deployment.
And yet it's not learning anything during this entire period.
It's only learning during this special phase that we call training.
And so this is obviously not an effective use of compute.
And what's even worse is that this training period by itself is highly inefficient
because these models are usually trained on the equivalent of tens of thousands of years of human experience.
And what's more, during this training phase,
all of their learning is coming straight from human data.
Now, this is an obvious point in the case of pre-training data,