2025-09-26
1 hour 6 minutes
Today, I'm chatting with Richard Sutton,
who is one of the founding fathers of reinforcement learning and inventor of many of the main techniques used there,
like TD learning and policy gradient methods.
And for that, he received this year's Turing Award,
which if you don't know, is basically the Nobel Prize for Computer Science.
Richard, congratulations.
Thank you, Dwarkesh.
And thanks for coming on the podcast.
It's my pleasure.
Okay, so first question.
My audience and I are familiar with the LLM way of thinking about AI.
Conceptually, what are we missing in terms of thinking about AI from the RL perspective?
Well, yes, I think it's really quite a different point of view.
The two can easily get separated and lose the ability to talk to each other.
And yeah, large language models have become such a big thing, generative AI in general a big thing.
And our field is subject to bandwagons and fashions.
So we lose track of the basic, basic things,
because I consider reinforcement learning to be basic AI.
And what is intelligence? The problem is to understand your world.
Reinforcement learning is about understanding your world,
whereas large language models are about mimicking people, doing what people say you should do.
They're not about figuring out what to do.
I guess you would think that to emulate the trillions of tokens in the corpus of internet text,