2024-05-15
Today I have the pleasure to speak with John Schulman,
who is one of the co-founders of OpenAI and leads the post-training team here.
He also led the creation of ChatGPT and is the author of many of the most important and widely cited papers in AI and RL,
including PPO and many others.
So John, really excited to chat with you.
Thanks for coming on the podcast.
Thanks for having me on the podcast.
I'm a big fan.
Oh, thank you.
Thank you for saying that.
So the first question I had is:
we have these distinctions between pre-training and post-training, beyond what is actually happening in terms of loss functions and training regimes.
I'm just curious, taking a step back conceptually, what kind of thing is pre-training creating?
And what does post-training do on top of that?
In pre-training,
you're basically training to imitate all of the content on the internet or on the web,
including websites and code and so forth.
So you get a model that can basically generate content that looks like random web pages from the internet.
And the model is also trained to maximize likelihood, so it has to put a probability on everything.
So the objective is basically predicting the next token given the previous tokens.
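To make that objective concrete, here is a minimal sketch of next-token prediction as maximum-likelihood training in PyTorch. The `model` and the tensor names are illustrative assumptions, not any particular codebase: `model` is assumed to be any network that maps a batch of token ids to logits over the vocabulary.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """Pre-training objective: maximize the likelihood of each token
    given the tokens that precede it (equivalently, minimize the
    cross-entropy of the model's next-token predictions).

    tokens: (batch, seq_len) integer token ids from the training corpus.
    """
    inputs = tokens[:, :-1]    # condition on all previous tokens
    targets = tokens[:, 1:]    # the "next token" at each position
    logits = model(inputs)     # assumed shape: (batch, seq_len - 1, vocab_size)

    # Cross-entropy over the vocabulary at every position; minimizing this
    # is the same as maximizing the log-probability of the observed tokens.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```

The key point the sketch illustrates is that the model must assign a probability to every token in the vocabulary at every position; the loss rewards putting probability mass on what actually comes next in the data.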