2025-05-23
2 hours 24 minutes
Okay, I'm joined again by my friends, Sholto Bricken.
Wait, fine.
You named us differently, but we didn't have Sholto Bricken and Trenton Douglas.
Sholto Douglas and Trenton Bricken, who are now both at Anthropic.
Sholto scaling RL, Trenton still working on mechanistic interpretability.
Welcome back.
Happy to be here.
Yeah, it's fun.
What's changed since last year?
We talked basically this month in 2024. Now, in 2025, what's happened?
Okay.
So I think the biggest thing that's changed is that RL in language models has finally worked.
And this is manifested in the fact that
we finally have proof of an algorithm that can give us expert human reliability and performance, given the right feedback loop.
And so I think this has only really been conclusively demonstrated in competitive programming and math,
basically.
And so if you think of these two axes,
one is the intellectual complexity of the task and the other is the time horizon over which the task is being completed.
And I think we have proof that we can reach the peaks of intellectual complexity along many dimensions.
But we haven't yet demonstrated long-running agentic performance.