Paul Christiano — Preventing an AI takeover


Dwarkesh Podcast

2023-10-31

3 hours 7 minutes

Episode description

Paul Christiano is the world’s leading AI safety researcher. My full episode with him is out! We discuss:

- Does he regret inventing RLHF, and is alignment necessarily dual-use?
- Why he has relatively modest timelines (40% by 2040, 15% by 2030)
- What we want the post-AGI world to look like (do we want to keep gods enslaved forever?)
- Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon
- His current research into a new proof system, and how this could solve alignment by explaining a model’s behavior
- and much more

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Open Philanthropy

Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see the application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/ The deadline to apply is November 9th; make sure to check out those roles before they close.

Timestamps

(00:00:00) - What do we want the post-AGI world to look like?
(00:24:25) - Timelines
(00:45:28) - Evolution vs gradient descent
(00:54:53) - Misalignment and takeover
(01:17:23) - Is alignment dual-use?
(01:31:38) - Responsible scaling policies
(01:58:25) - Paul’s alignment research
(02:35:01) - Will this revolutionize theoretical CS and math?
(02:46:11) - How Paul invented RLHF
(02:55:10) - Disagreements with Carl Shulman
(03:01:53) - Long TSMC but not NVIDIA

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Episode transcript

  • Okay, today I have the pleasure of interviewing Paul Christiano,

  • who is the leading AI safety researcher.

  • He's the person that labs and governments turn to when they want feedback and advice on their safety plans.

  • He previously led the language model alignment team at OpenAI, where he led the invention of RLHF.

  • And now he is the head of the Alignment Research Center.

  • And they've been working with the big labs to identify when these models will be too unsafe to keep scaling.

  • Paul, welcome to the podcast.

  • Thanks for having me.

  • Looking forward to talking.

  • Okay, so first question, and this is a question I've asked,

  • Holden, Ilya, and Dario, and none of them have given me a satisfying answer.

  • Give me a concrete sense of what a good post-AGI world would look like.

  • Like, how are humans interfacing with the AI?

  • What is the economic and political structure?

  • Yeah, I guess this is a tough question for a bunch of reasons.

  • Maybe the biggest one is being concrete, and I think it's just,

  • if we're talking about really long spans of time,

  • then a lot will change and it's really hard for someone to talk completely about what that will look like without saying really silly things.

  • But I can venture some guesses or fill in some parts.

  • I think this is also a question of how good is good.