Francois Chollet — Why the biggest AI models can't solve simple puzzles


Dwarkesh Podcast

2024-06-12

1 hour 33 minutes


Episode description

Here is my conversation with Francois Chollet and Mike Knoop on the $1 million ARC-AGI Prize they're launching today.

I did a bunch of Socratic grilling throughout, but Francois’s arguments about why LLMs won’t lead to AGI are very interesting and worth thinking through. It was really fun discussing/debating the cruxes. Enjoy!

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here.

Timestamps

(00:00:00) – The ARC benchmark
(00:11:10) – Why LLMs struggle with ARC
(00:19:00) – Skill vs intelligence
(00:27:55) – Do we need “AGI” to automate most jobs?
(00:48:28) – Future of AI progress: deep learning + program synthesis
(01:00:40) – How Mike Knoop got nerd-sniped by ARC
(01:08:37) – Million $ ARC Prize
(01:10:33) – Resisting benchmark saturation
(01:18:08) – ARC scores on frontier vs open source models
(01:26:19) – Possible solutions to ARC Prize

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Okay, today I have the pleasure to speak with Francois Chollet,
who is an AI researcher at Google and creator of Keras,
and he's launching a prize in collaboration with Mike Knoop,
the co-founder of Zapier, who we'll also be talking to in a second,
a million-dollar prize to solve the ARC benchmark that he created.
So, first question, what is the ARC benchmark, and why do we even need this prize?
Why won't the biggest LLM we have in a year be able to just saturate it?
Sure.
So ARC is intended as a kind of IQ test for machine intelligence.
And what makes it different from most LLM benchmarks out there is that it's designed to be resistant to memorization.
So if you look at the way LLMs work, they're basically this big interpolative memory.
And the way you scale up their capabilities is by trying to cram as much knowledge and patterns as possible into them.
And by contrast, ARC does not require a lot of knowledge at all.
It's designed to only require what's known as core knowledge,
which is basic knowledge about things like elementary physics,
objectness, counting, that sort of thing.
The sort of knowledge that any four-year-old or five-year-old possesses.
But what's interesting is that each puzzle in ARC is novel,
is something that you've probably not encountered before,
even if you've memorized the entire internet.
