Francois Chollet — Why the biggest AI models can't solve simple puzzles


Dwarkesh Podcast

2024-06-12

1 hour 33 minutes

Episode description

Here is my conversation with Francois Chollet and Mike Knoop on the $1 million ARC-AGI Prize they're launching today. I did a bunch of socratic grilling throughout, but Francois’s arguments about why LLMs won’t lead to AGI are very interesting and worth thinking through. It was really fun discussing/debating the cruxes. Enjoy!

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here.

Timestamps
(00:00:00) – The ARC benchmark
(00:11:10) – Why LLMs struggle with ARC
(00:19:00) – Skill vs intelligence
(00:27:55) – Do we need “AGI” to automate most jobs?
(00:48:28) – Future of AI progress: deep learning + program synthesis
(01:00:40) – How Mike Knoop got nerd-sniped by ARC
(01:08:37) – Million $ ARC Prize
(01:10:33) – Resisting benchmark saturation
(01:18:08) – ARC scores on frontier vs open source models
(01:26:19) – Possible solutions to ARC Prize

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Episode transcript

  • Okay, today I have the pleasure to speak with Francois Chollet,

  • who is an AI researcher at Google and creator of Keras,

  • and he's launching a prize in collaboration with Mike Knoop,

  • the co-founder of Zapier, who we'll also be talking to in a second,

  • a million-dollar prize to solve the ARC benchmark that he created.

  • So, first question, what is the ARC benchmark, and why do we even need this prize?

  • Why won't the biggest LLM we have in a year be able to just saturate it?

  • Sure.

  • So ARC is intended as a kind of IQ test for machine intelligence.

  • And what makes it different from most LLM benchmarks out there is that it's designed to be resistant to memorization.

  • So if you look at the way LLMs work, they're basically this big interpolative memory.

  • And the way you scale up their capabilities is by trying to cram as much knowledge and patterns as possible into them.

  • And by contrast, ARC does not require a lot of knowledge at all.

  • It's designed to only require what's known as core knowledge,

  • which is basic knowledge about things like elementary physics,

  • objectness, counting, that sort of thing.

  • The sort of knowledge that any four-year-old or five-year-old possesses.

  • But what's interesting is that each puzzle in ARC is novel,

  • is something that you've probably not encountered before,

  • even if you've memorized the entire internet.