The "confident idiot" problem (News)

The Changelog: Software Development, Open Source

2025-12-09

7 minutes

Episode description ...

Why AI needs hard rules (not vibe checks), what Anthropic's acquisition of Bun's creators tells us about the AI takeover, Jonah Glover couldn't get Claude to recreate Space Jam's 1996 website, Google finally unkills something, and Bazzite is a distro for the next generation of Linux gaming.

Episode transcript ...

  • What up nerds?

  • I'm Jared and this is Changelog News for the week of Monday, December 8th, 2025.

  • We are quickly approaching last call for State of the 'log voicemails.

  • We record the show in a week and we have to give BMC time to make the remixes.

  • So if you're thinking about sending one in and you should, now is the best time.

  • Submit yours today at changelog.fm/sotl.

  • Okay, let's get into this week's news.

  • The Confident Idiot Problem.

  • Or, why AI needs hard rules, not vibe checks.

  • If you've been following the "how do we actually use AI in production?" conversation stream, you've probably heard people propose a strategy where one LLM checks another LLM's results.

  • But will that work?

  • Quote: "We are told to ask GPT-4o to grade GPT-3.5.

  • We are told to fix the vibes, but this creates a dangerous circular dependency.

  • If the underlying models suffer from sycophancy (agreeing with the user) or hallucination, a judge model often hallucinates a passing grade.

  • We are trying to fix probability with more probability.

  • That is a losing game."
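To make the "hard rules, not vibe checks" contrast concrete, here is a minimal sketch of a deterministic check on a model's output. It is an illustration, not the approach from the quoted post: the function name `check_agent_output` and the fields `source_url` and `confidence` are hypothetical, chosen only to show a rule that gives the same answer every time it runs.

```python
import json
from urllib.parse import urlparse


def check_agent_output(raw: str) -> list[str]:
    """Return a list of rule violations; an empty list means the output passes.

    Hypothetical schema (an assumption for this sketch): a JSON object with
    a "source_url" string and a "confidence" number between 0 and 1.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed output fails outright; nothing is asked to "grade" it.
        return [f"not valid JSON: {exc.msg}"]

    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []

    # Rule 1: the URL must parse with an http(s) scheme and a host.
    url = data.get("source_url")
    parsed = urlparse(url) if isinstance(url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("source_url must be an http(s) URL")

    # Rule 2: confidence must be a real number in [0, 1] (bools are rejected).
    score = data.get("confidence")
    if (
        isinstance(score, bool)
        or not isinstance(score, (int, float))
        or not 0 <= score <= 1
    ):
        errors.append("confidence must be a number between 0 and 1")

    return errors
```

Unlike a judge model, this check cannot be talked into a passing grade: the same input always produces the same list of violations.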

  • One possible way of dealing with these confident idiots we've introduced into our software stacks over the last few years is to stop treating agents like magic boxes and start treating them like software,