2025-12-09
What up nerds?
I'm Jared and this is ChangeLog News for the week of Monday, December 8th, 2025.
We are quickly approaching last call for state of the log voicemails.
We record the show in a week and we have to give BMC time to make the remixes.
So if you're thinking about sending one in and you should, now is the best time.
Submit yours today at changelog.fm slash s-o-t-l.
Okay, let's get into this week's news.
The Confident Idiot Problem.
Or, why AI needs hard rules, not vibe checks.
If you've been following the "how do we actually use AI in production?" conversation,
you've probably heard people propose a strategy where one LLM checks another LLM's results.
But will that work?
Quote, we are told to ask GPT-4o to grade GPT-3.5.
We are told to fix the vibes, but this creates a dangerous circular dependency.
If the underlying models suffer from sycophancy, which is agreeing with the user,
or from hallucination, a judge model often hallucinates a passing grade.
We are trying to fix probability with more probability.
That is a losing game."
One possible way of dealing with these confident idiots we've introduced into our software stacks over the last few years is to stop treating agents like magic boxes and start treating them like software,
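To make the "hard rules, not vibe checks" idea concrete, here's a minimal sketch of what deterministic validation of an agent's output could look like, as opposed to asking a second LLM to grade it. The field names and thresholds are illustrative assumptions, not anything from the show:

```python
# A sketch of "hard rules, not vibe checks": instead of asking a judge
# model whether an agent's reply looks good, apply deterministic checks
# that pass or fail the same way every time. Field names ("summary",
# "confidence") and limits are hypothetical examples.
import json


def validate_agent_output(raw: str) -> tuple[bool, list[str]]:
    """Apply hard, deterministic rules to an agent's JSON reply."""
    errors: list[str] = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, ["output is not valid JSON"]

    # Rule 1: required fields must be present
    for field in ("summary", "confidence"):
        if field not in data:
            errors.append(f"missing required field: {field}")

    # Rule 2: confidence must be a number in [0, 1]
    conf = data.get("confidence")
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        errors.append("confidence must be a number between 0 and 1")

    # Rule 3: summary must be a non-empty string with a bounded length
    summary = data.get("summary")
    if not isinstance(summary, str) or not 0 < len(summary) <= 500:
        errors.append("summary must be a non-empty string of at most 500 chars")

    return (len(errors) == 0, errors)


ok, errs = validate_agent_output('{"summary": "Deploy succeeded", "confidence": 0.9}')
print(ok, errs)   # True []

ok, errs = validate_agent_output('{"summary": "", "confidence": 2}')
print(ok, errs)   # False, with two rule violations listed
```

The point of the sketch: every check here is probability-free. A sycophantic judge model can hallucinate a passing grade, but `json.loads` and a range check cannot.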