Beyond Hallucinations: AI’s Credibility Overhang
Key Points
- The speaker discusses how early high‑profile AI hallucinations created a credibility gap, leading many people to distrust models like ChatGPT, Claude, and Gemini despite their actual reliability.
- A lower tolerance for errors is applied to AI outputs than to human work, even when AI dramatically speeds up tasks, which fuels the perception that AI must be “perfect.”
- The practical value of AI emerges once its usefulness outweighs the effort needed to verify its answers, indicating that the technology has passed an “event horizon” from experimental to productive.
- While hallucinations must still be mitigated—professionals such as lawyers and doctors need to double‑check AI‑generated information—the current level of AI competence already supports real‑world applications.
- Ongoing discussions about AI hallucinations dominate public conversation, reflecting the lingering credibility overhang that the industry must address.
Sections
- [00:00:00](https://www.youtube.com/watch?v=0IxUJJCBkPI&t=0s) AI Hallucinations and Credibility Gap - The speaker discusses how early high‑profile AI hallucinations have created a lasting mistrust in language models, emphasizing the inflated credibility expectations for AI compared to human errors.
- [00:03:40](https://www.youtube.com/watch?v=0IxUJJCBkPI&t=220s) AI Hallucination Rates Vary By Task - The speaker explains that hallucination frequencies can differ by up to tenfold depending on the prompt and task, and argues that careful, structured prompting and avoiding impossible queries are simple best‑practice ways to keep AI‑generated hallucinations low.
- [00:07:09](https://www.youtube.com/watch?v=0IxUJJCBkPI&t=429s) Human Stubbornness vs Safer AI - The speaker argues that critics of AI are motivated by personal threat, asserts AI already outperforms humans in reliability, and attributes society's reluctance to adopt safer technologies, such as autonomous vehicles, to innate human stubbornness.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=0IxUJJCBkPI](https://www.youtube.com/watch?v=0IxUJJCBkPI) · **Duration:** 00:09:18
I did not want to do this. We are going to talk about hallucinations, and the reason we're going to talk about hallucinations is because I can't get people to stop talking to me about hallucinations. So, here we are. We're doing it. Look, at the end of the day, the fact that ChatGPT was released when it was, at the capability level it was released at, means that we have a massive overhang of credibility that we have to make up. ChatGPT is much more credible than people believe it is. Claude is much more credible than people believe it is. It's not really just about ChatGPT, but for the people who care about this, it's always ChatGPT, because that's the language model they know. But Gemini, same deal.
What I'm trying to say is that when ChatGPT was released back in 2022, there were enough high-profile hallucinations that people misunderstood what AI can actually do and chalked it up to a bunch of lies. And I still hear that all the time. Every day I hear this, and I'm just dealing with it. I'm just going to talk about it.
What I want to say is that we have a different bar for AI than we have for humans. For humans: if I had an unreliable human researcher, frankly, an intern, and that intern took a week to prepare me a 40-page report, and that intern made three mistakes in that 40-page report, I would say great, and I would love to use that report in whatever I'm working on. If an AI comes back in 30 minutes with a 40-page report and it makes three mistakes, we say it's not good enough, it needs to be perfect. Why? It's already cut the time by 100x. Why does it need to be perfect? Why does it need to be more perfect than people?

Now, there are other reasons to say that, you know, hallucinations are not that big a deal, but I think that's the most compelling one to me, because if you want AI to do useful work, then you just have to believe that the work it can do is more useful than the time it takes to check for hallucinations. And we are well past that bar. Does that mean that hallucinations don't matter? Does that mean a lawyer should not be checking their case citations if they're using AI? Does that mean a doctor shouldn't be checking the medical reasoning of an AI? Obviously not. Obviously, we should be checking, and we should be working to reduce hallucinations.
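The bar the speaker describes is really just arithmetic: AI output is worth using once producing it plus verifying it beats doing the work by hand. A minimal sketch of that break-even, with made-up numbers echoing the intern example above (none of these figures are measurements):

```python
# Toy break-even check for the intern-vs-AI example.
# All numbers here are hypothetical illustrations, not measurements.

def worth_using_ai(human_hours: float, ai_hours: float, verify_hours: float) -> bool:
    """AI output is worth using when generating it plus verifying it
    still beats the time a human would take to produce it."""
    return ai_hours + verify_hours < human_hours

human_hours = 40.0   # intern: roughly one work week for the 40-page report
ai_hours = 0.5       # AI: about 30 minutes
verify_hours = 4.0   # time spent checking the report for mistakes

print(worth_using_ai(human_hours, ai_hours, verify_hours))  # True
```

Even with generous verification time, the inequality holds by a wide margin, which is the speaker's point about being "well past that bar."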
Great. But the fact that we are at a point now where it can clearly and obviously do useful work means that AI has crossed the event horizon. It is no longer just a plaything; it is something we can do work with. And I think, unfortunately, that credibility overhang is biting this industry in the butt, because at the end of the day, most people who are not sitting in this YouTube circle, if I talk to them about AI, hallucinations are the first thing out of their mouth. It's the first thing they talk about. Hey, what about hallucinations? I heard they make stuff up. I heard it lies.

Honestly, it lies less than the average human does at this point. Most of them. The hallucination rate, by the way, is really hard to measure. I looked into this. I wrote a Substack about it if you want to check it out. If you don't, I don't care. It's a good read, though.
And it goes deep into what hallucinations are. One of the things that I think is really interesting is that what we call the hallucination rate varies by a factor of 10 depending on the task you give it. The same model can come in at 1.5% and at 15%. And by the way, I'm not making that up; that's roughly where ChatGPT 4.5 comes in, depending on which hallucination measure you use. Context really matters. The kind of task you give it really matters.

One of the reasons why I don't worry about hallucinations personally is because I don't give AI a situation where it is likely to make up hallucinations, and then blame it. I figure that's mismanaging my employee. Why would I do that? I don't ask AI to do things that are virtually impossible unless it imagines or hallucinates or confabulates information, because that's useless. Why would I do that? It's such a powerful tool for what it can do well. Why not specify your sources, where you want it to go look? Why not be careful in my prompting and be really clear and structured? Because it does well when I do that. That's just easier for me. So a lot of the things that actually reduce hallucinations turn out to be just best practice for working with AI. I don't know. Seems like we should follow best practice.

And so, to me: are OpenAI, are Anthropic, are they working on this? Sure. Is DeepMind at Google working on this?
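That tenfold spread is easy to picture: score the same model on different task sets and the measured rate swings. A toy sketch, with invented task names and counts chosen to land at the 1.5% and 15% figures mentioned above (the labels are fabricated for illustration, not real eval data):

```python
from collections import defaultdict

# Hypothetical per-item eval labels for one model on two kinds of tasks:
# (task name, did the answer contain a hallucination?)
results = (
    [("grounded_summary", False)] * 197 + [("grounded_summary", True)] * 3
    + [("obscure_citation_lookup", False)] * 170
    + [("obscure_citation_lookup", True)] * 30
)

counts = defaultdict(lambda: [0, 0])  # task -> [hallucinations, total]
for task, hallucinated in results:
    counts[task][0] += hallucinated  # bool adds as 0 or 1
    counts[task][1] += 1

rates = {task: bad / total for task, (bad, total) in counts.items()}
for task, rate in rates.items():
    print(f"{task}: {rate:.1%}")
# grounded_summary: 1.5%
# obscure_citation_lookup: 15.0%
```

Same model, same scoring rule, a 10x difference in "the hallucination rate" purely from which tasks you put in the denominator.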
Absolutely. Does that mean that we're going to have 100% no-hallucination models next year? I guarantee you we will not. And I also just about guarantee you it won't matter. It won't matter for real work. It's going to matter enormously for public perception, because we are trained to assume that computers must be perfect, because everything we've had in computing for 100 years, well, not 100 years, call it 60 years, has been deterministic computing. It has been programs where if a plus b equals c, then whatever, right? It's all mathematics. It's algorithmic. Everything is determined in the program when it runs, and so we can expect perfection. And all of our movies say the same thing. None of us are ready for an AI where we taught the rocks to think and they turned out to be poetic dreamers. We're just not ready for that.
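The contrast the speaker is drawing, deterministic programs versus probabilistic token generation, can be sketched in a few lines. The tiny vocabulary and the probabilities here are invented for illustration; real models sample from distributions over tens of thousands of tokens:

```python
import random

# Deterministic computing: same inputs, same output, every single run.
def add(a: int, b: int) -> int:
    return a + b

assert add(2, 3) == 5  # always true, by construction

# Probabilistic generation: a language model samples each next token from
# a probability distribution, so reruns of the same prompt can differ.
vocab = ["Paris", "London", "Rome"]   # invented toy vocabulary
probs = [0.90, 0.07, 0.03]            # invented probabilities

def next_token(rng: random.Random) -> str:
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random(0)
print([next_token(rng) for _ in range(5)])  # mostly "Paris", but not guaranteed
```

The first half never varies; the second half is right "only" most of the time, which is exactly why a 1.5% error rate from a sampler with no factual world model underneath is so surprising.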
And the fact that AI doesn't inherently have a factual world model, the fact that we can talk about a 1.5% error rate in certain hallucination tests for ChatGPT 4.5, is a freaking miracle. I am astonished. These things, they dream. They come up with probabilistic tokens that they think match what you're looking for. They have no factual world model underneath. It's amazing they get anything right at all. It's kind of incredible.

And so within that world, yeah, I do think we need to baseline on humans more. I do think we need to take seriously the fact that they do work. And I think that we need to come up with better answers as an industry for people who say all it does is lie, all it does is make stuff up. And by the way, the people who do that tend to be quite unreliable narrators themselves. I have never heard that kind of aggressive contrarian take from someone who isn't, to some degree, personally threatened by AI and needing to denigrate it. So there is absolutely a leading edge of change here. People who are worried about their jobs, people who are worried about what will happen to their work, are going to be more likely to denigrate AI. And do I have a study for that? I will admit, frankly, I don't. That is based on me having conversations with hundreds of people. It's just something I've observed.

So where does that leave us?
At the end of the day, AI is going to get to a point, in fact arguably is already crossing the line, where it is more reliable in most fields than most humans. At which point we should stop worrying so much about hallucination for AI and logically worry about hallucination for ourselves. And we're not. And the reason why we're not is pretty simple. It's the same reason why Waymo vehicles are not more popular even though they're vastly safer. It's the same reason why we haven't outlawed human driving in the US even though, statistically speaking, in US testing, automated driving is already so much safer that it costs lives to keep human drivers on the road. And I say the US because that's where it's been tested. It's probably true everywhere else in the world, too.

We are a stubborn, stubborn species. We do not easily give up on something we think is true. We think humans should drive. I do not see that disappearing anytime soon, even though that kills people. We think AI hallucinates. I don't think that belief is disappearing, even though it is demonstrably, easily, obviously proved to be an unhelpful belief. But we have to try. We have to try and explain to people what really matters here. We have to do our best to educate. And this is a challenge for all of us in the industry. And I just got so tired of hearing about hallucinations that I wrote a giant Substack on it. I did this. Like, we've got to be able to