When AI Agents Misread Intent
Key Points
- AI agents can faithfully execute a vague command but misinterpret the user’s true intent, leading to harmful actions like deleting needed files.
- This “intent‑misreading” issue is now the core challenge of building reliable agents, even though recent advances have improved tool‑calling, orchestration, tracing, and durable execution.
- Large language models excel at generating plausible next‑token text because they are trained for token prediction, making them great at chat‑style Q&A but prone to over‑confidence in autonomous tool use.
- In chat, mistaken outputs are easily corrected, but once an agent is granted access to files, emails, or other tools, errors become irreversible, highlighting the need for robust intent validation.
- Future work on agents must focus on explicitly defining and verifying user intent—beyond just prompt engineering—to create systems that act safely and reliably.
Sections
- Agents Misreading Human Intent - The speaker illustrates how AI agents often confidently execute misunderstood, fuzzy commands—like deleting needed files—exposing a lingering intent‑alignment challenge even as tool‑calling and orchestration technologies improve.
- The Intent Gap in Modern AI Agents - Despite rapid advances in agent tooling and evaluation, the speaker argues that aligning latent intent—distinct from explicit context—remains the core challenge in building reliable, scalable AI agents.
- Active Disambiguation in LLMs - Because everyday language is often ambiguous, effective LLM systems must incorporate a clarification loop that prompts the model to ask targeted questions, thereby reducing uncertainty about objectives before generating responses.
- Multi‑Pass Generation & Context Tradeoffs - The speaker argues that employing multi‑pass token generation with reinforcement learning can improve intent inference, but simply enlarging context windows often degrades performance due to ambiguous signal dilution, highlighting the need for better intent handling when building practical agents.
- Intent‑Driven Execution for Agents - The speaker argues that, like intent‑based DeFi trades, AI systems should separate user intent from tool execution using explicit intent representations and solver mechanisms, allowing safer, testable, and higher‑fidelity agent behavior in ambiguous, high‑stakes environments.
- Designing Intent-Driven Agentic Systems - The speaker emphasizes that designers and engineers must build agents that reliably translate clear intent into executable actions, treating intent as a primary component of agentic system design, with future assistance from model makers.
Full Transcript
# When AI Agents Misread Intent

**Source:** [https://www.youtube.com/watch?v=T74uZgfu6mU](https://www.youtube.com/watch?v=T74uZgfu6mU)
**Duration:** 00:18:46

## Sections

- [00:00:00](https://www.youtube.com/watch?v=T74uZgfu6mU&t=0s) **Agents Misreading Human Intent**
- [00:03:16](https://www.youtube.com/watch?v=T74uZgfu6mU&t=196s) **The Intent Gap in Modern AI Agents**
- [00:06:55](https://www.youtube.com/watch?v=T74uZgfu6mU&t=415s) **Active Disambiguation in LLMs**
- [00:11:13](https://www.youtube.com/watch?v=T74uZgfu6mU&t=673s) **Multi‑Pass Generation & Context Tradeoffs**
- [00:14:53](https://www.youtube.com/watch?v=T74uZgfu6mU&t=893s) **Intent‑Driven Execution for Agents**
- [00:18:19](https://www.youtube.com/watch?v=T74uZgfu6mU&t=1099s) **Designing Intent-Driven Agentic Systems**

## Full Transcript
Just picture this. You tell an AI agent
to clean up the old docs on your laptop.
You've given it access to the folders.
It should be able to do that job well.
But it does exactly what you asked. And
that's the problem. It deletes
duplicates. It organizes. It even writes
a little summary of what it
accomplished. And then you discover it
removed the originals that you actually
needed. The model didn't hallucinate. It
didn't lack context. It did something
even worse than that. And that's what
we're going to talk about today. It took
a fuzzy human request. It guessed a
goal. It committed to it. And it
executed confidently without checking
back. In other words, it misread your
intent. And that is a surprisingly
common issue with models. That feeling
of being smart, of being fast, and of
being subtly wrong is not an edge case
these days. It's actually the center of
the agent problem. And that's why in
late 2025, early 2026, it feels like a
very strange moment. We're finally
getting a lot of the big pieces for
agents under control. We understand a
lot more about tool calling than we did
a year ago. We understand a lot about
agent orchestration. We understand a lot
about tracing, about evaluation
harnesses, about durable execution over
time. And yet, we keep face planting on
intent. We can now build systems that
act, but we have to put a ton of work
into making sure that they can reliably
act with the objective that we set them.
And that is why, when we are building our tool-calling systems and agents, we still have to put a ton of work into getting the intent defined through the prompt right.
Have you ever stopped and asked yourself
why it's that hard? Here's the root of why we're here. LLMs are
actually incredible at producing
plausible sounding continuations because
that's what they were really trained to
do. They were trained to predict the
next token. And so that training
objective creates a machine that is
really, really good at answer-shaped text: a piece of text that sounds like it should be right. And in pure chat
mode, the world is pretty forgiving of
that. If the model answers the wrong
thing, you just correct it. And in many
cases, it answers the right thing
because the answer-shaped text is good
enough. In fact, one of the things that
has surprised me and almost everyone
else in the last two years is that this
whole idea of token generation turns out
to be incredibly practical, incredibly
realistic, incredibly useful at
producing real economic utility. And so
this video is not about challenging
that. We know that this whole idea of
token generation fundamentally works.
What we are asking ourselves is what's
next when it comes to agents? How do we
start to get to intention in ways that
help us to build more reliable agentic
systems? Because in a chat, if the model
answers the wrong thing, well, you just
correct it. The conversation is
inherently reversible. You just yell at
it in the chat. But once you give the
model tools, files, email, calendars,
CRM code, maybe your credit card, the
cost of a wrong guess spikes up real
high. The tool use turns a fluent
completion into a real-world commitment
that the agent has made on your behalf.
In a sense, it is writing to reality,
not just writing to the chat. That is an
inflection point that we're all living
through. And it makes intent and the
intent gap matter a lot more. And
everything else is going so well. People
are no longer handwaving how agents
work. They're actually able to build
them. You can see it in how evals emerged as a first-class discipline over the
last 6 months. You can see it in how
tools like LangChain and LangSmith have evolved over the last year into full-stack, traceable, audit-ready agent-building toolkits, and they're not the
only ones; Google, for example, has its ADK. We are getting to a point
where we have so many parts of the
ecosystem in place to deploy agents
reliably, efficiently and at scale. So
why with all that progress are we still
wrestling with intent? Because intent is
not in the text the way context is. And
I'm going to say it again. Intent is not
in the text. Context is the literal
content that we put in when we do
context engineering. Entities,
constraints, instructions, facts that we
include. Intent is typically latent. It
is our priorities. It is our tradeoffs.
It is what done looks like. It's what's
allowed, what's risky, and what to do
when instructions conflict. It's whether you want exploration or a decision from your agent, and what you'd regret if the assistant guessed wrong.
By the way, if some of this sounds like
a prompt that you should write, that's a
good instinct in 2026. We need to be
writing prompts for our agents that do
encode these things until we get intent
figured out. We need to be focusing on
making intent not hidden but super super
explicit, including all of those things
that we can typically leave other humans
to infer like our priorities. If we're
in a business meeting and we talk about
priorities, we are typically saying the
thing that needs to be said in the
meeting and then we're typically not
needing to say what is second or third
or fourth priority because everyone in
the room can infer that. That kind of thing, agents are bad at. LLMs are bad at what humans do first off: inferring from sparse information really, really reliably.
Effectively, we do a second pass where
we simulate consequences and social
context and then we come back with a
priority list in our heads. It's one of
the things that makes us a little bit
magical. We can hear, for example, "make some quick pasta sauce," and we instantly infer that you're hungry. We infer you don't need a lecture; you just want a quick snack. We hear "clean up the docs" and we can infer "don't destroy anything important," to go back to the example at the beginning of this video.
We can sense invisible guardrails, and LLMs need the guardrails to be visible.
And so a lot of what we've been doing and talking about when we build AI systems is essentially how you obsess over those guardrails and make them visible. Obsess over them and put them into prompts. Obsess over them and put them into evals. Take your business logic and put it into code, not just into a prompt, so that it's more deterministic. All of that is good
stuff. All of that is important stuff to
build agents. But I want to think a
little bit more deeply in this video
about intent itself and how we can start
to solve that problem. Because if you
step back, everything I just talked
about is essentially us working around
the intent problem, not solving it
directly. And a lot of the most useful
research in the last year is basically
saying, stop pretending the model can
read intent straight off the prompt. I'm
glad we've got there. I think I could
have told you that from the beginning of
the year, but it's important that we
understand that so we can take the next
step toward fixing it. I think there may
be a fundamental language mismatch here.
We have built LLMs to do next token
completion on human language, but real
world human language is notoriously
underspecified by default. If you want
reliable outcomes, the system is going
to have to reduce uncertainty about the
objective before it acts. In other
words, it needs active task
disambiguation and human language
optimizes in many cases for social
cohesion and does not optimize for the
kind of overt, declarative specification
that the model really needs. One of the
directions that researchers are taking
to address this is to formalize that
task ambiguity and treat clarification
as a design problem. You want to get the
model to ask you targeted questions that
maximize information gain and narrow the
space of viable solutions. This is
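One way to make "targeted questions that maximize information gain" concrete: keep a distribution over plausible goals and pick the question whose expected answer most reduces its entropy. This is only a toy Python sketch; in a real system, the candidate goals and the answer likelihoods would come from the model itself, and every name below is an illustrative assumption.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a distribution over plausible goals."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_entropy_after(dist, answer_outcomes):
    """answer_outcomes: list of (answer_probability, posterior_goal_dist)
    pairs describing what each possible answer would leave us believing."""
    return sum(p_ans * entropy(post) for p_ans, post in answer_outcomes)

def best_question(dist, questions):
    """Pick the question with the largest expected entropy reduction."""
    h = entropy(dist)
    return max(questions, key=lambda q: h - expected_entropy_after(dist, questions[q]))

# Two plausible readings of "clean up the old docs", equally likely a priori.
goals = {"delete duplicates only": 0.5, "delete everything old": 0.5}

questions = {
    # A yes/no question that fully separates the two readings...
    "Should I keep the originals?": [
        (0.5, {"delete duplicates only": 1.0}),
        (0.5, {"delete everything old": 1.0}),
    ],
    # ...versus one whose answer leaves uncertainty unchanged.
    "Do you want a summary afterwards?": [(0.5, goals), (0.5, goals)],
}
```

Here `best_question(goals, questions)` prefers "Should I keep the originals?", because its answer collapses the goal distribution to certainty, while the other question leaves it untouched.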
something that you can start to simulate
with a model when you're trying to
clarify intent. Anyway, it is possible
to bolt on a piece of the prompt that basically says, "Where you have a lack of clarity, please ask me questions." I've
gotten in the habit of doing this both
with agents and also with chat. With
agents, you have to build in response
sets that help it to clarify the intent
where it gets confused. You have to
build in essentially a clarification
loop into your agentic system. With
chat, it's simpler. You just ask the
agent or you ask the LLM, hey, is there
something that is prompting you to
perform in this way? Can you articulate
your assumptions? And can you please ask
me where you don't understand my intent?
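For the agentic case, a clarification loop like the one described can be wired into the runtime in a few lines. This is a hedged sketch under stated assumptions: `model` and `ask_user` are stand-in callables, not a real framework's API.

```python
# Sketch of a clarification loop: the model may return either an action or a
# question; questions are routed to the user and the answer is appended to
# context before the model is invoked again.

def run_with_clarification(model, ask_user, request, max_rounds=3):
    """Loop until the model commits to an action or the round limit hits."""
    context = [request]
    for _ in range(max_rounds):
        kind, payload = model(context)   # ('action', ...) or ('question', ...)
        if kind == "action":
            return payload
        answer = ask_user(payload)       # surface the question to the user
        context.append(f"Q: {payload} A: {answer}")
    raise RuntimeError("could not resolve intent within the round limit")
```

The design choice worth noting: the loop never lets the model act while an unanswered clarification is pending, which is exactly the "reduce uncertainty before acting" behavior discussed above.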
That's usually a very productive line of
questioning to go with. These days, LLMs
do not do that proactively. I suspect by
mid next year, they will. For now, we
have to nudge them to ask questions. A
second line of attack treats intent as
something that is probabilistic. Instead
of asking the system to pick only one
interpretation and roll forward, the
approach essentially maintains a
distribution of plausible goals based on
the text it's received and then updates
it as the conversation progresses. You can
actually simulate that one with a chat.
It's a little bit more difficult to
simulate that one with an agentic
system. I don't think you'd really want
to because most agentic systems are
designed to be relatively predictable in
outcomes. And in this case, what you
have here is essentially a progressive
intent classifier where the intent is
crystallized out of a probability
distribution over time. You can, if
you're trying to sharpen your thinking,
simulate this with a good chat, though.
You can talk with an LLM and you can
tell it at the beginning to hold
multiple plausible interpretations of
what you're trying to do so it doesn't
jump to conclusions and you can actually
watch it start to crystallize and infer
as you continue to have a conversation
over time. Ironically, when I was doing
research into intent as a preparation
for this video, I had that kind of a
conversation with ChatGPT 5.2 Thinking, because I was trying to nudge
it to not over infer from one or two
academic white papers and actually think
more broadly. That is something that you need to learn to do so that you are not stuck yelling at your LLM about hallucinations when really the issue is intent. Another approach is to
essentially make this intent a separate document. And that can be very, very helpful in agentic systems, because you can then have what we would call an intent commit or a semantic commit that literally documents the intent in as crystallized a format as possible. What
are the goals? What are the failure
conditions? What are the graceful fail
conditions? What are the trade-offs that
we make? What are the larger priorities
here? All of that is documented in one
place. If you take that approach, you
end up in a position where you can
actually update your intent separately
from the prompt and you can understand
very clearly where your intent takes the
system and where you can version your
intent over time if you change your
mind, if you want to update the system
and what it does, etc. I think that's
very interesting because it turns intent
into more of an interface and workflow
problem and it doesn't bind us to
figuring out how model makers are going
to solve this. Now, that being said, I
do think that there is a lot of room to
run on reinforcement learning for model
makers in getting better at intent.
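Before going further, here is what the separate intent document described a moment ago might look like in practice. A minimal Python sketch; the structure and field names are illustrative assumptions, not any standard format.

```python
# Sketch of an "intent commit": intent lives as a versioned artifact separate
# from any prompt, so it can be inspected, diffed, and updated on its own.

INTENT = {
    "version": 2,
    "goal": "free disk space by removing duplicate documents",
    "priorities": [  # highest first
        "never delete the only copy of a file",
        "prefer archiving over deletion when unsure",
        "minimize disk usage",
    ],
    "failure_conditions": ["an original file is removed"],
    "graceful_failures": ["skip a folder and report it rather than guess"],
    "tradeoffs": {"disk_space_vs_safety": "always favor safety"},
}

def render_intent_for_prompt(intent):
    """Generate the prompt section from the artifact, so prompts are derived
    from the versioned intent instead of being hand-edited in place."""
    lines = [
        f"# Intent (version {intent['version']})",
        f"Goal: {intent['goal']}",
        "Priorities, highest first:",
    ]
    lines += [f"  {i + 1}. {p}" for i, p in enumerate(intent["priorities"])]
    lines.append("Never acceptable: " + "; ".join(intent["failure_conditions"]))
    return "\n".join(lines)
```

Because the prompt text is rendered from the artifact, changing your mind means bumping `version` and editing one document, not hunting through prompts.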
Fundamentally, if we figure out models that can do multiple passes on token generation and inference, we should be able
to use reinforcement learning techniques
to help them to do second and third
passes across sparse text and infer
context and infer intent more reliably.
I would expect gains there next year.
And by the way, if you're listening to
this and thinking more context will save
us, we'll get a bigger context window.
That can sometimes make things worse.
Even if the user did express the real priority somewhere, models don't robustly use long context; they still have "lost in the middle" challenges. They still
need a lot of good structure, a lot of
good intent to navigate context well.
And long context often embodies
difficult and ambiguous trade-offs that
we don't specify and that we leave the
model to guess. This goes back to the
larger insight. Andrej Karpathy called this out a few weeks ago when he
observed that humans are very very good
at learning from sparse examples and
models need many many more examples than
humans to learn and tend to generalize
much more poorly than humans do. In this case, adding context is something where you would think, as a human, that you have way more than you need and can generalize effectively. For a model, it sometimes leads to worse performance because the signal gets muddled. But let's zoom back
to the practical reality. Builders still
need to learn to ship agents and we
still need to compensate for weak intent
inference at the moment. This is
where I want to lean into the harness
piece. Yes, you should be building
evaluation harnesses. You should be
running agents against curated tasks.
You should be instrumenting your traces.
You should be constraining your tool
permissions. You should not be using too
many tools with an agent. You should
force an agent into a planning state.
You can see that this mindset is
starting to dig in. And I think it is really, really important if you want to build real, productive agents that scale. Think of it as a kind of
production pragmatism for the first half
of 2026. We can make agents reliable
enough to ship now. We don't have to
have the intent problem fully solved.
Even if it's something that I think we
need to be more aware of and we need to
not pretend is not an issue. I haven't
heard enough conversation about it and
that's why I'm chatting about it in this
video. The reason I think we're near a breakthrough is that this is something clearly susceptible to reinforcement learning, and we have a lot of the pieces in place with inference and LLMs. We have a lot of the pieces of the agent ecosystem, and if we get this one piece on intent right, it's a piece of jagged intelligence that unlocks a real breakthrough for us. And
it's very laborious to work around right
now. Like all of the things I talked
about with harnesses, they're they're
complicated to set up. It would be
really handy if we could reliably trust
an agent to infer intent and call tools
appropriately with a lot less
rigmarole. We're not there yet, but I
think that the opportunity is too big
for us not to chase it and get to it.
And I suspect that a 2026 breakthrough
is possible. I don't believe even in
2026 that we're going to get to a "models magically understand all intent" moment.
I think of it more as this: an always-on agent can routinely run cheap, intermediate
checks automatically in the background
that approximate a human second pass and
only escalate to a user or escalate to a
resolution loop when the uncertainty is
high or it determines the consequences
are very serious or irreversible. That
would simulate intent understanding well enough for us
to be moving on with even as we work on
the larger problem. An interesting way
to see where intent is going is to look
sideways at the crypto community where
intents become a thing for basically the
same reason that agents are difficult.
Intents matter in crypto because actions
are expensive and often irreversible.
We're learning the same thing with LLMs
and agents. Actions are expensive and
often irreversible. So in intent-based
DeFi systems, the user often has to sign
an intent to trade that specifies
constraints and desired outcomes and
then specialized automated solvers will
compete to execute that trade. The whole
design separates what you want from how
it is executed. Look, it's not a perfect
analogy. Crypto has its own issues, but
it's a clue in the direction that we're going, right? When execution is high-stakes with agentic systems, systems
tend to evolve toward explicit intent
representations and solver-checker
mechanisms to ensure that that intent is
accurately translated. I think that
we're converging on a similar solution
in the agent world because we need
higher fidelity execution in 2026. So if
you're building systems, I would advise
you start to think about how you can
separate interpretation from execution
in your architecture so that you can
learn to inspect and test the model's
understanding before it touches tools.
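That separation can be sketched in a few lines: the model's reading of the request is materialized as a plan object that gets inspected before any tool runs. A minimal Python sketch; every name here is an illustrative assumption, not a real framework's API, and `interpret` is a hardcoded stand-in for the LLM interpretation pass.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    goal: str
    steps: list         # tool calls the agent intends to make, in order
    irreversible: bool  # does any step destroy or send something?

def interpret(request: str) -> Plan:
    """Stand-in for the LLM interpretation pass (hardcoded for the sketch)."""
    if "clean up" in request:
        return Plan(goal="remove duplicate docs",
                    steps=["list_files", "find_duplicates", "delete_duplicates"],
                    irreversible=True)
    return Plan(goal="answer the question", steps=["respond"], irreversible=False)

def checker_approves(plan: Plan, allowed_tools: set) -> bool:
    """Checker stage: validate the materialized plan against policy before
    anything runs; every step must be explicitly allowed."""
    return all(step in allowed_tools for step in plan.steps)

def run(request: str, allowed_tools: set):
    plan = interpret(request)                   # interpretation happens first...
    if plan.irreversible and not checker_approves(plan, allowed_tools):
        return ("blocked", plan)                # ...and is inspected before execution
    return ("executed", plan)
```

Because the plan exists as data before execution, you can unit-test the interpretation layer on its own, exactly the "inspect and test the model's understanding" point above.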
Start to think about how to run your
agent against eval suites that include
ambiguous prompts on purpose because the
real world is going to be ambiguous and
you should be grading how the model
reaches the final output and how well it
handles ambiguity along the way. Agent
behavior needs to be evaluated in tool
use in multi-step settings under
controlled conditions. I would also take adopting a disambiguation mindset seriously. And you can implement it relatively simply: when an action is destructive, have the agent know that and then trigger a surfaced
interpretation and a clarifying question
if multiple plausible meanings exist.
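That selective gating might look like the following sketch, where only destructive, ambiguous actions trigger a clarifier, and the stakes decide whether it goes to a checker model or to a human. The tool names, the 0-to-1 stakes score, and the threshold are all illustrative assumptions.

```python
# Sketch of selective disambiguation: cheap reversible actions just run,
# while destructive ones with multiple plausible readings get escalated.

DESTRUCTIVE_TOOLS = {"delete_record", "send_email", "charge_card"}

def route_action(tool, interpretations, stakes):
    """interpretations: the plausible readings of the user's request.
    stakes: 0.0 (trivial) to 1.0 (serious or irreversible consequences)."""
    if tool not in DESTRUCTIVE_TOOLS:
        return "execute"           # cheap and reversible: just do it
    if len(interpretations) <= 1:
        return "execute"           # destructive but unambiguous
    # Destructive AND ambiguous: surface the interpretation and a clarifier.
    return "ask_human" if stakes >= 0.7 else "ask_checker_llm"
```

The point of the threshold is that the agent does not ask a question on every step; it only pays the interruption cost where a wrong guess would actually hurt.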
Look, an example could be if it's
deleting a record in your database,
right? Maybe it needs to surface an
interpretation and a clarifier to
another LLM or if the stakes are high
enough to a human. But the point is to
start to align what we're learning about
the importance of disambiguating intent
with the way we design our agentic
systems. And obviously you're going to
want to do this selectively. You cannot have the agent ask a question at every turn, because then it removes the point
of having the agent at all. You need to
decide where the agent really needs to
get intent right and where your agentic
system alone can't carry that intent
clearly.
And then you're going to find out, okay,
now we need to have a disambiguation
loop here. This is what this loop looks
like. This is why it matters here. This
is why it's worth it. And then before
you forget, make sure you externalize
your intent as an artifact that you can
update. Because the closer you are to
having a sort of living requirements
page or a living intent page, the more
you're going to be able to build
interfaces that actually drive quality
over time because you're going to be
able to say
it's okay if you change your mind. It's
okay if you update your intent. Intent
is a separate artifact in our system and
we can codify it really, really cleanly
and we can plug and play it as we need
to. It opens up a lot of flexibility in
your system design. I think looking
ahead, the winners in designing agentic
systems are not going to be the ones
that have thousands of tools or the most
tools. They're not going to be the ones
that have put their agents in the most
different places in the business even.
They're going to be the tool designers and systems engineers who
able to reliably design agents that can
carry intent clearly all the way to
executable work. I think we'll get some
help from model makers on intent in 2026,
but a lot of it is going to be on us,
the builders. And I hope this video has
given you a sense of how we can start to
design for intent as a first-class object in agentic systems.