AI Agents: Modeling Beats Doing
Key Points
- The current focus on AI agents as executors—writing emails, handling tickets, generating code—is a low‑leverage opportunity compared to using agents as models.
- High‑leverage value comes from “modeling agents,” where AI agents simulate realities (digital twins) rather than merely performing tasks, unlocking exponential productivity gains.
- Traditional agents combine an LLM core, tool access, and policy guidance to automate work, and their success metrics (tickets closed, hours saved, cost per interaction) reflect this execution‑oriented approach.
- A quieter industry shift, highlighted by Nvidia’s warehouse‑twin demonstration, shows companies leveraging agents as simulators to create digital twins that model complex environments for long‑term strategic advantage.
- To transform agents into modelers, you add a simulated world layer on top of the LLM‑tools‑guidance stack, enabling agents to act as reality simulators and deliver the next trillion‑dollar edge.
Sections
- Modeling Over Execution: AI Agent Leverage - The speaker contends that the current focus on AI agents as task‑performing executors is low‑leverage, and the next trillion‑dollar breakthrough will come from using AI to model and simulate agents, a far more exponential opportunity.
- Reality Simulators vs Execution Agents - The speaker explains how LLM‑driven agents can serve as reality simulators—modeling constraints and scenarios like stakeholder negotiations or business timelines—to improve decision‑making, contrasting this with simpler agents that merely automate linear tasks.
- Simulation‑Based Time Compression - The speaker explains how using fast virtual simulations lets companies iterate far ahead of real‑world time, accelerating development despite imperfect accuracy, with examples like robotics and Tesla’s autonomous‑driving training.
- Avoiding False Confidence in Digital Twins - The speaker addresses common objections to digital‑twin modeling—garbage‑in/garbage‑out, lack of calibration, and overreliance on point forecasts—advocating rigorous back‑testing, constraint checks, and using bounded distributions to maintain honest, reliable simulations.
- Tool Stacks, Simulated Relationships, Ethics - The speaker contrasts enterprise versus lightweight tool stacks for relationship simulations, stresses the need for fresh data and feedback loops, and argues that having powerful compute creates a moral duty to use it responsibly.
Source: https://www.youtube.com/watch?v=duA2AwL7keg
Duration: 00:15:34
Timestamps
- 00:00:00 Modeling Over Execution: AI Agent Leverage
- 00:03:29 Reality Simulators vs Execution Agents
- 00:06:38 Simulation‑Based Time Compression
- 00:10:09 Avoiding False Confidence in Digital Twins
- 00:13:16 Tool Stacks, Simulated Relationships, Ethics
Full Transcript
I think we're focusing on agents at
their most underleveraged point. Let me
explain what I mean. Fundamentally, we
are focused on AI agents as executors.
AI agents as doers: writing emails,
answering tickets, codegen demos, and we
are spending ink and we are spending
pixels and we are spending tokens
figuring out as a community how to get
AI agents to do stuff better. That is
the lower leverage opportunity for
agents and we are almost never talking
about the higher leverage opportunity
and it's being used today by smart
companies. The higher leverage
opportunity is AI modeling: agents as AI
models. That is an exponential
opportunity and this video is all about
unpacking the idea that modeling beats
doing. And there's a quiet AI revolution
among companies that have figured that
out. I want to show you why the next
trillion dollar edge is not faster
execution with AI agents, even though
that's good. It's better simulation with
agents. So the traditional agent
conception is LLMs plus tools plus
guidance. Pretty simple, right? You have
an AI agent that has a large language
model at the heart; we would call
that the brains.
It can call tools to do tasks and it's
wrapped up by guidance or orchestration
that gives it a policy that tells it
what it should be doing and also
constraints: what it should not be doing.
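That LLM-plus-tools-plus-guidance stack can be sketched in a few lines of Python. This is a minimal illustration with a stubbed model call; every name here (the tool, the policy fields, `run_agent`) is made up for the sketch, not any real framework's API.

```python
# Sketch of the executor-agent stack: an LLM "brain", callable tools,
# and guidance that both directs and constrains the agent.

def llm(prompt: str) -> str:
    """Stub for the LLM brain; a real agent would call a model here."""
    return 'tool: lookup_ticket("T-42")'

TOOLS = {
    "lookup_ticket": lambda ticket_id: f"{ticket_id}: printer offline",
}

GUIDANCE = {
    "policy": "Resolve support tickets politely.",
    "allowed_tools": {"lookup_ticket"},  # the constraint: everything else is off-limits
}

def run_agent(task: str) -> str:
    prompt = f"{GUIDANCE['policy']}\nTask: {task}"
    action = llm(prompt)
    if action.startswith("tool: "):
        name, _, arg = action[len("tool: "):].partition("(")
        if name not in GUIDANCE["allowed_tools"]:
            return "blocked by guidance"  # policy constraint enforced here
        return TOOLS[name](arg.strip('")'))
    return action

result = run_agent("Close ticket T-42")
```

The evaluations the speaker mentions next are essentially tests of how well `run_agent`-style loops close out real work.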
And a lot of our evaluations essentially
measure how these agents, an LLM and
tools and guidance in a little trench
coat, do at getting real work done.
And so the KPIs that we brag about,
tickets closed, hours saved, cost per
interaction, those all come from that
idea of agents: agents as doing things
with tools and policy guidance to
constrain them. Networks of agents,
communities of agents, meshes of agents
to use the McKinsey phrase, those
all come from this concept that you need
a swarm of agents or a team of agents
doing work for you. That's great for
automation. That's great for execution.
Let's zoom out to the wider opportunity.
Agents can be reality simulators.
So the concept of a digital twin is
something that was actually first
brought out sort of and shown off in
public earlier this year back in January
when Nvidia launched manufacturing
warehouse twins. This was in the same
conference where the CEO of Nvidia,
Jensen Huang, said this is
the year of AI agents,
right? And we knew that the VC hype was
big for AI agents. Jensen coming out at
the beginning of the year in January
with a whole AI agent demo. People kind
of slept on the warehouse part. People
forgot that the idea Jensen had was that
digital twins matter profoundly for
long-term productivity and for
maximizing the lever of AI leverage of
AI agents. So just as we defined AI
agents that are doers as LLMs and
tools and guidance, I'm going to tell
you that if you want to use agents as
modelers,
you add one thing more. You have agents
that are LLMs with tools and guidance in
a simulated world.
That's the last part. And that's why the
simulation matters so much with the
warehouse that Jensen introduced. Every
other example we have of model
building simulates the world. Now, it
might not be like a 3D video game world
simulation. It might be a simulation
that models the relevant constraints of
the world in text in words. That can
happen too. And we do this all the time:
there are
prompts that will set up your LLM to act
as an agent within a reality simulator.
And all you're doing is telling the
agent to act in a certain way with this
policy and guidance given these
constraints around the world. And so
when we talk about, hey, help me game
out this situation with a difficult
stakeholder, people are having those
conversations with their LLMs.
They're having those conversations with
ChatGPT. They're talking about breaking
up with their ex and they're simulating
that conversation with ChatGPT to see
how it goes. That is agents as reality
simulators.
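A text-only reality simulator of this kind is mostly prompt construction: write the world's constraints in words and ask the model to play a role inside them. Here is a minimal sketch; the persona, constraints, and function name are illustrative assumptions, not a prescribed template.

```python
# Build a prompt that sets up an LLM as an agent inside a
# text-described "simulated world" of constraints.

def build_simulator_prompt(persona: str, constraints: list[str], scenario: str) -> str:
    world = "\n".join(f"- {c}" for c in constraints)
    return (
        f"You are simulating: {persona}.\n"
        f"World constraints:\n{world}\n"
        f"Stay in character and respect every constraint.\n"
        f"Scenario: {scenario}"
    )

prompt = build_simulator_prompt(
    persona="a skeptical stakeholder in a budget negotiation",
    constraints=[
        "Budget is frozen until Q3.",
        "The stakeholder was burned by the last vendor.",
    ],
    scenario="I pitch a 20% scope increase. How do you respond?",
)
```

The resulting string is what you would hand to the model; the simulation lives entirely in the constraints you wrote down.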
Here's why it matters that we talk about
this.
We are spending most of our time talking
about agents that execute. Those are
linear time savings agents. They turn a
10-minute email into a zero-minute
email, which is fantastic.
Imagine the difference when you have a
reality simulator agent that helps you
improve your decisioning as a business.
Imagine an agent that allows you to
simulate various business timelines and
explore them. We often only have the
chance for a simple PowerPoint
presentation to the board with three
options and here's our preferred one.
AI gives us so much more power to work
with and almost none of us are using
these agents as reality simulators to
think through different timelines in a
structured way. In that world, if we did
a little structured timeline exploration
for a business, we could turn a 10-year
market cycle into a 10-hour sim and come
back with five or six different 10-hour
sims and have a much more useful
understanding of where the business was
going.
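The 10-year-cycle-into-10-hour-sim idea can be sketched as a tiny Monte Carlo over business timelines: run several alternate trajectories in milliseconds and compare their outcomes. The growth and volatility numbers below are made-up assumptions, not real market data.

```python
# Run several random 10-year revenue trajectories under a simple
# growth-with-shocks model, one per seed, to explore alternate timelines.
import random

def simulate_timeline(years: int, growth: float, volatility: float, seed: int) -> float:
    rng = random.Random(seed)  # seeded so each timeline is reproducible
    revenue = 1.0
    for _ in range(years):
        revenue *= 1 + rng.gauss(growth, volatility)  # one simulated year
    return revenue

# Five or six alternate timelines instead of one guess:
outcomes = [simulate_timeline(10, growth=0.05, volatility=0.15, seed=s) for s in range(6)]
```

The spread across `outcomes` is the point: a range of plausible futures rather than a single projection.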
In a sense, we're taking all of these
timelines that historically we could
only look two or three steps ahead
on. We now have the compute to simulate
a bunch of those different lines, bring
them in and make smarter decisions. If
that improves our decision-making as humans
just a little bit, it will more than
make up for the impact of all of the LLM
agents that focus on execution.
So, what are these exponential value
levers? How do you know if you're doing
this right? One, I talked about
timelines. There's a huge alternate
timeline advantage. You can run and
simulate all kinds of different options,
including not just for the business as a
whole, but for particular scenarios. You
can simulate customer response to
product launches. You can simulate
marketing campaign universes before you
spend a buck. You can test all
kinds of code permutations before you
actually ship the code.
Time compression is the second one I
want to call out. So time compression is
the idea that your competitor is on
iteration three but you're on iteration
300 because you are not on wall clock
time. You are on simulation time and
you're able to simulate things so
quickly and discard them. Now, I'm going
to get objections for sure, right?
People are going to say, "Well, these
simulations are not all accurate. So,
why would we believe this alternate
timeline or why would we believe this
time compression concept?" Well, one,
it's being used by some of the biggest
companies in the world already to
deliver extraordinary value, and I'll
get to that. But two, even if it's
not perfectly accurate, if it's
significantly better than the option of
not thinking about it at all, great. It
can be 70% accurate and still be
extremely useful.
And yes, there are companies that are
using virtual simulations to
dramatically accelerate progress.
Robotics is a good example. Robots are
learning to walk without ever walking by
being trained in virtual environments
first where they can be trained
extremely quickly.
That saves the company a ton of time on
training costs.
Another example is Tesla and driving.
Tesla trains driving AI on simulated
courses
and it helps because the car can have
all of the edge case experiences without
getting into very expensive accidents.
Okay, so we talked about value levers
like alternate timeline, time
compression. There's one more I want to
call out before we get to the real world
here. Compounding is a big one. Every
time you sim, you develop better priors.
When you develop better priors, you get
to nonlinear breakthroughs more easily.
You can find pricing cliffs. You can
find hidden segments. You can find
breakthrough products. Things that you
will not get with the smartest executing
agents in the world. What I'm really
trying to get you to take away from this
is that you are on a linear value scale
with AI agents as executors and you are
on a nonlinear value scale with AI
agents as model simulators.
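The compounding-priors claim maps naturally onto Bayesian updating: each round of simulation adds evidence, and the belief sharpens. A minimal sketch with a conjugate Beta prior over "the launch succeeds"; the simulated success counts are illustrative.

```python
# Conjugate Beta update: each batch of simulated outcomes tightens
# the prior, so later decisions start from a better-informed belief.

def update_prior(alpha: float, beta: float, successes: int, failures: int):
    """Beta(alpha, beta) prior plus binomial evidence -> updated posterior."""
    return alpha + successes, beta + failures

alpha, beta = 1.0, 1.0                     # uninformative starting prior
for batch in [(7, 3), (8, 2), (9, 1)]:     # three rounds of simulated launches
    alpha, beta = update_prior(alpha, beta, *batch)

posterior_mean = alpha / (alpha + beta)    # belief after compounding the sim evidence
```

Each pass through the loop is one "sim" in the speaker's sense: the posterior it leaves behind is the improved prior the next sim starts from.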
Let's get to a couple examples. And
these are all vehicle examples. We're
going to do some cars this time.
That doesn't mean this is the only place
this is happening, but I think it's
useful. With Renault, they cut vehicle
dev time by 60% by having digital twins.
The digital twin predicts crash outcomes
pre-prototype, which really helps them
to develop the car appropriately. BMW
built a virtual factory with thousands
of line change permutations overnight to
simulate the best factory outcomes.
Formula 1 has real-time pit strategy
simulations that help figure out what
is the most efficient way to allocate
energy in a pit crew change so that you
can get that car back on the racecourse
as quickly as possible. And one example
that isn't a car situation: ad networks
can pre-test creative mixes for ROAS
uplift without spend. When you talk
about the idea of a viral
simulator, and there are apps now that do
this, what it's essentially doing is AI
agents as world models. It's giving an
LLM or another machine learning
algorithm a set of constraints, a set of
tools, and a world to operate within.
And it's asking it to come back with a
response after it's modeled that world.
Okay, I anticipate I'm going to get more
objections. So, we're just going to be
real honest about those objections.
Garbage in, garbage out is the first
one, right? If you put garbage in,
you're going to get a bad simulation out
and it's a waste of time. That's true.
Maybe put in some proven calibration
loops and calibrate what you put in.
Pay attention: this is very
controllable. And make sure that you
back-test and keep yourself
honest relative to performance. So if
your digital twin is simulating a
timeline and you're actually running
that timeline on wall clock speed and
you see that things are significantly
diverging versus the scenario, be
honest. Assess what went wrong with your
simulation. Usually you missed a
constraint when you were projecting for
the board; go back and fix it.
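That back-testing discipline can be sketched as a simple divergence check between the twin's projections and wall-clock actuals. The series, tolerance, and function name below are illustrative.

```python
# Flag the periods where the simulated timeline and reality disagree
# by more than a relative tolerance -- the signal to go hunt for the
# constraint the simulation missed.

def divergence_alerts(projected: list[float], actual: list[float], tolerance: float) -> list[int]:
    """Return indices where sim and reality differ by more than `tolerance` (relative)."""
    return [
        t for t, (p, a) in enumerate(zip(projected, actual))
        if abs(p - a) / max(abs(a), 1e-9) > tolerance
    ]

alerts = divergence_alerts(
    projected=[100, 110, 121, 133],
    actual=[100, 108, 112, 95],
    tolerance=0.10,  # be honest once the sim drifts more than 10%
)
```

An empty alert list means the twin is tracking; a non-empty one is the cue to reassess the scenario, not to trust it harder.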
Another pushback. This gives you false
confidence.
Fair. I think we had false confidence
when we didn't consider our options
before, too. You should use your
simulations to bound distributions, not
run point projections. Does that make
sense? You have distributions of
timelines. You should be putting some
constraints around them because you had
a scenario that modeled out what was
likely to happen. You don't want to make
a point assumption. That's always been a
weakness for humans: we overfixate on
a particular point assumption and we
don't think about the world as a series
of distributions.
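Bounding distributions instead of running point projections can be as simple as reporting a percentile band over your simulated outcomes. A minimal sketch; the outcome values and percentile choices are illustrative.

```python
# Summarize a batch of simulated outcomes as a (low, median, high)
# band rather than a single point forecast.

def bounded_band(outcomes: list[float], low_pct: float = 0.1, high_pct: float = 0.9):
    """Return (low, median, high) percentiles from simulated outcomes."""
    s = sorted(outcomes)

    def pick(q: float) -> float:
        # index into the sorted sample, clamped to the last element
        return s[min(int(q * len(s)), len(s) - 1)]

    return pick(low_pct), pick(0.5), pick(high_pct)

band = bounded_band([0.8, 1.1, 0.9, 1.4, 1.0, 1.2, 0.7, 1.3, 1.05, 0.95])
```

Presenting `band` to the board instead of one number is exactly the shift from a point assumption to a constrained distribution.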
Another objection, compute is super
pricey. How can we afford this? Well,
how can you not afford it? If it gives
you breakthrough potential, seems like
it would be worth it, right?
I want to call out a fourth one.
Culture change is hard.
If we actually give people bonuses, if
we give them rewards for decision
quality, if we give people rewards for
avoiding disaster, not just building
something new, we are going to change
corporate incentives. I know that is a
hard one. I have no illusions. I've
worked in the corporate world long
enough to know that there are
not a lot of companies that do that. But
we have an opportunity to rethink how we
do decision-making, to rethink how we do
agentic
utility in the business and we can bring
compute into our decision-making and
future-forward thinking in a way we have never
been able to do it before. I think that
does imply culture change. I think it
implies thinking more about how we
think, how we make decisions, thinking
more about avoiding disasters. So,
you're like, "Okay, this is a lot. How
do I get started?" Well, let me suggest
picking one KPI to try and twin first.
Something you think you know well enough
that you can model, whether it's
literally modeling with a long prompt in
ChatGPT or building something custom.
Maybe it's cost of acquisition, maybe
it's churn, I don't know. Then you want
to make sure you understand the data
that you're feeding it. You want to
understand how you refresh that data and
you understand the feedback loops.
Finally, you want to make sure that you
have a tool stack that is dependable and
solid. Now, if it's a big company
effort, you may have a data lake with a
lakehouse and a feature store and a
simulation engine and a dashboard. That
would be an example of an enterprise
stack. If it's very small and you're
trying to simulate breaking up with your
ex or your soon-to-be ex, it's not
nearly that fancy. You just have to have
good data. You have to have a refresh
cadence as you have that next date with
the person that you're considering
breaking up with and good feedback
loops. And so I intentionally use a
slightly humorous take from our
personal lives because one, we do talk
with ChatGPT about our personal lives
and two, I think it helps make it
tangible. Fundamentally, if you want to
simulate a relationship, you have to
give enough information about that
relationship to make it a useful
simulation. And then you have to change
and update your priors for that agent to
understand
how it needs to adjust as reality
continues to evolve.
So the thing that I want to leave you
with is this.
If we have the capability
to have clearer foresight and we choose
not to use it, does this raise our moral
responsibility? Are we more responsible
for future
timelines
because we have the compute to think
about agents as worldbuilders? I think
we are. I think we have a responsibility
to think more deeply because we now have
the compute to do so.
And I want to call out again, there is a
massive divergence curve opportunity
here. If everyone else is obsessing
about agents as doers and you are the
one thinking about agents as ways to
model future realities and make better
decisions, you are playing a
different game and you are a first mover
in that game.
So stop asking how AI can do this task.
Or, I'm not going to say stop; AI is
tremendously valuable as an executor, but
it's 95% of what I see. Start asking
how AI can show you different kinds of
futures and help you improve your
decision-making.
Where would a digital twin save you from
your next big mistake?
That's my question for you.
Enjoy.