AI Agents: Modeling Beats Doing

Key Points

  • The current focus on AI agents as executors—writing emails, handling tickets, generating code—is a low‑leverage opportunity compared to using agents as models.
  • High‑leverage value comes from “modeling agents,” where AI agents simulate realities (digital twins) rather than merely performing tasks, unlocking exponential productivity gains.
  • Traditional agents combine an LLM core, tool access, and policy guidance to automate work, and their success metrics (tickets closed, hours saved, cost per interaction) reflect this execution‑oriented approach.
  • A quieter industry shift, highlighted by Nvidia’s warehouse‑twin demonstration, shows companies leveraging agents as simulators to create digital twins that model complex environments for long‑term strategic advantage.
  • To transform agents into modelers, you add a simulated world layer on top of the LLM‑tools‑guidance stack, enabling agents to act as reality simulators and deliver the next trillion‑dollar edge.

**Source:** https://www.youtube.com/watch?v=duA2AwL7keg
**Duration:** 00:15:34

Sections

  • [00:00:00](https://www.youtube.com/watch?v=duA2AwL7keg&t=0s) **Modeling Over Execution: AI Agent Leverage** - The current focus on AI agents as task‑performing executors is low‑leverage; the next trillion‑dollar breakthrough will come from using AI agents to model and simulate, a far more exponential opportunity.
  • [00:03:29](https://www.youtube.com/watch?v=duA2AwL7keg&t=209s) **Reality Simulators vs Execution Agents** - LLM‑driven agents can serve as reality simulators, modeling constraints and scenarios such as stakeholder negotiations or business timelines to improve decision‑making, in contrast with simpler agents that merely automate linear tasks.
  • [00:06:38](https://www.youtube.com/watch?v=duA2AwL7keg&t=398s) **Simulation‑Based Time Compression** - Fast virtual simulations let companies iterate far ahead of real‑world time, accelerating development despite imperfect accuracy, with examples from robotics and Tesla's autonomous‑driving training.
  • [00:10:09](https://www.youtube.com/watch?v=duA2AwL7keg&t=609s) **Avoiding False Confidence in Digital Twins** - Common objections to digital‑twin modeling (garbage in/garbage out, lack of calibration, overreliance on point forecasts) are answered with rigorous back‑testing, constraint checks, and bounded distributions.
  • [00:13:16](https://www.youtube.com/watch?v=duA2AwL7keg&t=796s) **Tool Stacks, Simulated Relationships, Ethics** - Enterprise versus lightweight tool stacks for relationship simulations, the need for fresh data and feedback loops, and the argument that powerful compute creates a moral duty to use it responsibly.

Full Transcript
[0:00] I think we're focusing on agents at their most underleveraged point. Let me explain what I mean. Fundamentally, we are focused on AI agents as executors, AI agents as doers: writing emails, answering tickets, code-gen demos. And we are spending ink, pixels, and tokens figuring out as a community how to get AI agents to do stuff better. That is the lower-leverage opportunity for agents, and we are almost never talking about the higher-leverage opportunity, even though it's being used today by smart companies. The higher-leverage opportunity is AI agents as models. That is an exponential opportunity, and this video is all about unpacking the idea that modeling beats doing. There's a quiet AI revolution among companies that have figured that out. I want to show you why the next trillion-dollar edge is not faster execution with agents, even though that's good. It's better simulation with agents.

[1:01] So the traditional agent conception is LLMs plus tools plus guidance. Pretty simple, right? You have an AI agent with a large language model at the heart; we would call that the brains. It can call tools to do tasks, and it's wrapped up by guidance or orchestration that gives it a policy telling it what it should be doing, plus constraints on what it should not be doing. And a lot of our evaluations essentially measure how these agents (LLM, tools, and guidance in a little trench coat) do at getting real work done.

[1:37] And so the KPIs that we brag about (tickets closed, hours saved, cost per interaction) all come from that conception of agents: agents doing things, with tools and policy guidance to constrain them.
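As a deliberately toy illustration of the "LLM plus tools plus guidance" stack just described, here is a minimal Python sketch. The stub `llm`, the `TOOLS` registry, and `run_agent` are all invented names for this sketch, not part of any real agent framework:

```python
# Minimal sketch of the execute-style agent stack: LLM + tools + guidance.
from typing import Callable

def llm(prompt: str) -> str:
    """Stub language model: routes any 'lookup:' request to a tool call."""
    if "lookup:" in prompt:
        term = prompt.split("lookup:")[1].strip()
        return f"CALL_TOOL lookup {term}"
    return "DONE no tool needed"

# Tools: capabilities the agent is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda term: {"churn": "5% monthly"}.get(term, "unknown"),
}

# Guidance: the policy (what to do) plus constraints (what not to do).
GUIDANCE = "You answer metric questions. Only call whitelisted tools."

def run_agent(task: str) -> str:
    """One step of the agent loop: the LLM decides, the tools act."""
    decision = llm(f"{GUIDANCE}\nTask: {task}")
    if decision.startswith("CALL_TOOL"):
        _, tool_name, arg = decision.split(maxsplit=2)
        if tool_name not in TOOLS:  # constraint: whitelist only
            return "refused: tool not allowed"
        return TOOLS[tool_name](arg)
    return decision
```

Note how guidance appears twice: as the policy prepended to the prompt, and as the whitelist constraint on which tools may actually run.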
[1:49] Networks of agents, communities of agents, meshes of agents (to use the McKinsey phrase) all come from this concept that you need a swarm of agents or a team of agents doing work for you. That's great for automation. That's great for execution. Let's zoom out to the wider opportunity: agents can be reality simulators.

[2:16] The concept of a digital twin was first shown off in public earlier this year, back in January, when Nvidia launched manufacturing warehouse twins. This was at the same conference where Nvidia's CEO, Jensen Huang, declared this the year of AI agents. We knew the VC hype was big for AI agents, and Jensen came out at the beginning of the year with a whole AI agent demo. People kind of slept on the warehouse part. People forgot that Jensen's idea was that digital twins matter profoundly for long-term productivity and for maximizing the leverage of AI agents.

[3:06] Just as we defined doer AI agents as LLMs plus tools plus guidance, I'm going to tell you that if you want to use agents as modelers, you add one thing more. You have agents that are LLMs with tools and guidance in a simulated world. That last part is why the simulation matters so much with the warehouse that Jensen introduced. Every other example we have of model building simulates the world. Now, it might not be a 3D video-game-style world simulation. It might be a simulation that models the relevant constraints of the world in text, in words. That can happen too. We do this all the time: there are prompts that will set up your LLM to act as an agent within a reality simulator.
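A minimal sketch of that extra layer: the same agent, now acting inside a simulated world rather than the real one. The toy negotiation world, its patience counter, and both function names are invented for illustration:

```python
# Sketch of the modeler stack: LLM + tools + guidance + a simulated world.
from dataclasses import dataclass, field

@dataclass
class SimulatedWorld:
    """Holds the state and constraints the agent must act within."""
    stakeholder_patience: int = 3  # world constraint: patience is finite
    agreed: bool = False
    log: list = field(default_factory=list)

    def step(self, proposal: str) -> str:
        """Advance the world one turn in response to the agent's action."""
        self.log.append(proposal)
        if "concession" in proposal:
            self.agreed = True
            return "stakeholder accepts"
        self.stakeholder_patience -= 1
        if self.stakeholder_patience <= 0:
            return "negotiation collapses"
        return "stakeholder pushes back"

def modeling_agent(world: SimulatedWorld, plan: list[str]) -> str:
    """Plays a candidate plan inside the world instead of executing it for real."""
    for proposal in plan:
        outcome = world.step(proposal)
        if outcome != "stakeholder pushes back":
            return outcome
    return "no resolution"
```

The point is that `modeling_agent` plays a candidate plan against the world's constraints before anyone tries it for real; you can compare several plans cheaply and only execute the winner.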
[4:00] All you're doing is telling the agent to act in a certain way, with this policy and guidance, given these constraints about the world. So when we say, "Hey, help me game out this situation with a difficult stakeholder," people are having those conversations with their LLMs. They're having those conversations with ChatGPT. They're talking about breaking up with their ex and simulating that conversation with ChatGPT to see how it goes. That is agents as reality simulators.

[4:29] Here's why it matters that we talk about this. We are spending most of our time talking about agents that execute. Those are linear-time-savings agents. They turn a 10-minute email into a zero-minute email, which is fantastic. Imagine the difference when you have a reality-simulator agent that helps you improve your decisioning as a business. Imagine an agent that allows you to simulate various business timelines and explore them. We often only have the chance for a simple PowerPoint presentation to the board with three options, and here's our preferred one. AI gives us so much more power to work with, and almost none of us are using these agents as reality simulators to think through different timelines in a structured way. In that world, if we did a little structured timeline exploration for a business, we could turn a 10-year market cycle into a 10-hour sim, come back with five or six different 10-hour sims, and have a much more useful understanding of where the business was going.

[5:36] In a sense, we're taking all of these timelines that we've historically only been able to look two or three steps down, and we now have the compute to simulate a bunch of those different lines, bring them in, and make smarter decisions.
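The structured timeline exploration described above can be sketched as a simple Monte Carlo run. The 8% drift and 15% volatility below are invented placeholder numbers, not a recommendation:

```python
# Sketch of "structured timeline exploration": run many cheap simulated
# market timelines and summarize the distribution of outcomes.
import random

def simulate_timeline(years: int, rng: random.Random) -> float:
    """One alternate timeline: compound growth with random yearly shocks."""
    revenue = 1.0
    for _ in range(years):
        revenue *= 1.0 + rng.gauss(0.08, 0.15)  # assumed 8% drift, 15% vol
    return revenue

def explore_timelines(n_sims: int = 1000, years: int = 10, seed: int = 0) -> dict:
    """Summarize many simulated timelines as a p10 / median / p90 band."""
    rng = random.Random(seed)
    outcomes = sorted(simulate_timeline(years, rng) for _ in range(n_sims))
    return {
        "p10": outcomes[int(0.10 * n_sims)],
        "median": outcomes[n_sims // 2],
        "p90": outcomes[int(0.90 * n_sims)],
    }
```

Instead of one board slide, you get a distribution: `explore_timelines()` returns a p10/median/p90 band across a thousand simulated ten-year timelines, run in seconds rather than a decade.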
[5:48] If that improves our decision-making as humans even a little bit, it will more than make up for the impact of all of the LLM agents that focus on execution.

[6:00] So what are these exponential value levers? How do you know if you're doing this right? One, I talked about timelines. There's a huge alternate-timeline advantage. You can run and simulate all kinds of different options, not just for the business as a whole but for particular scenarios. You can simulate customer response to product launches. You can simulate marketing-campaign universes before you spend a buck. You can test all kinds of code permutations before you actually ship the code.

[6:36] Time compression is the second one I want to call out. Time compression is the idea that your competitor is on iteration three but you're on iteration 300, because you are not on wall-clock time. You are on simulation time, and you're able to simulate things quickly and discard them. Now, I'm going to get objections, for sure. People are going to say, "Well, these simulations are not all accurate. So why would we believe this alternate timeline, or why would we believe this time-compression concept?" Well, one, it's being used by some of the biggest companies in the world already to deliver extraordinary value, and I'll get to that. But two, even if it's not perfectly accurate, if it's significantly better than the option of not thinking about it at all, great. It can be 70% accurate and still be extremely useful.

[7:30] And yes, there are companies that are using virtual simulations to dramatically accelerate progress. Robotics is a good example. Robots are learning to walk without ever walking, by being trained first in virtual environments where they can be trained extremely quickly.
[7:50] That saves the company a ton of time on training costs. Another example is Tesla and driving. Tesla trains driving AI on simulated courses, and it helps because the car can have all of the edge-case experiences without getting into very expensive accidents.

[8:16] Okay, so we talked about value levers like alternate timelines and time compression. There's one more I want to call out before we get to the real world here: compounding is a big one. Every time you sim, you develop better priors. When you develop better priors, you get to nonlinear breakthroughs more easily. You can find pricing cliffs. You can find hidden segments. You can find breakthrough products. Things that you will not get with the smartest executing agents in the world. What I'm really trying to get you to take away from this is that you are on a linear value scale with AI agents as executors, and you are on a nonlinear value scale with AI agents as model simulators.

[8:56] Let's get to a couple of examples, and these are all vehicle examples; we're going to do some cars this time. That doesn't mean this is the only place this is happening, but I think it's useful. Renault cut vehicle development time by 60% with digital twins. The digital twin predicts crash outcomes pre-prototype, which really helps them develop the car appropriately. BMW built a virtual factory that runs thousands of line-change permutations overnight to simulate the best factory outcomes. Formula 1 teams run real-time pit-strategy simulations to figure out the most efficient way to allocate energy in a pit-crew change, so the car gets back on the racecourse as quickly as possible. And one example that isn't a car situation: ad networks can pre-test creative mixes for ROAS uplift without spend.
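One way to make the "every sim develops better priors" compounding lever concrete, assuming you track something like a launch success rate, is a conjugate Beta-Bernoulli update. The talk does not prescribe this technique; it is one standard way to accumulate priors, and the simulation rounds below are invented numbers:

```python
# Sketch of compounding priors: each simulation round sharpens a Beta prior
# over a success rate, so later sims start better informed than earlier ones.
def update_beta_prior(alpha: float, beta: float,
                      successes: int, failures: int) -> tuple[float, float]:
    """Conjugate Beta-Bernoulli update: observed outcomes sharpen the prior."""
    return alpha + successes, beta + failures

alpha, beta = 1.0, 1.0  # uninformative starting prior
for successes, failures in [(7, 3), (8, 2), (9, 1)]:  # invented sim rounds
    alpha, beta = update_beta_prior(alpha, beta, successes, failures)

posterior_mean = alpha / (alpha + beta)  # estimated success rate after sims
```

Each pass through the loop is one round of simulation evidence; the posterior from one round becomes the prior for the next, which is exactly the compounding the talk points at.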
[9:49] When you talk about the idea of a viral simulator (and there are apps now that do this), what it's essentially doing is AI agents as world models. It's giving an LLM or another machine-learning algorithm a set of constraints, a set of tools, and a world to operate within, and asking it to come back with a response after it's modeled that world.

[10:12] Okay, I anticipate I'm going to get more objections, so we're going to be real honest about those objections. Garbage in, garbage out is the first one. If you put garbage in, you're going to get a bad simulation out, and it's a waste of time. That's true. So put in some proven calibration loops and calibrate what you put in. Pay attention; this is very controllable. And make sure that you back-test and keep yourself honest relative to performance. If your digital twin is simulating a timeline, and you're actually running that timeline at wall-clock speed, and you see that things are significantly diverging from the scenario, be honest. Assess what went wrong with your simulation (you usually missed a constraint when you were projecting for the board) and go back and fix it.

[10:57] Another pushback: this gives you false confidence. Fair. I think we had false confidence when we didn't consider our options before, too. You should use your simulations to bound distributions, not run point projections. Does that make sense? You have distributions of timelines; you should be putting some constraints around them because you had a scenario that modeled out what was likely to happen. You don't want to make a point assumption.
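A minimal sketch of the back-testing discipline just described: check how often wall-clock actuals land inside the twin's projected band, and flag divergence instead of hiding it. The 80% tolerance and the sample numbers are illustrative assumptions:

```python
# Sketch of keeping a digital twin honest: compare the simulated band
# against real wall-clock observations and flag significant divergence.
def backtest_twin(projected_band: tuple[float, float],
                  actuals: list[float]) -> dict:
    """Fraction of real observations that fell inside the simulated band."""
    lo, hi = projected_band
    hits = sum(lo <= x <= hi for x in actuals)
    coverage = hits / len(actuals)
    return {
        "coverage": coverage,
        "diverging": coverage < 0.8,  # assumed tolerance: flag under 80%
    }
```

When `diverging` comes back true, that is the cue from the talk: you probably missed a constraint, so go back and fix the simulation rather than the report.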
[11:21] That's always been a weakness for humans: we overfixate on a particular point assumption, and we don't think about the world as a series of distributions.

[11:32] Another objection: compute is super pricey. How can we afford this? Well, how can you not afford it? If it gives you breakthrough potential, it seems like it would be worth it, right?

[11:44] I want to call out a fourth one: culture change is hard. If we actually give people bonuses and rewards for decision quality, if we give people rewards for avoiding disaster, not just for building something new, we are going to change corporate incentives. I know that is a hard one. I have no illusions; I've worked in the corporate world long enough to know that there are not a lot of companies that do that. But we have an opportunity to rethink how we do decision making, to rethink how we do agentic utility in the business, and we can bring compute into our decision making and forward thinking in a way we have never been able to before. I think that does imply culture change. I think it implies thinking more about how we think, how we make decisions, and thinking more about avoiding disasters.

[12:43] So you're like, "Okay, this is a lot. How do I get started?" Well, let me suggest picking one KPI to try to twin first. Something you think you know well enough that you can model, whether it's literally modeling with a long prompt in ChatGPT or building something custom. Maybe it's cost of acquisition, maybe it's churn; I don't know. Then you want to make sure you understand the data that you're feeding it, how you refresh that data, and the feedback loops. Finally, you want to make sure that you have a tool stack that is dependable and solid.
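A starter sketch for twinning one KPI, using churn purely as an example: it returns a bounded p10-p90 band of retained customers rather than a point forecast, in line with the "bound distributions, don't run point projections" advice above. The churn rate, its uncertainty, and every parameter here are placeholders for your own data:

```python
# Starter sketch of a tiny churn twin: project a distribution of retained
# customers instead of a single point estimate.
import random

def churn_twin(customers: int, monthly_churn: float, churn_sd: float,
               months: int, n_sims: int = 500, seed: int = 42) -> tuple[float, float]:
    """Return a (p10, p90) band of customers remaining after `months`."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_sims):
        remaining = float(customers)
        for _ in range(months):
            # draw this month's churn rate, clipped to a valid probability
            rate = min(max(rng.gauss(monthly_churn, churn_sd), 0.0), 1.0)
            remaining *= 1.0 - rate
        finals.append(remaining)
    finals.sort()
    return finals[int(0.10 * n_sims)], finals[int(0.90 * n_sims)]
```

For example, `churn_twin(1000, 0.05, 0.01, 12)` gives a band for customers left after a year, which you would then back-test against the real cohort as the months play out.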
[13:19] Now, if it's a big company effort, you may have a data lake with a lakehouse, a feature store, a simulation engine, and a dashboard. That would be an example of an enterprise stack. If it's very small and you're trying to simulate breaking up with your ex, or your soon-to-be ex, it's not nearly that fancy. You just have to have good data, a refresh cadence as you have that next date with the person you're considering breaking up with, and good feedback loops. I intentionally use a slightly humorous take from our personal lives because, one, we do talk with ChatGPT about our personal lives, and two, I think it helps make this tangible. Fundamentally, if you want to simulate a relationship, you have to give enough information about that relationship to make it a useful simulation. And then you have to change and update your priors for that agent to understand how it needs to adjust as reality continues to evolve.

[14:15] So the thing that I want to leave you with is this. If we have the capability to have clearer foresight and we choose not to use it, does that raise our moral responsibility? Are we more responsible for future timelines because we have the compute to think about agents as world builders? I think we are. I think we have a responsibility to think more deeply, because we now have the compute to do so.

[14:45] And I want to call out again: there is a massive divergence-curve opportunity here. If everyone else is obsessing about agents as doers, and you are the one thinking about agents as ways to model future realities and make better decisions, you are playing a different game, and you are a first mover in that game.

[15:03] So stop asking how AI can do this task. Or, I'm not going to say stop.
AI is tremendously valuable as an executor, but 95% of what I see is that. Start asking how AI can show you different kinds of futures and help you improve your decision making. Where would a digital twin save you from your next big mistake? That's my question for you. Enjoy.