AI Agents: Adoption Gap and Debate

Key Points

  • The adoption of AI agents follows a steep power‑law curve, creating a stark divide between early, “super‑adopter” organizations and the broader market.
  • A current high‑profile dispute pits Anthropic’s multi‑agent Deep Research system against the Devin (Cognition) team’s single‑agent stance, highlighting divergent views on architectural complexity and production viability.
  • Anthropic argues that multi‑agent setups, while more complex, leverage talent and massive token consumption to achieve higher solution correctness, especially for difficult tasks.
  • Insufficient token budgets can masquerade as “reasoning decay,” as illustrated by critiques of an Apple paper that failed to allocate enough output tokens for solving complex puzzles like the Tower of Hanoi.
  • Overall, the debate underscores that effective AI agent deployment in 2025 hinges on balancing architectural choices, compute resources, and skilled implementation.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=uX536ECdb94](https://www.youtube.com/watch?v=uX536ECdb94)
**Duration:** 00:12:16

## Sections

- [00:00:00](https://www.youtube.com/watch?v=uX536ECdb94&t=0s) **Power Law of AI Agent Adoption** - The speaker outlines key strategic levers for AI agents in 2025, highlights the steep adoption gap following a power‑law distribution, and uses the recent Anthropic–Devin debate over single‑ versus multi‑agent architectures as a concrete illustration.
- [00:04:17](https://www.youtube.com/watch?v=uX536ECdb94&t=257s) **Memory Architecture as Strategic Lever** - The speaker asserts that correctly designing memory architecture and context engineering is the key strategic decision for effective AI agents, and sharply critiques a recent McKinsey deck for recommending outdated models such as Claude Haiku instead of current state‑of‑the‑art options.
- [00:07:32](https://www.youtube.com/watch?v=uX536ECdb94&t=452s) **Debating Multi-Agent vs Single-Agent** - The speaker critiques buzzword‑laden CEO pitches, contrasts the complexity of multi‑agent systems with single‑agent solutions, and emphasizes understanding core agent fundamentals amid industry hype.
- [00:11:18](https://www.youtube.com/watch?v=uX536ECdb94&t=678s) **Beware the AI Agent Hype** - The speaker apologizes to those misled by flashy AI agent pitches, stresses the real difficulty and ROI challenges, and advises considering simpler turnkey solutions like Lindy.ai instead of over‑complicating implementations.

## Full Transcript
[0:00] This video is for everyone who has wondered what an AI agent is and how to get on track with AI agents in 2025. I'm going to unfold what I think are the most important strategic levers; I've also written a roughly 30-page Substack article that dives further into it, and we're obviously not going to read 30 pages here. The key thing I want to call out is that the gap in adoption for agents follows a power law. I'll give you two examples from this past week that illustrate that power law in practice, show how big the gap is in the real AI agent debate today, and how split the market is.

[0:38] On the far right side, way over on the super-adopter side, we have a widely publicized spat between the Anthropic team that built Deep Research and the team that built Devin. They both published warring papers. The Devin team put a flag in the ground and said you should only build single-agent architectures, not multi-agent ones. Now, if you're feeling over your head: an agent is an LLM plus tools plus policy guidance. That's it. Devin's team is saying you only need one agent stack; you don't need multiple agents working on the problem, because that introduces so much complexity you can't maintain good production deployment standards. Then Anthropic snaps back and says, "We built Deep Research as multi-agent, and it's vastly more effective as multi-agent." The dunking goes back and forth, and it's over most people's heads.

[1:36] By the way, if you're wondering how they got multi-agent to work, obviously one of the reasons is that this is the Anthropic team, and they're really, really good. One of the things that makes a multi-agent implementation successful is talent.
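The transcript's working definition, an agent is an LLM plus tools plus policy guidance, can be sketched in a few lines. Everything here is illustrative: `fake_llm` stands in for a real model call, and the `CALL`/`FINAL` reply convention is invented for the example.

```python
def get_time(_: str) -> str:
    """A trivial tool the agent can invoke."""
    return "12:00"

TOOLS = {"get_time": get_time}

POLICY = "Only call tools listed in TOOLS; answer directly when no tool is needed."

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM: request the tool once, then answer.
    if "time" in prompt and "TOOL_RESULT" not in prompt:
        return "CALL get_time"
    return "FINAL it is 12:00"

def run_agent(task: str, max_steps: int = 5) -> str:
    """The whole 'agent': a loop over LLM calls, tool use, and policy guidance."""
    prompt = f"{POLICY}\nTask: {task}"
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("CALL "):
            name = reply.split()[1]
            result = TOOLS[name](task)  # tool use
            prompt += f"\nTOOL_RESULT {name}: {result}"
        else:
            return reply.removeprefix("FINAL ").strip()
    return "gave up"
```

With a real model behind `fake_llm`, nothing structural changes: the loop, the tool registry, and the policy text are the entire architecture of a single-agent stack.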
[1:49] So let's assume they have the talent; in that world, they've got it. They admit that part of why multi-agent systems work is that you are deliberately trying to burn compute because you value the correctness of the solution, and multi-agent systems are an efficient way to burn enough tokens to get to correctness. If you're scratching your head about that: fundamentally, if you don't burn enough tokens, you're unlikely to reach the correct solution with a large language model. It outputs tokens, and if it doesn't output enough of them, it often doesn't reach the solution. Sometimes it isn't even possible.

[2:22] One of the gaps that emerged in the Apple paper everyone was talking about, the one that said reasoning is dead, is that they didn't allocate enough output tokens to make determining the solution computationally possible for at least one of the major puzzles they claimed showed reasoning decay. I know this is a bit of an aside, but if you claim to have shown reasoning decay and you didn't give the system enough tokens to correctly carry out a long computation for a complex game like Tower of Hanoi, maybe you're being malicious, maybe you're being incompetent, or maybe you just missed a trick. I'd like to hope the third; I want to assume the best of people. But it's really bad, and it's especially embarrassing because that mistake was pointed out by a large language model: Claude Opus was listed, slightly as a joke but not really, as a co-author of the paper that debunked the Apple paper over the weekend.

[3:21] So why am I going on about this? One, it illustrates how far out on the right side of the power-law distribution these debates are.
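The Tower of Hanoi point is easy to check with arithmetic: an n-disk puzzle requires 2^n - 1 moves, so merely writing the solution down grows exponentially. A back-of-the-envelope sketch, where the roughly 7 tokens per move and the 64k budget are assumptions for illustration, not figures from the paper:

```python
TOKENS_PER_MOVE = 7  # assumed cost to print one move, e.g. "move disk 3 from A to C"

def min_output_tokens(n_disks: int) -> int:
    """Tokens needed just to list every move of an n-disk Tower of Hanoi."""
    moves = 2 ** n_disks - 1  # provably minimal move count
    return moves * TOKENS_PER_MOVE

def largest_solvable(budget: int) -> int:
    """Largest puzzle whose full move list even fits in `budget` output tokens."""
    n = 0
    while min_output_tokens(n + 1) <= budget:
        n += 1
    return n
```

Under these assumptions, a 64,000-token output budget cannot contain the full solution for much more than 13 disks, so a failure past that point says nothing about reasoning ability; the answer literally cannot be written down.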
[3:27] But two, it emphasizes that computational tokens matter in solving problems, and the debate between Anthropic and the team at Cognition, they're the ones that built Devin, was about computational tokens. Ultimately, what Anthropic is saying, and what we need to take away if we're not far enough out on the right side of the distribution, if we're not in the top 1% of AI agent builders, which most of us aren't, is this: think about how many tokens you need to build correct solutions, and think about the value of those correct solutions. That actually matters a lot. I get into a lot of other factors in the Substack report: statefulness, memory, how you design hierarchical solutions if you use multi-agent architectures, diagrams to think through the basic types, all of that.

[4:17] For our purposes here, the strategic lever is really how you handle your memory architecture. Making that decision determines everything else. And that makes a lot of sense, right? Because if you think about your memory, how you access it, and design it correctly from the start, you're doing the context engineering, which is the keyword I think we're missing in this whole debate. That's what enables you to shape the instruction sets, the policies, the guidance, the substrate of context the agents operate on, so that they can be effective at their work. If that sounds over your head, that's fine; it's over the head of 95% of people. But it illustrates where part of the industry is spending its time, pulling out its hair and fighting back and forth, while the rest of us, and this has all happened in the same week, are hearing about the McKinsey deck on AI agents.
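As one hedged illustration of what "context engineering" can mean in practice, here is a sketch that assembles a prompt from a policy, a task, and a memory store under a token budget. The 4-characters-per-token estimate and the newest-first priority rule are assumptions for the example, not a standard API:

```python
def est_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def build_context(policy: str, task: str, memories: list[str], budget: int) -> str:
    """Always include policy and task; spend what's left on memories, newest first."""
    used = est_tokens(policy) + est_tokens(f"Task: {task}")
    kept: list[str] = []
    for memory in reversed(memories):  # newest memories get priority
        cost = est_tokens(memory)
        if used + cost > budget:
            break  # budget exhausted; older memories are dropped
        kept.append(memory)
        used += cost
    kept.reverse()  # restore chronological order in the prompt
    return "\n".join([policy, *kept, f"Task: {task}"])
```

The design choice this sketch highlights is the one the speaker calls strategic: deciding up front what always makes it into the window and what gets evicted first is the memory architecture, and everything else in the agent inherits that decision.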
[5:10] It's a terrible deck, guys. I don't know how else to put it, and I'm not going to sugarcoat it. The deck recommends models that are years old, like Haiku. No one uses Haiku in Claude anymore; we use Sonnet 4, we use Opus 4. Why on earth you would recommend Claude Haiku to CEOs, presumably while being paid money to advise them, is beyond me. It's absolutely ridiculous. I'll give you some more examples: what they call state-of-the-art is Llama 3 8B, Gemini Nano, Mistral Small. This is embarrassing. They're talking about basically GPT-2-era stuff, and we're living in a GPT-4, almost GPT-5-era world. It just looks like McKinsey's technical teams don't track actual model releases, or they think their CEOs are dumb and won't notice.

[6:08] They're also addicted to buzzwords, and that's very dangerous. They talk about the "agentic AI mesh." Talk to a developer and ask them what an agentic AI mesh is. This is buzzword thinking; there is no technical understanding here. You can rebrand a concept, but that doesn't make it actual innovation, and it doesn't mean it's actually useful for decision-making.

[6:31] One of the things I want to call out is that what I have described, understanding your token burn, understanding the value of correctness, understanding the importance of statefulness in designing memory and context, those are highly relevant, highly actionable things that help you make decisions. You can get into how to follow that flowchart in the gigantic 30-page tome if you want to read it, and I don't care if you don't.
[7:01] The point here is that if you don't think at that level of detail, you are likely to be among the businesses tossing out their AI agent initiatives this coming year. The businesses that trust McKinsey and try to build an "agentic AI mesh" are going to be tossing out their AI investments, because the deck doesn't specify messaging protocols, state-management schemas, or error-handling patterns. And I get it: not all CEOs operate at that level of detail. But you can still communicate the strategic levers. You can talk about the importance of designing context, the importance of distributed coordination, the relative cost of multi-agent versus single-agent architectures. These are things a CEO is paid to understand; they can figure it out. You can talk about interface standardization in a way that actually respects the fact that there are genuinely different protocols under the surface.

[7:56] It almost reads like McKinsey is just saying what they think CEOs want to hear, throwing some technical buzzwords from 2022 or 2023 into the doc, and assuming they'll get away with it. I call that out because it's incredible to me that the McKinsey deck is circulating in the same week as this deep fight between Cognition, the designers of Devin, who advocate the single-agent solution, and Anthropic, advocating a multi-agent solution, just a couple of days apart. That is how wide the distribution on agents is.

[8:30] I get it: you don't have to be an Anthropic agent engineer. I'm not one, I don't work for Cognition, and I'm not saying I'm that good. But you also don't have to fall for McKinsey decks. You can understand the fundamentals of agents: they're tools, they're guidance, and they're an LLM in a trench coat, if you like.
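To make "state-management schemas and error-handling patterns" concrete, here is a minimal sketch of both. The field names and the two-retry policy are invented for illustration; a real deck would need to specify these, which is the speaker's complaint:

```python
from dataclasses import dataclass, field
from enum import Enum

class StepStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"

@dataclass
class AgentState:
    """A state-management schema: explicit, serializable, per-step status."""
    task: str
    step_status: dict[str, StepStatus] = field(default_factory=dict)
    retries: dict[str, int] = field(default_factory=dict)

MAX_RETRIES = 2  # assumed policy: retry a failed step twice, then give up

def record_failure(state: AgentState, step: str) -> StepStatus:
    """An error-handling pattern: bounded retries with explicit transitions."""
    state.retries[step] = state.retries.get(step, 0) + 1
    if state.retries[step] > MAX_RETRIES:
        state.step_status[step] = StepStatus.FAILED
    else:
        state.step_status[step] = StepStatus.PENDING  # eligible for retry
    return state.step_status[step]
```

Even at this toy scale, the questions a deck should answer become visible: where the state lives, who may write it, and what happens after the retry budget is spent.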
[8:50] You can understand the concept. It's pretty intuitive that a multi-agent system is inherently more complex to implement than a single-agent stack; that's the heart of what Cognition was getting at. You can also understand Anthropic firing back and saying it is difficult to get real token burn on huge problems out of a single agent, a single LLM plus tools plus guidance, and we need token burn for big problems. Remember, Anthropic is focused on building LLMs that reason correctly over long periods of time. It makes sense that they would care about correctness and about burning enough tokens to run for hours. You'll recall they deliberately advertised a seven-hour runtime, I think for Opus 4, when they released it just a couple of weeks ago. They are working on lengthening the horizon of agentic endeavor, of agentic development and coding specifically. That's a multi-agent solution, and that's what they're saying. They have done the work to understand that multi-agent is worth it; they have the talent, and they're implementing it.

[9:58] Well, your question, if you're looking at the agentic world, is: am I a single-agent person? Am I a multi-agent person? Honestly, do I need off-the-shelf agents? I write about some of those in the Substack, because there is increasingly so much hype in this space. Any time you're heading toward a $200 billion industry in a few years, you're going to have hype. Great, let the hype work for you. If there are point solutions like Zendesk that support you, great, take them.
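The token-burn asymmetry the speaker describes can be made concrete with simple accounting: in a fan-out design, each subagent reads its own copy of the context and writes its own answer, and an orchestrator then reads all of those answers and writes a synthesis. The cost model below is an illustrative assumption, not Anthropic's published figures:

```python
def single_agent_cost(context_tokens: int, output_tokens: int) -> int:
    """One agent: read the context once, write one answer."""
    return context_tokens + output_tokens

def multi_agent_cost(context_tokens: int, output_tokens: int,
                     n_subagents: int, synthesis_tokens: int) -> int:
    """Fan-out: every subagent pays the full context cost, then the
    orchestrator reads all answers and writes a synthesis."""
    subagents = n_subagents * (context_tokens + output_tokens)
    orchestrator = n_subagents * output_tokens + synthesis_tokens
    return subagents + orchestrator
```

The multiplier is the point: the same task burns several times more tokens in the fan-out, which is a cost if you only need a quick answer and a feature if, as Anthropic argues, spending more tokens is what buys correctness on hard problems.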
[10:22] Just think about what you're doing, think about your success, and think about how you measure it. The last thing I'll tell you before I hop off, and you can jump into the article or not, is that evals, understanding quality, are the thing that separates a successful implementation from an unsuccessful one. You can get all the other stuff right: the buy-versus-build decision, the single-agent versus multi-agent stack, statefulness, memory, token burn, all of it. But if you don't have evals, if you can't measure correctly, measure model drift, measure what you need in order to understand how those agents are performing in production, you are in big trouble. The agent will not last, and you will once again be with the rest of the companies dumping their agents out.

[11:08] I am making this video because I do not want you to be in that position. An alarming number of companies are reconsidering agentic investments right now, with good reason: they've been fooled by decks like the McKinsey one. They have understandably spent a lot of money, and they are understandably frustrated. It is not their fault. If this is you, it is not your fault. If you read decks like McKinsey's, if you were told this would be easy and it isn't, I am sorry. I am here to tell you how hard it actually is, so that you can decide whether you need to invest in it or whether you just want a turnkey solution like Lindy.ai and be done with it. Both of those are okay. You don't get extra credit for implementing a fancier agent; all you get is more pain, and you had better be sure the ROI is there.
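A minimal sketch of what "having evals" can mean in practice: score the agent against a fixed test set and raise a drift alarm when accuracy falls below a baseline. The agent callable and the 5% tolerance are placeholders, not a recommendation:

```python
def accuracy(agent, cases: list[tuple[str, str]]) -> float:
    """Fraction of (input, expected) cases the agent answers exactly."""
    hits = sum(1 for x, expected in cases if agent(x) == expected)
    return hits / len(cases)

def drifted(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Drift alarm: accuracy dropped more than `tolerance` below baseline."""
    return baseline - current > tolerance
```

Run on a schedule against production traffic samples, even a harness this small answers the question the speaker says most teams can't: is the agent still performing the way it did when you shipped it?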
[11:54] I could go on about agents for a very long time, but I'm going to stop right there, and we can always have this conversation again. There you go: that is what has happened in agents. Agents are the buzzword for 2025, and I think it's worth revisiting the story midway through the year. The hype has been off the chain, and in many ways it has led businesses astray.