
AI Agents: Hype vs Reality

Key Points

  • Andrej Karpathy (co‑founder of OpenAI) sparked controversy by claiming that “useful agents are a decade away,” emphasizing current agents’ lack of memory, robustness, and reliability.
  • His perspective comes from leading cutting‑edge AI research (e.g., his recent nanochat release), which differs from the day‑to‑day experience of builders using off‑the‑shelf tools.
  • He argues that any reliability or robustness we see in multi‑agent systems today is derived from architectural design rather than from the agents themselves.
  • Despite these limitations, agents already deliver strong ROI and real‑world value, so developers can and should build useful applications now.
  • The key takeaway for builders is to focus on solid system architecture to compensate for agent shortcomings rather than waiting for “perfect” agents to arrive.

Source: https://www.youtube.com/watch?v=5ioEQigrJOA
Duration: 00:20:25

Sections

  • [00:04:05](https://www.youtube.com/watch?v=5ioEQigrJOA&t=245s) **Challenges and Progress in LLM Training** - The speaker discusses the difficulty of supervising large language models (diverse user data, crude reinforcement‑learning signals, credit‑assignment problems) while acknowledging that, despite these hurdles, LLMs continue to deliver remarkable results and drive ongoing technological advancement.
  • [00:08:40](https://www.youtube.com/watch?v=5ioEQigrJOA&t=520s) **Self‑Driving Limits & Incremental AI** - The speaker highlights that despite flashy demos, truly general autonomous vehicles don't exist yet (city‑specific models are brittle and rollout is incremental), drawing a parallel to AI development's piece‑by‑piece approach and noting the promise of personalized AI tutors in education.
  • [00:12:26](https://www.youtube.com/watch?v=5ioEQigrJOA&t=746s) **Embracing Continuity Over AI Panic** - The speaker counters hostile AI narratives by outlining four overlooked insights from Andrej's conversation, especially the value of steady, incremental growth and a calm, long‑term approach to solving complex agent problems, advocating continuity instead of disruptive rupture.
  • [00:16:35](https://www.youtube.com/watch?v=5ioEQigrJOA&t=995s) **Memory Engineering for LLM Agents** - Andrej argues that durable, reliable memory is essential for LLM agents to emulate human learning trajectories, and that solving this core memory problem will unlock broader AI capabilities, prompting a focus on memory architecture, updates, permissions, and evolutionary analogies.
  • [00:20:18](https://www.youtube.com/watch?v=5ioEQigrJOA&t=1218s) **Relaxed Announcement About Upcoming Post** - The speaker notes they've completed a full write‑up, advises not to panic, and hints they'll wait for the next Silicon Valley post to go viral.

Full Transcript
[0:00] Silicon Valley has been exploding for days over Andrej Karpathy's podcast with Dwarkesh. I want to go into why it was so controversial and, now that the dust has settled, what the real takeaways are for builders, for people who want to work with AI in the here and now. The first thing to do is to understand where Andrej is coming from. Andrej is one of the co-founders of OpenAI. He's someone who has been on the cutting edge of AI systems for a long time; he just released nanochat, a new tiny way to train your own GPT, and it's great. But you have to understand everything he says from that frame of reference, and that's something I'll come back to, because it's quite a different frame of reference from being in the trenches as a practitioner or builder using existing AI tools. So what did Andrej say? The first thing he called out is that useful agents are a decade away; that was the title of the episode. What he's saying, essentially, is that current agents lack memory, robustness, and reliability. He used the word "slop," and people jumped on that in a way that I think even he didn't expect, and that is part of what drove all of this controversy. But in a way, he's right. Agents don't inherently remember and learn; we have to teach them everything they know. Agents are not particularly robust: the most sophisticated multi-agent implementations I've been a part of tend to get their robustness from architecture rather than from the agents themselves. And if you want reliability, again, you go back to architecture for reliability rather than the agent itself.
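The pattern the speaker describes, reliability coming from the architecture around the agent rather than from the agent itself, can be sketched as a wrapper. Everything here is illustrative: `agent_fn`, `validate`, and the JSON contract are assumptions, not any specific framework's API.

```python
import json

def reliable_call(agent_fn, prompt, validate, max_retries=3):
    """Wrap an unreliable agent call with retries and output validation.

    The reliability lives in this wrapper (the architecture), not in the
    agent. `agent_fn` is a hypothetical stand-in for any LLM invocation.
    """
    last_error = None
    for _ in range(max_retries):
        try:
            raw = agent_fn(prompt)      # any LLM/agent invocation
            data = json.loads(raw)      # structural check: must be JSON
            if validate(data):          # semantic check: required fields
                return data
            last_error = ValueError("schema validation failed")
        except (json.JSONDecodeError, TimeoutError) as exc:
            last_error = exc
    raise RuntimeError(
        f"agent output invalid after {max_retries} attempts: {last_error}"
    )
```

A caller might use `reliable_call(my_agent, prompt, lambda d: "summary" in d)`; the same wrapper shape extends naturally to timeouts, fallbacks to a second model, or escalation to a human.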
[1:38] None of that (and this is me talking as a builder) prevents you from having really high-value use cases for agents today. I tend to frame the architectural work we have to do to make agents function as just the price you pay for where agents are at, and the ROI is there because agents can already do so much. But the promise of agents has been much bigger than this. The promise of agents, which I think Andrej is reacting to, is that they will do anything, that they will be anywhere, that they will learn, that they will be useful out of the box, that they will remember everything. Of course, if you've ever worked with an agent, you know that's not true. Andrej is saying it's not true, and he's right. So within Andrej's frame of reference, saying that really good agents (agents with memory, that are robust, that are super reliable, that don't need architecture in order to do complicated tasks) are a decade away, yes, that does feel plausible. That's not necessarily right around the corner. The thing he didn't feel the need to emphasize, from his perspective, is that we can already get value out of them today, and that's the piece I wish had come through a bit more in the podcast. There are companies saving on the order of hundreds of millions of dollars a year using AI agents today. Not next year, not the year after, not in a decade: today. Do those agents struggle with memory, robustness, and reliability? Andrej is correct, and you have to account for that in your architecture. A lot of what I teach people about agents is how to architect for the agents we have today. That doesn't mean they don't have value.
[3:11] So the irony is that I can agree with Andrej that, from his perspective, agents may be sloppy, and still say they can add a tremendous amount of value as they are today. The second big conversational theme he called out is that LLMs have cognitive deficits and that we have trouble driving effective pre-training dynamics with them. This gets into the weeds for those of you who are not super technical, so I want to make it as clear as I can. Fundamentally, this is a really tough way to learn: all you get is a single signal about whether something is right or wrong. If you're training a model, all you get is a yes or a no. Is this essay I wrote good or bad? There is no room for any kind of nuanced feedback. It's a known issue, and it's part of why you need so many different kinds of responses from many different users to get any kind of approximated learning. Andrej is right when he says this model is really hard to work with and that you're "sucking supervision bits through a straw" (his words). It's a tough model to work with. The counterpoint is that, as tough as it has been, it has delivered remarkable results. If we stopped LLM progress today, and there's no sign of that, we would still have more than a decade of technological progress ahead of us just to fully bake in everything we already have. So as much as I agree that LLMs don't learn like humans, that it's really tough to supervise them through training, and that there are issues we're going to have to solve, none of that, from a builder's perspective, gets in the way of what we can do right now.
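The single yes-or-no signal the speaker describes can be made concrete with a toy credit-assignment sketch. The numbers and function names below are purely illustrative, not how any particular training stack computes rewards.

```python
def trajectory_credit(num_tokens, episode_reward):
    """Sparse, trajectory-level signal: one scalar reward for the whole
    episode is smeared uniformly across every token, good and bad alike."""
    return [episode_reward / num_tokens] * num_tokens

def per_token_credit(token_scores):
    """Finer-grained supervision: each token carries its own score
    (e.g. from step-level feedback; the values here are made up)."""
    return list(token_scores)

# One coarse reward cannot say WHICH step was weak...
coarse = trajectory_credit(4, 1.0)              # every token gets 0.25
# ...whereas per-token signals make the weak step stand out.
fine = per_token_credit([0.9, 0.1, 0.8, 0.95])
```

The coarse case is the "blunt instrument": a bad middle step and a good final step receive identical credit, which is exactly the credit-assignment problem the speaker refers to.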
[5:00] The third thing he called out is that reinforcement learning is absolutely terrible, but he can't think of a better option right now, and I think he's calling on the cutting edge of the industry to think about this. This is what I'm trying to get at with the idea of credit assignment: you have a yes or a no, and it's such a blunt instrument that it's really hard to get it to work correctly. The conversation then moved to economic growth. There have been a lot of assumptions about artificial general intelligence driving either the end of the world, say the doomers, or a period of unprecedented economic growth, say the optimists. One of the things Andrej called out is that his base case is that humans have been baking a tremendous amount of innovation into our baseline 2% gross domestic product growth over the last several decades, and his current assessment is that AGI will blend into the current trend of automation without a shift in that baseline. So he's not saying it's the end of the world; he's not a doomer. But he's also not saying there's going to be a step function in growth. That got a lot of controversy too, but I see where he's coming from. One of the things we've struggled with as a discipline is that we still have no answer for the fact that our lives changed profoundly in the 90s with the advent of the internet and the personal computer, and that never really showed up in gross domestic product growth data. Similarly, the mobile phone and the invention of the whole social web never showed up in GDP data. What Andrej is challenging us with, and I think this is useful, is not to expect miracles.
[6:44] Don't expect doom, but also don't necessarily expect that everything will suddenly be solved. This is in stark counterpoint to some of the more dramatic predictions that have lately come from, in particular, the team at Anthropic, who have been on the record saying they expect very dramatic shifts in employment, in coding, and so on. Andrej is just not seeing that. He sees this as part of the ongoing story of technological innovation: we are writing the next chapter with artificial intelligence over this decade, and even though it may feel like a profound shift to us, it may not show up in the economic statistics as plus-8% GDP growth. From a builder's perspective, my takeaway is that we should not expect miracles when planning our systems. We do much better when we plan for a gradualist case; it's super basic, and we can just get on with building the system without worrying about whether we're building toward nirvana or doom. It's much more useful to build something today out of the systems we have, knowing we'll get gradually better systems over time, which is something Andrej affirms. One of the things he talks about at some length is self-driving, and you might wonder why that comes up. Self-driving is an example of how difficult it is to teach AI a real-world skill. I've been thinking about this for a while, and it was fun to hear Andrej's take. Essentially, self-driving has almost infinite edge cases. That is why, when Waymo comes to a new city, it cannot just put the cars on the road; it has to learn the entire city, because every corner is unique.
[8:28] So what Andrej is saying is that getting to self-driving is still a rocky road, because we have to learn these lessons about edge cases, data, and safety, and figure out how to transfer all of that to agents. He wants to make sure we understand that even though we've had flashy self-driving demos, in most cities around the world there are zero self-driving cars, even though you can go to parts of San Francisco and ride in one today. The gap he's calling out comes down to the same things he emphasized at the top of the conversation: memory, robustness, and reliability. You cannot generalize a Waymo driving agent to any city; you have to custom-train it, and that is brittle and tough, and he's right to call it out as an issue. At the same time, as a builder: Waymo is not stopping rollout. Waymo has half a dozen to ten cities it's trying to roll out to right now. Driving continues to get solved over time, and we're doing something really similar on the AI side, where we bite off pieces of the problem and go after them. That's an area where I think Andrej and I agree, given the pace and the kinds of problems we're solving. If you want a world where a truly generally intelligent agent can do absolutely anything with robustness and reliability, and doesn't need architecture, training structure, and scaffolding to support it, that might take ten years. He might be right about that. The last thing he talked about that I want to call out is a conversational theme around education. He talked about the idea that personalization and AI tutors are super promising for helping people learn what they need to, with the caveat that we have some challenges around memory that we need to address.
[10:10] This is something I've called out for a while on this podcast of mine. Memory is not an easy problem to solve; memory done well with AI is tough, and I broke down why in a video not too long ago. If you want to teach people usefully, one of the things you need to be extremely good at is sequencing the next lesson in a way that is useful, based on the agent's memory of the student's interaction with the material. It's a complicated task, and you have to make sure you're ready to give the agent that responsibility. One of the things I'm really curious to see, and I know there are a number of initiatives going on right now around education and AI, is how these efforts are solving the memory issues involved in learning from students, and how we can do that in a way that is responsible, respectful, and privacy-first, but still learns from the student. It's a real challenge, and Andrej was right to call it out, but it's also a real opportunity, and he recognized that as well. Let's jump to the reactions. The reactions were almost uniformly terrible. The headlines picked up the most sensationalist takes, like "agents are slop" and "AGI is a decade away," and framed the podcast as popping the AI bubble in Silicon Valley or as a rebuttal of near-term artificial general intelligence optimism. In many ways they took Andrej's words out of context; in fact, he more or less said so himself when he wrote his follow-up post on X afterward. He did not intend to ignite the kind of firestorm that he ignited.
[11:49] And I don't think he realized how seriously his words are taken, not just by people inside Silicon Valley but by the world at large, because of his stature as a co-founder of OpenAI. I agree that the reaction was way overdone. There is almost no reason for the kind of hostility I saw in the press toward the Silicon Valley AI community, unless there's an underlying hostility toward AI that this piece tapped into. And it's kind of ironic, because Andrej is someone who helped build the AI we have today; he's certainly not anti-AI. Yet I felt some of that hostility getting tapped into in the reaction I saw from the press. So what's a better way to respond? You understand what he talked about, and I've given you a few hints about my take, but let's ladder this back. I want to give you four under-noticed points from Andrej's conversation that people aren't talking about, dig into them a little, and share my takeaways. Number one: there is something rich in what Andrej said about continuity over rupture. The idea is to treat continuity as a heuristic in your business planning. Assume steady compounding: steady compounding of capabilities, steady compounding of growth, and not magical step changes. Assume that the boring reliability work you do today is going to stay relevant. One of the things we're really missing is a return to anchoring on a steady sense of the future, because AI has felt so uncertain. One of the things I'm grateful for, and wish people would talk about more, is that Andrej sees a steady sense of the future. Andrej is not panicking.
[13:33] Andrej is actually seeing the problem of agents as a really hard one that's going to take a long time to think through and solve properly, if we solve it in its totality, which is the way he framed it. I'm looking at it as a builder and saying: for this tiny piece of the world where we already have agents, wow, we have a lot to do architecturally to build well. But the good news is that Andrej is saying you've got runway to do that. You can build well now. Continuity over rupture is a discipline you can practice, and it is not living in denial. I have heard people say that if you believe things will just stay the same, you're living in denial. Absolutely, there are going to be massive changes associated with AI, but we will see continuity in those changes; we can see the trends. As an example: is it true that jobs are evolving because of AI? Yes. Is it true that we can trace those patterns and project out trends? Also true. There's a kind of continuity in the trend line that we can understand. You can literally see the line going up on the graph for AI job postings. That's continuity too, because you can see that a new industry is forming, and the new industry has new jobs. We have seen that before, with every major technological innovation through history: with steam, with rail, with silicon and computers, you see new jobs forming. Same with AI; it's actually not that different. The other thing that isn't well discussed is that his reinforcement learning critique is not anti-reinforcement-learning. For those deep in the weeds, there was a lot of reaction suggesting that Andrej doesn't believe in reinforcement learning anymore. No.
[15:14] He clarified this: the straw metaphor he used is a specific indictment of the kinds of sparse, trajectory-level signals that bleed across all tokens in earlier versions of reinforcement learning. If you have richer, finer-grained supervision and better memory, you can start to get to higher-quality reinforcement learning. If that's over your head, that's fine; the takeaway is that he's critiquing the lack of signal you get when you take these blunt yes-or-no instruments and apply them to the whole model, which is what I talked about earlier. He's saying you can use the same principle of reinforcement learning with finer-grained supervision, really high-quality data, and improved memory, and you'll get much better results. So he's essentially calling for us to get better at reinforcement learning and to think about how we do it in a richer way, which I think is a good challenge. Not something I have to deal with, thankfully, but for those in the model-maker community, it's relevant. The third point that I think is under-covered is that closing the gap between human learning and LLM training is not just a data-scale problem. We've talked in the past about the idea that if you train LLMs enough, maybe they'll get to the point where they can match human learning. What he challenged is exactly that: this is not just a data-scale problem. Andrej's point is that without durable memory, agents will not approximate human learning trajectories. Your agent cannot learn the way you learn if it cannot remember the way you remember. He came back to the memory problem. The press associated this with "slop" and was really negative.
[16:53] But I think the more interesting point is that the memory problem is something Andrej sees at the root of a lot of other issues, and if we can solve reliable remembering for LLMs, we will correspondingly unlock a lot of extra power. So much of what I do when I'm advising clients and working with folks on building agent systems is thinking about memory for the agent. What memory is needed for this agentic task? Where does it live? How does it update? Who touches it? What are the permissions involved? How does it change over time? We are doing memory engineering. What Andrej is saying is that this is a hard thing to do right now. He's right, and if we're going to make it easier, we'll have to solve some root problems with LLMs that remain unsolved today. I think that's a fair point. The fourth thing that didn't get talked about enough is the evolution analogy he used. At one point he discusses with Dwarkesh the idea that DNA is a kind of miraculous compression: we can compress our entire existence as humans into this tiny DNA strand, and yet somehow we come out, start to learn as babies, and grow. It's a tremendous compression algorithm, incredibly able to build useful, learning creatures. One of the things he called out here, which certainly got buried in all of the slop conversation, and which not everyone in Silicon Valley agrees with, is that we should not use that analogy: it's a description of what humans are, maybe a description of what animals are, but not a useful description of LLMs, even by metaphor.
[18:39] In other words, it's not just that LLMs don't have DNA; it's that we should not try to mimic that pathway, because we are trying to build useful, controllable tools. We are not trying to build animals or creatures. There may be some people who disagree with him there, but I think that is a really good point, and it's worth making again: we are trying to build useful, controllable tools, and the metaphors we use for most of this end up not being tool metaphors. That matters, because we end up optimizing for the wrong thing if we say we're building people; we're not building people. So, is this the decade of agents? I would say it is, and my answer is optimistic, even where the press has picked up the same wording as pessimistic. We have so much in front of us to build from an agentic perspective; we are just getting started. One of the things I did as soon as I saw this, and I'm going to include it in my write-up, is go back and look at how I've written about AI agents in the past. I want to pick out principles of AI agents that stand the test of time and are in line with what Andrej is talking about here, and I want to give you a write-up prompt that helps you wrestle with the implications of AI agents in your current software stack against those principles. Is my agent assuming reliability? Is my agent assuming continuity? Is my agent dealing with memory appropriately? I think those are really interesting questions. We don't talk about them enough, and this podcast felt like a doorway for me to think about them. If you want to think about them too, you can dig in.
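One way to make the memory question above concrete is to write down a memory record with explicit scope and write permissions, so that "where does it live" and "who touches it" become enforceable. This is a minimal sketch under assumed names (`MemoryRecord`, `MemoryStore`), not any framework's actual API.

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryRecord:
    """One unit of agent memory: what it is (key/value), where it
    lives (scope), and who may update it (writers)."""
    key: str
    value: str
    scope: str            # e.g. "task", "user", "global" (illustrative)
    writers: frozenset    # actors allowed to overwrite this record
    updated_at: float = field(default_factory=time.time)

class MemoryStore:
    """Minimal store that enforces write permissions in the
    architecture, outside the agent itself."""

    def __init__(self):
        self._records = {}

    def write(self, record, actor):
        # Permission check happens here, not inside the agent's prompt.
        existing = self._records.get(record.key)
        if existing is not None and actor not in existing.writers:
            raise PermissionError(f"{actor} may not update {record.key}")
        record.updated_at = time.time()
        self._records[record.key] = record

    def read(self, key):
        rec = self._records.get(key)
        return rec.value if rec else None
```

The design choice worth noting is that the agent never decides its own permissions: updates, scoping, and expiry are architectural concerns, which is exactly the "memory engineering" framing used above.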
[20:16] And I did a whole write-up on it. Enjoy. Don't panic, as always, and we'll wait until the next Silicon Valley post catches fire.