

Decade of AI Agents: Coding Assistants

Key Points

  • While some hype frames this as “the year of AI agents,” prominent voices such as OpenAI co‑founder Andrej Karpathy argue it’s actually the **decade of AI agents**, noting that today’s agents still struggle with basic tasks and are oversold.
  • Current agents stumble because they lack sufficient model intelligence, robust computer‑UI interaction skills, continual learning, and multimodal capabilities.
  • **Use case 1 – coding assistants** is a strong fit: programming’s highly structured, rule‑based nature lets agents rely on pattern matching and clear pass/fail tests, and IDE interfaces are stable and simple to navigate.
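  • **Use case 2 – travel booking** comes up in nearly every new agent demo and handles simple, happy‑path bookings, but edge cases, inconsistent booking‑site UIs, visual details such as hotel maps, and preferences that must be learned from behavior expose the gap between hype and practical ability.
  • **Use case 3 – automated IT support** is aspirational: an agent that autonomously logs into a user’s machine, diagnoses the problem, and fixes it exceeds present capabilities, but could become commonplace within the next ten years as agent technology matures.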

Full Transcript

**Source:** [https://www.youtube.com/watch?v=ZeZozy3lsJg](https://www.youtube.com/watch?v=ZeZozy3lsJg)

**Duration:** 00:13:13

## Sections

- [00:00:00](https://www.youtube.com/watch?v=ZeZozy3lsJg&t=0s) **Debating the Decade of AI Agents** - The speaker contrasts the current hype that this is “the year of AI agents” with Andrej Karpathy’s view that their true potential will emerge over a decade, then outlines three categories of use cases: already useful coding assistants, a common but presently limited scenario, and an aspirational application beyond today’s capabilities.
- [00:04:23](https://www.youtube.com/watch?v=ZeZozy3lsJg&t=263s) **AI Agent Use Cases: Coding & Travel** - The speaker outlines how current AI agents excel at coding assistance and simple travel‑booking tasks, yet run into limitations with complex, non‑happy‑path scenarios.
- [00:08:50](https://www.youtube.com/watch?v=ZeZozy3lsJg&t=530s) **Autonomous AI IT Support Dilemma** - The speaker outlines an ambitious AI agent that fully manages and repairs user computers, highlighting technical variability and the trust concerns of granting such autonomy.

## Full Transcript
You might have heard that this is the year of AI agents, but some prominent voices in the AI community, such as OpenAI co-founder Andrej Karpathy, paint a somewhat different picture. They're saying that this is actually the decade of AI agents: today's AI agents struggle with some basic tasks, they are being a bit oversold, and it will take advancements over the next ten years to work through all the issues.

Now, why do today's AI agents struggle with so many tasks? There are a lot of reasons. One is that the models behind them just don't have enough intelligence. They also struggle with computer use, that is, interacting with a computer UI. They lack continual learning, and they lack some multimodal capabilities.

So, whether this is the year or the decade of the AI agent, let's examine three use cases for agentic AI. Use case number one is where AI agents are already providing tremendous day-to-day utility; they work really well. Use case number two is considered a common use case. It comes up pretty much every time a new agentic AI model is released, but as you'll see, it falls a little short in practice today. And finally, use case number three is an aspirational use case that's a little beyond current capabilities, but may be far more commonplace a decade from now.

Use case number one is coding assistants: AI agents that work alongside developers. They can do a bunch of things: write code, fix bugs, generate documentation, review pull requests, and so on. And of course, this isn't hypothetical. If you're writing code today, chances are you're already taking advantage of agentic coding assistants.
So the question is: why are coding assistants such a good fit for the agentic capabilities of AI models today? Let's go back to the four AI agent capabilities I mentioned: intelligence, computer use, multimodal, and continual learning.

Let's start with intelligence. Code has a number of things going for it. It is highly structured, and it has a lot of well-defined rules. So the agent doesn't really need human-level reasoning for most coding tasks; it needs pattern matching across millions of code examples, and current models are really good at that. It also doesn't hurt that programming problems tend to have clear right and wrong answers: the code either compiles and passes its tests, or it doesn't.

What about computer use? It's barely needed, because these agents work within integrated development environments (IDEs), which are well-defined interfaces that haven't changed dramatically in years. Agents don't have to navigate inconsistent web UIs or click through enterprise applications.

What about multimodal capabilities? They're not really required, because with code it's basically text in and text out of the model. Code, comments, error messages: it's all text-based, and it's highly structured.

As for continual learning: yes, programming languages and frameworks evolve, but somewhat slowly, and usually with pretty extensive documentation. An agent built on a large language model will already have been pre-trained on a lot of that information as part of its training set. So it already has knowledge of vast amounts of source code, and that knowledge applies broadly across most projects.
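The clear pass/fail answer mentioned above is what lets a coding agent check its own work. Below is a minimal Python sketch of a generate-test-retry loop, under stated assumptions: `propose` is a hypothetical stand-in for the model call that receives the previous test failures as feedback, and the "tests" are plain input/expected-output pairs rather than a real test runner. It illustrates the feedback loop only; it is not how any particular coding assistant is implemented.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int], int]  # a proposed implementation of the target function
TestCase = Tuple[int, int]        # (input, expected output)


def run_tests(candidate: Candidate, cases: List[TestCase]) -> List[str]:
    """Return one message per failing case; an empty list means every test passed."""
    failures: List[str] = []
    for arg, expected in cases:
        try:
            got = candidate(arg)
        except Exception as exc:  # a crash is a failure, much like code that doesn't compile
            failures.append(f"{arg!r}: raised {type(exc).__name__}")
            continue
        if got != expected:
            failures.append(f"{arg!r}: expected {expected!r}, got {got!r}")
    return failures


def repair_loop(
    propose: Callable[[List[str]], Candidate],
    cases: List[TestCase],
    max_attempts: int = 3,
) -> Tuple[Optional[Candidate], int]:
    """Ask for a candidate, test it, and feed failures back until the tests pass."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)           # hypothetical model call
        feedback = run_tests(candidate, cases)  # clear pass/fail signal
        if not feedback:                        # every test passed
            return candidate, attempt
    return None, max_attempts


def make_scripted_proposer(candidates: List[Candidate], log: List[List[str]]):
    """Stand-in for a model: returns canned candidates in order and logs the feedback it was shown."""
    remaining = iter(candidates)

    def propose(feedback: List[str]) -> Candidate:
        log.append(list(feedback))
        return next(remaining)

    return propose


if __name__ == "__main__":
    cases = [(3, 3), (-2, 2), (0, 0)]  # tests for an absolute-value function
    log: List[List[str]] = []
    propose = make_scripted_proposer([lambda x: x, lambda x: -x if x < 0 else x], log)
    fixed, attempts = repair_loop(propose, cases)
    print(attempts, log)  # 2 [[], ['-2: expected 2, got -2']]
```

The first scripted candidate fails on `-2`, the failure message is passed back on the second call, and the second candidate passes. In a real assistant, `propose` would send that feedback to a model; the point is that the test result, not human judgment, decides when to stop.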
So coding assistants play to the strengths of current AI models: they operate in structured environments, they have immediate feedback loops, and they work with well-defined problems.

Now, use case number two is travel booking, and this is the one that comes up in practically every demo of a new agentic AI model. The basic premise is a set of AI agents that handle booking your entire trip. That might involve booking flights across airlines, comparing hotel options, booking everything at the best available prices, and ultimately managing your calendar. And yes, this does seem like a perfect fit. It's a defined task with a clear goal: get a person from point A to point B at a reasonable cost.

So why does this only somewhat work today? Well, in simple, happy-path scenarios it works quite well. If you need to book a direct flight and find a standard hotel room, current agents can handle that decision-making. The information they're working with is mostly text-based (flight times, prices, hotel descriptions), and that's within their capabilities. But it doesn't take long to run into limitations, so let's go back to those four capabilities.

Starting again with intelligence, the big problem is edge cases; that's where it falls apart. What happens when a flight gets delayed, or you're connecting through a city with particular visa requirements, or you're traveling with an infant? Current agents really don't handle the long tail of real-world complications the way human travel agents do. And if you've ever traveled anywhere, you don't need me to tell you that travel is full of edge cases.

Computer use is another big one.
Every airline, every hotel chain, and every booking site has a different UI. There are a lot of UI variants, and they may also have CAPTCHAs and authentication flows. In fact, many of them are intentionally made difficult to automate. So when agents need to navigate the actual websites instead of using APIs, that's where they can struggle.

What about multimodal? Reading flight times and prices from text is fine, but there are nuances. Take, for example, a hotel map that you actually need to read to see whether the hotel is genuinely walkable to your conference center or just technically nearby. Trust me, that's one I've struggled with a few times. That requires multimodal understanding that current agents may struggle with.

And what about continual learning? Here, your preferences really matter. Sure, you could fill out a profile saying you prefer aisle seats and Marriott hotels, but the real challenge isn't filling out the profile; it's actually learning, by observing the world and getting feedback on those observations. It really is a loop. The agent needs to figure out that, say, you're willing to pay more for direct flights on Monday mornings, but you like to take connections on Friday afternoons. These are patterns it needs to learn from your behavior over time, rather than the simple preferences you can think to list up front.

So travel booking works well enough to be impressive in agentic demos with cherry-picked scenarios, but today it's probably not reliable enough that you would fully trust it with your actual travel, at least without close supervision.

All right, on to use case number three. This is my aspirational one.
It's automated IT support. This is more than an agent that helps answer helpdesk tickets with canned responses. That's level-one stuff, and I'd say it already works today. I'm thinking of an agent that does a lot more than that: it completely autonomously logs into a user's machine, diagnoses whatever the problem is, and then has full control to go and fix that problem on its own. This seems like the perfect use case for AI agents. It's repetitive, and it often follows patterns. But would you trust an autonomous AI agent with free rein on your laptop to install fixes and delete applications without your consent? Yeah, probably not.

So why not? Let's look at the capabilities again. What about intelligence in this case? Every user's setup is somewhat unique, so there are a lot of different paths we could take here; I think it's fair to say most machines have a bit of a quirky, unique setup. A simple Outlook issue might be a corrupted file on one machine, a proxy setting on another, and an expired certificate on a third. Current agents often can't handle these endless edge cases.

There are also significant issues when it comes to computer use. The agent needs to be able to navigate a lot of things. What does it have to navigate? The user's machine, and people have different machines. For one example, if you're a Windows user, it will need to navigate Windows settings; if you're on a Mac, it would need to understand Mac preferences. And that's just the operating system. There are also the application UIs, each specific to its application, and so forth.
And all of 11:04this, remember, is potentially running on a system that is in 11:11production. Today's computer use capabilities, they just aren't reliable enough for that level of 11:17trust. Okay. What about multimodal capabilities? Well, users, they're going 11:24to send things like screenshots in. They, they might speak to the agents. So you're going to 11:31have some kind of verbal stuff as well. And that verbal description might not be that instructive. 11:37It might be saying things like, uh, it's doing that thing again. And you've got to kind of figure out 11:41what it is. The agent needs to piece together whatever users can capture in the moment. And then, 11:48when it comes down to continual learning, the agent needs to learn specifically 11:54from outcomes, and those outcomes will adjust over time. So 12:01when software updates break things, which fixes are actually going to work in your specific 12:07environment? As new devices get added and new issues emerge, the agent needs to adapt based on 12:12what's working and what's not in practice. It needs to learn from the feedback loop of 12:18thousands of support interactions beyond a model's initial training data. So basically, this 12:24use case is still emerging, but it's just not fully there yet. So year or 12:31decade? Well actually both, we're in the year of AI agents for, for narrow, 12:38well-defined tasks in structured environments. But we're in the decade of AI agents 12:43for the broader vision, agents that handle messy real-world problems with reliable computer use, 12:49with intelligence about edge cases, with true multimodal understanding and learning that 12:54adapts to your specific environments. So for now, you've got a coding task. Well, an agentic 13:01assistant might be just what you need. But if an AI agent offers to fix your laptop autonomously 13:07with today's models, well, maybe at least ask it to show its work first.