
Context Engineering: Unlocking LLM Agent Potential

Key Points

  • Dex, founder of Human Layer and a member of YC's Fall ’24 batch, introduced “context engineering” as an early framework for building reliable LLM‑driven agents, predating popular discussions by Toby, Andre, and Walden.
  • He highlighted two influential talks: Sean Grove’s “The New Code,” which argues that the future value lies in precise specifications rather than hand‑written code, and a Stanford study showing AI‑assisted development often creates rework and slows progress on complex, brownfield projects.
  • According to Dex’s observations and founder interviews, AI coding agents excel for rapid prototyping but struggle with large, legacy codebases and intricate systems, prompting product teams to hand off prototypes to engineers for production.
  • Consequently, Dex framed “context engineering” as the discipline for extracting maximum utility from today’s LLMs while waiting for models that can reliably generate production‑grade code.

Full Transcript

# Context Engineering: Unlocking LLM Agent Potential

**Source:** [https://www.youtube.com/watch?v=IS_y40zY-hc](https://www.youtube.com/watch?v=IS_y40zY-hc)
**Duration:** 00:14:39

## Sections

- [00:00:00](https://www.youtube.com/watch?v=IS_y40zY-hc&t=0s) **Context Engineering: Origins and Vision** - Dex, founder of Human Layer, outlines the early development of context engineering, the 12-factor agents manifesto, and future directions for building reliable LLM-based agents.
- [00:03:20](https://www.youtube.com/watch?v=IS_y40zY-hc&t=200s) **From Manual Review to AI-Assisted** - The speaker recounts his team's eight-week shift from line-by-line code inspection to spec-driven workflows using a coding agent, highlighting productivity gains, token-budget considerations, and smarter strategies for steering the agent when it goes off track.
- [00:06:42](https://www.youtube.com/watch?v=IS_y40zY-hc&t=402s) **Optimizing Coding Agents with Sub-Agents** - The speaker discusses efficient token use, looping prompts, and leveraging sub-agents for context control to trace information flow across a codebase.
- [00:09:53](https://www.youtube.com/watch?v=IS_y40zY-hc&t=593s) **Iterative Planning Over Massive PRs** - The speaker stresses using brief, continuously updated implementation plans and a linear workflow with intentional human review to keep code changes understandable and maintain team mental alignment, rather than grappling with unwieldy, thousands-line pull requests.
- [00:13:32](https://www.youtube.com/watch?v=IS_y40zY-hc&t=812s) **Coding Agents, Team Transformation, and Event Promo** - The speaker predicts coding agents will become commoditized, emphasizes the difficulty of changing team workflows, and promotes a nearly full hyper-engineering event on advanced context engineering for coding agents.

## Full Transcript
0:00 Hi everybody. My name is Dex. I'm the founder of a company called Human Layer. Apparently we're all YC founders on stage today; I was in the Fall '24 batch.

0:11 I'll give you a little bit of history of the term "context engineering." Long before Toby and Andre and Walden were tweeting about it in June, back on April 22nd we wrote a weird little manifesto called 12-Factor Agents: principles of reliable LLM applications. Then on June 4th (shout-out to Swix, I did not know he was going to be here, but he's getting a shout-out anyway) the title of the talk was changed to "context engineering," which gave us a nod for that. So everyone's been asking me what's next. We did the context engineering thing; we talked about how to build good agents.

0:46 I'll point out my two favorite talks from AI Engineer this year, which, incidentally, are the only two talks with more views than 12-Factor Agents. Number one is Sean Grove's "The New Code." He talked about how we're all vibe coding wrong: sitting and talking to an agent for two hours, figuring out and exactly specifying what you want, then throwing away all the prompts and committing the code is basically equivalent to a Java developer spending six hours writing a bunch of Java code, compiling the jar, checking in the compiled asset, and throwing away the source. In a future where AI is writing more and more of our code, the specs, the description of what we want from our software, are the important thing.

1:36 And then we had the Stanford study, which was a super interesting talk.
1:40 They ingested data from 100,000 developers, from giant enterprises down to small startups, and they found that AI-assisted software engineering leads to a lot of rework. So even if you get benefits, you're actually throwing half of them away, because the output is kind of sloppy sometimes. And it just doesn't work for complex tasks or brownfield tasks: old codebases, legacy codebases, things like that. Especially for complex brownfield tasks, it can be counterproductive; not just that it doesn't help much, it can actually slow people down.

2:10 This matched my experience talking to lots of smart founders: coding agents are good for prototypes. Even Amjad from Replit was on a podcast six months ago saying their product managers use this to build prototypes, and once they figure it out, they hand it to the engineers, who build the production version. It doesn't work in big repos; it doesn't work for complex systems. Maybe someday, when the models get smarter, we'll be able to have AI write all of our code. But that is what context engineering is all about: how do we get the most out of today's models?

2:40 So I'm going to tell you a story about a journey we've been on for the last couple of months, learning to do better context engineering with AI-generated code. I was working with one of the best AI coders I've ever met. They were shipping every couple of days; I would get a 20,000-line PR of Go code, and this was not a CRUD app or a Next.js API. This was complex systems code with race conditions and shutdown order and all this crazy stuff. And I just couldn't review it. I was like, I hope you know I'm not going to read this next 2,000 lines of Go code.
3:13 So we were forced to adopt spec-first development, because it was the only way for everyone to stay on the same page. And I actually learned to let go. I still read all the tests, but I no longer read every line of code, because I read the specs and I know they're right. It took a long time and it was very uncomfortable, but over eight weeks or so we made this transformation, and now we're flying. We love it. So I'm going to talk about a couple of things we learned in this process. I know it works, because I shipped six PRs last Thursday, and I haven't opened a non-Markdown file in an editor in almost two months.

3:43 So, the goals. I didn't set these goals; I was forced to adopt them. But the goals are: works well in big, complex codebases; solves big, complex problems; no slop, we're shipping production code; and everyone stays on the same page. Oh, and spend as many tokens as possible.

4:00 This is advanced context engineering for coding agents. I want to start with the most naive way to use a coding agent, which is to shout back and forth with it until you run out of context, or you give up, or you cry: "No, do this. No, stop, you're doing it wrong." You can be a little bit smarter about this. A lot of people have done this; I've seen some people from OpenAI post about it, it's pretty common: if you notice the agent is off track, if it's really screwing up, you just stop and start over, and you say, "Okay, try again, but make sure not to try that, because that doesn't work." If you're wondering when you should consider starting over with a fresh context: if you see this, it's probably time to start over and try again.

4:46 We can be smarter about this, though.
4:48 This is what I call intentional compaction. It's not just "start over, and I'm going to tell you something different: put my same prompt in with a little bit of steering." Even if we're on the right track, if we're starting to run out of context, be very intentional about what you commit to the file system and the agent's memory. I think /compact is trash; I never use it. We have the agent write out a progress file, very specifically structured in the way I've found works really well for these things, and then we use that to onboard the next agent into whatever we were working on.

5:19 What are we compacting, and why? How did I get to this? Lots of people have instincts about what works here. The question is: what takes up space in the context window? Looking for files, understanding the flow, doing edits, doing work. If you have MCP tools that return big blobs of JSON, that's going to flood your context window with a bunch of nonsense. So what should we compact? We'll get to exactly what goes in there; it looks something like this, and I'll talk about the structure of this file a little more.

5:44 Why are we obsessed with context? Because LLMs are pure functions. I think Jake said a lot of interesting things about this. Other than training your own models and messing with the temperature, the only thing that improves the quality of your outputs is the quality of what you put in, which is your context window. And in a coding agent, your agent is constantly looping over determining the right next tool to call and the right next edit to make, and the only thing that determines its ability to do that well is what is in the context window going in.
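The intentional-compaction idea described above can be sketched as a small helper. This is an assumption about what such a progress file might contain (the talk does not show the exact format); the function name and section headings are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

def write_progress_file(path: Path, goal: str, done: list[str],
                        next_steps: list[str], key_files: list[str]) -> None:
    """Persist a compact summary of the session so a fresh agent can be
    onboarded from this file instead of from the spent context window.
    (Hypothetical structure; adapt the sections to your own workflow.)"""
    lines = [
        f"# Progress ({datetime.now(timezone.utc):%Y-%m-%d %H:%M}Z)",
        "", "## Goal", goal,
        "", "## Done",
        *[f"- {item}" for item in done],
        "", "## Next steps",
        *[f"- {item}" for item in next_steps],
        "", "## Key files",
        *[f"- {f}" for f in key_files],
    ]
    path.write_text("\n".join(lines) + "\n")
```

A new session would then be started with this file as its opening context, rather than replaying (or auto-compacting) the old transcript.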
6:14 We'll file this one under "everything is context engineering": everything that makes agents good is context engineering. So we're going to optimize for correctness, completeness, size, and trajectory. I'm not going to talk much about trajectory, because it's very vibes-based right now. To invert that: the worst thing to have in your context window is bad info. The second worst is missing info, and then just too much noise. And if you wanted an equation, we made this dumb equation.

6:42 Jeff figured this out. Well, lots of people are figuring this out, but Geoff Huntley works on Sourcegraph's Amp (I know Bang was supposed to be speaking tonight; I hope he will appreciate this talk). In the spirit of what they've been talking about: you've got about 170,000 tokens, and the fewer of them you use to do the work, the better results you will get. He wrote this thing called "Ralph Wiggum as a software engineer," about what he calls the dumbest way to use coding agents, which works really, really well: just run the same prompt in a loop overnight for 12 hours while he's asleep in Australia, and put it on a livestream. I actually think he's being humble. It's a very, very smart way to use coding agents if you understand LLMs and context windows. I'll link that article as well; I'll put up a QR code at the end with everything.

7:26 The next step is inline compaction with sub-agents. A lot of people saw Claude Code sub-agents and jumped in: "Okay, cool, I'm going to have my product manager and my data scientist and my front-end engineer," and maybe that works. But they're really about context control.
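The same-prompt-in-a-loop pattern mentioned above might be sketched like this. The prompt text and the idea of parking state in a plan file are assumptions about how such a loop is typically wired up; `agent_cmd` is a placeholder for whatever coding-agent CLI you actually use.

```python
import subprocess

# Each run starts with a fresh context window, so all durable state must
# live in the repo (a plan file, git history), not in the conversation.
PROMPT = ("Read PLAN.md, pick the next unchecked task, implement it, "
          "run the tests, then check the task off in PLAN.md.")

def run_prompt_loop(iterations: int, agent_cmd: list[str]) -> list[int]:
    """Run the agent `iterations` times with the identical prompt and
    return the exit code of each run."""
    exit_codes = []
    for _ in range(iterations):
        result = subprocess.run(agent_cmd + [PROMPT],
                                capture_output=True, text=True)
        exit_codes.append(result.returncode)
    return exit_codes
```

The design point is that the loop is dumb on purpose: the intelligence lives in the repo state each run reads and updates, which keeps every individual context window small.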
7:43 A really common task people use sub-agents for in this kind of high-level coding-agent work: you want to find where something happens, or understand how information flows across multiple components in a codebase. Maybe you steer the agent to use a sub-agent (a lot of models have instructions in their system prompts to use one automatically), and you say, "Hey, go find where this happens." The parent model calls a tool that hands that message to a sub-agent; the sub-agent goes and finds where the file is and returns it to the parent. The parent agent can get right to work without carrying the context burden of all that reading and searching.

8:20 The ideal sub-agent response looks something like this (I'm not going to talk about how we made this or where it comes from yet). There's a lot to be said about sub-agents, like the challenge of playing telephone: you care about the thing that comes back from the sub-agent, so how do you prompt the parent model to prompt the child model about how it should return its information? If you've ever seen this: a deterministic system gets chaotic; imagine it with nondeterministic systems.

8:47 What works even better than sub-agents, and the thing we're doing every day now, is what I call frequent intentional compaction: building your entire development workflow around context management. Our goal, all the time, is to keep context utilization under 40%. We have three phases: research, plan, and implement. Research is really about understanding how the system works, all the files that matter, and perhaps where a problem is located.
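The sub-agent delegation described above can be sketched as follows. The response-format instruction and the `llm` callable are illustrative assumptions, not any particular product's API; the point is that only a compact answer flows back into the parent's context window.

```python
from typing import Callable

# The parent tells the child exactly how to answer, so the "telephone"
# between agents carries paths and line numbers, not file contents.
SUBAGENT_RESPONSE_FORMAT = (
    "Answer only with file paths and line numbers, one per line, e.g.\n"
    "src/server/shutdown.go:142 - shutdown order is decided here\n"
    "Do not include file contents."
)

def delegate_search(question: str, llm: Callable[[str], str]) -> str:
    """The sub-agent burns its own context window reading and grepping;
    the parent receives just the short, structured answer."""
    sub_prompt = f"{question}\n\n{SUBAGENT_RESPONSE_FORMAT}"
    return llm(sub_prompt)
```

In use, the parent agent calls `delegate_search` as a tool and pastes only the returned lines into its own working context.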
9:14 This is our research prompt. It's really long; it's open source, you can go find it. This is the output of our research prompt. It has file names and line numbers, so the agent reading this research knows exactly where to look; it doesn't have to go search 100 files to figure out how things work. The planning step is really just: tell me every single change you're going to make. Not line by line, but include the files and the snippets of what you're going to change, and be very explicit about how we're going to test and verify at every step. This is our planning prompt; this is one of our plans. And then we implement and go write the code. Honestly, if the plan is good, I'm never shouting at Claude anymore; if I'm shouting at Claude, it's because the plan was bad. And the plan is always much shorter than the code changes (well, most of the time).

9:55 As we're implementing, we keep the context under 40%, so we constantly update the plan: "This is done, on to the next phase," in a new context window. This is our implement prompt. These are all open source; I'll tell you where to find them. This is not magic: you have to read this stuff, or it will not work. So we build the process around intentional human review steps, because a research file is a lot easier to read than a 2,000-line PR, and you can stop problems early. This is our linear workflow for how we move this stuff through the process.

10:27 And I want to stop: does anyone know what code review is for? Anybody? Yeah, me neither. Code review is about a lot of things, but the most important part is mental alignment: keeping the people on the team aware of how the system is changing, and why, as it evolves over time.
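The research → plan → implement flow above might be wired together like this. The prompt strings here are paraphrases of what the talk describes, not the team's actual open-source prompts, and `llm` is a stand-in for a coding-agent call; each call is assumed to start a fresh context window.

```python
from typing import Callable

RESEARCH_PROMPT = ("Explain how the relevant part of the system works. List "
                   "every file that matters, with file names and line numbers.")
PLAN_PROMPT = ("List every change you will make: the files, snippets of the "
               "new code, and how to test and verify each step.")

def run_feature(task: str, llm: Callable[[str], str]) -> str:
    """Three phases, each in a fresh context window.  The compact research
    and plan artifacts are the only carry-over between phases, and they are
    the natural points for intentional human review."""
    research = llm(f"{RESEARCH_PROMPT}\n\nTask: {task}")       # phase 1
    plan = llm(f"{PLAN_PROMPT}\n\nResearch:\n{research}")      # phase 2
    return llm(f"Implement this plan step by step:\n{plan}")   # phase 3
```

Reviewing `research` and `plan` before the implement call is where humans catch a bad line of research or a bad part of a plan, before it becomes thousands of bad lines of code.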
10:45 I can't read 2,000 lines of Go every day, but I can sure as heck read 200 lines of an implementation plan. And if the plans are good, that's enough, because we can catch problems early and maintain a shared understanding of what's happening in our code.

10:58 So, putting this into practice: I do a podcast with another YC founder named Vaibhav; he built BAML. Has anyone here used BAML before? All right, we've got a couple of BAML guys. I decided (I didn't tell Vaibhav I was doing this) to see if we could one-shot a fix to a 300,000-line Rust codebase. The episode is 75 minutes, and we go through the whole process of all the things we tried, what worked, what didn't, and what we learned. I'm not going to go into it; I'll give you a link. But we did get it merged. The PR was so good the CTO did not know I was doing it as a bit, and he had merged it by the time we were recording the episode. So: confirmed, it works in brownfield codebases, and no slop; it got merged.

11:39 And I wanted to see if it could solve a complex problem. So I sat down with the Boundary CEO, and for seven hours we shipped 35,000 lines of code. A little bit of it was generated, but we wrote a lot of code that day, and he estimated it was roughly one to two weeks of work. So it can solve complex problems; you can add WASM support to a programming language.

12:00 The biggest insight I'd ask you to take away from this is that a bad line of code is a bad line of code. A bad part of a plan can be hundreds of bad lines of code. And a bad line of research, a misunderstanding of how the system works, how data flows, and where things happen, can be thousands of bad lines of code.
12:20 So you have this hierarchy of where to spend your time. Yes, the code is important and it has to be correct, but you can get a lot more for your time by focusing on specifying the right problem and what you want, and by making sure that when you launch the coding agent, it knows how the system works. And of course our CLAUDE.md and our slash commands: we basically test those for weeks before anyone's allowed to change them.

12:44 So we review the research and plans, and we have mental alignment. I don't have time to talk about this one, because I think I'm already over. But how did we do? We did the goals. I didn't ask for these goals, but they were thrust upon me, and we solved them. We spent a whole lot of tokens; this is a team of three in a month (these are credits, by the way). But I don't think we're switching to the max plan, because this is working well enough that it's worth spending; it saves us a lot of time as engineers. Our intern Sam is here somewhere; he shipped two PRs on his first day. On his eighth day, he shipped like ten in a day. This works. We did the BAML thing. And again, I don't look at code anymore; I just read specs.

13:29 So, what's next? I think coding agents are maybe going to get a little bit commoditized, but the team and workflow transformation will be the hard part. Getting your team to embrace new ways of communicating and structuring how you work is going to be really, really hard and uncomfortable for a lot of teams. People are figuring this out; you should try to figure it out too, because otherwise you're going to have a bad time.
13:52 We're trying to help people figure this out. We're working with everybody from six-person YC startups to thousand-person public companies. Oh, and we're doing an event tomorrow on hyper-engineering. It is very, very close to capacity, but if you come find me after this and give me a good pitch, there are a couple of spots left. And there's a link to the video where we talk about this for 90 minutes, and Vaibhav and I bust each other's balls for a while. That is advanced context engineering for coding agents. Thank you.