Overcoming the AI Memory Wall

27m • ai-ml • deep-dive • advanced

Key Points

The “memory wall” describes how advances in AI compute outpace improvements in hardware memory, widening the gap between intelligence and memory capabilities.
Large‑language models are intentionally stateless, possessing only parametric knowledge and no episodic memory, so every interaction must rebuild context from scratch.
This stateless design creates “sticky” memory problems that vendors struggle to solve, because statefulness varies by user: what to remember, how to curate it, and how long it should persist.
The speaker outlines five root causes of these memory issues and proposes eight design principles that developers and users can apply to create more effective memory systems.
Rather than waiting for external solutions, practitioners are encouraged to build their own stateful memory layers that align with their specific needs and use cases.

**Source:** [https://www.youtube.com/watch?v=JdJE6_OU3YA](https://www.youtube.com/watch?v=JdJE6_OU3YA)

**Duration:** 00:27:08

Sections

- [00:00:00](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=0s) **The AI Memory Wall Explained** - The speaker outlines why memory has become a growing bottleneck in AI, identifying five root causes behind the “memory wall” and offering eight actionable principles for users and builders to address the issue.
- [00:03:45](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=225s) **Challenges of Dynamic Memory Relevance** - The passage explains why LLMs struggle with context management, highlighting the unsolved relevance problem that varies with task phase, scope, and privacy concerns, and noting that semantic similarity serves only as an imperfect proxy for true relevance.
- [00:06:54](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=414s) **Memory as a Lossy Database** - The speaker explains that human memory works like a highly compressed database, retaining “keys” for salient events (like childhood memories) while discarding less important recent details, so intentional recall acts as retrieving stored keys.
- [00:10:17](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=617s) **Lock‑in and Memory Portability** - The speaker contends that AI providers intentionally make user‑specific memory non‑portable to create switching costs, turning it into a commons problem, and urges that real multimodel compatibility should be a first‑class objective for both consumers and businesses.
- [00:13:24](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=804s) **Differentiating Memory Types in AI** - The speaker explains that knowledge, episodic, and procedural memories each require distinct storage, retrieval, and update architectures, criticizing the AI community’s one‑size‑fits‑all infrastructure approach and advocating principled, architecture‑focused solutions.
- [00:17:25](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=1045s) **Match Storage to Query Patterns** - The speaker explains that diverse data types (key‑value, structured, semantic, event logs) require distinct storage solutions aligned with their query patterns, warning against a single “data lake” approach and stressing that context matters more than sheer volume.
- [00:21:15](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=1275s) **Compression, Judgment, and Retrieval** - The speaker explains that effective AI use requires human‑curated data compression before prompting (principle 6) and that results from semantic retrieval must be verified for accuracy (principle 7).
- [00:25:00](https://www.youtube.com/watch?v=JdJE6_OU3YA&t=1500s) **Memory Principles for Agentic AI** - The speaker outlines eight tool‑agnostic guidelines that ensure reliable, long‑term context persistence in AI systems, urging developers and users to prioritize memory now to gain a competitive edge as AI capabilities grow.

Full Transcript

0:00 Memory is perhaps the biggest unsolved problem in AI, and it is one of the only problems in AI that is getting worse, not better. As we get better and better and better at intelligence, we get worse at memory, relatively speaking. In fact, there's a name for it in the model maker community. It's called the memory wall. We are not improving the hardware chip capabilities of our memory systems nearly as fast as we are improving the ability of those chips to infer or compute words or do LLM inference. That generates a growing gap between our intelligence capabilities and our memory capabilities.

0:38 Don't worry, we won't stay at the hardware level for long. I want to go through with you the core issues that we see as builders, as users of AI, as designers of AI systems. What is the root of the memory problems we experience? If we're at a systems design level, if we're at a usage level, if we are even using ChatGPT, why are memory problems so sticky and hard to untangle? Why have we not seen better solutions in the market? I think there are good reasons for that.

1:09 And then once we go through those root causes, how can we start to think about solving them? How can we think about solving them as users? How can we think about solving them as builders? So, I'm going to go through five root causes, and then we're going to flip the script and I'm going to go through eight principles for building a solution, because I want you to walk away from this and I want you to feel empowered to actually design better memory systems. I don't want you to wait around for someone in Silicon Valley to make a pitch and get funded for this. You can design your own solution here.

1:40 So the key thing to keep in mind through this whole conversation is that AI systems are stateless by design, but useful intelligence requires state. Every conversation is stateless, meaning it starts from zero. The model has parametric knowledge, which is the weights we talk about in a model, right? But it doesn't have episodic memory. It does not remember what happened to you. And I'm sorry, but the 10 or 11 sentences, the very lossy memory that ChatGPT has right now, or the ability to search conversations that Claude has right now, is not good enough for that. You have to reconstruct your context every single time.

2:21 This is not a bug, actually. It is an intentional architecture. It is a design for statelessness, because the model makers want the model to be maximally useful at solving the next problem, the problem in front of you. And they cannot presume that state matters. It doesn't always matter.

2:39 So the promise of memory features is that vendors are going to be able to magically solve this by making the system stateful in ways that are useful to you. But this creates a whole host of new problems, because statefulness is not the same for all of us. What should it remember? Is it passive accumulation? Is it active curation? How long should it remember? Is it persistent forever? Is it stale ever? Does it drop off after 30 days? When do you retrieve it? Do you retrieve it when it's relevant, sort of like Claude does? Do you retrieve it all the time, and potentially it's noisy in the context window? How do you update it? This is one of the biggest problems with LLMs. People tell me they'll put their wiki into a retrieval augmented generation system and I'm like, when was the last time you updated your wiki? If it's not updated, how do you overwrite it? How do you append data to it?
How do you change data? These are not implementation details. They are fundamental questions about what memory is and its purpose when we do work.

3:42 Memory matters because we humans are able to quickly and fluidly negotiate between stateless brainstorming, things that are wild where we don't need to use a lot of our past memory, and very stateful work. LLMs are not good at that. Loading that context is very, very hard right now.

4:00 So why is this so persistent? We've talked a little bit about how the promise is hard to fulfill, but what are some of the root causes that make it hard for vendors to do this?

4:10 Number one, the relevance problem is one of the gnarliest unsolved problems out there. What's relevant actually changes based on the task that you're doing. Are you planning? Are you executing? The phase of your work: are you just exploring? Are you refining your work? The scope you're in, right? Is it personal or is it a project? I know someone who is in the healthcare industry, and they have to be very careful, because if they were to ever ask for health advice, the memory retrieval within ChatGPT would pull up work stuff, and they are afraid that, in the same context, if they pull up a work thing, their personal health data will leak in, because it will all look like health data. So the scope matters.

4:51 What has changed since the last time you talked? The state delta is what we would call that. If you come back and you say this is a new version, does it really understand that's a new version or not? Semantic similarity, which is what retrieval augmented generation depends on, is just a proxy. It is a proxy for relevance. It is not a true solution. Finding similar documents works until you need to find the document where we decided X, and that's very specific.
Or ignore everything about client A right now, but pay attention to clients B, C, and D. Or please only pay attention to what we've decided since October 12th. These are all things that we humans can understand and execute on when we go and manually retrieve information. But the AI using semantic search, it's just not the right tool for that job. There's no general algorithm for relevance. There's no magic relevance solve that the AI can depend on. You need to use human judgment about task context. And that means requiring very complicated architectures to accomplish a specific memory task, not just better embeddings in a RAG memory system. And that, by the way, is one of the big reasons why these one-stop-shop vendors often struggle with real implementations.

6:08 Number two, the persistence-precision trade-off is a massive issue with memory systems. If you store everything, retrieval becomes very noisy and it becomes very expensive. You jam up your context window. If you store selectively, you're going to lose information that you need later. If you let the system decide what to keep, it often optimizes for something that you didn't ask it to. Maybe it optimizes for recency. Maybe it optimizes for frequency. Maybe it optimizes for statistical saliency versus actual importance. And if you wonder what statistical saliency is, have you ever tried having an argument with ChatGPT or Claude or Gemini about the fact that it's emphasizing the wrong thing in something it's writing? That is saliency. That's a saliency defect.

6:51 Human memory is actually, funnily enough, very good at this through the technology of forgetting. We use incredibly lossy compression with emotional and importance weighting. And so we've actually done studies on human memory.
And it turns out that you can, with practice, get better and better and better at recalling specific things. But if you choose not to recall something that happened to you, you're just going to lose it. And what's interesting is it seems to be a database keys issue for us. I realize someone in the comments is going to be a neuroscientist and just rightly take me to town. But my understanding of the reading is that you have to be able to remember the equivalent of a database key to retrieve the memory. And if you can do that, the memory becomes accessible again. But your short-term memory, so to speak, in humans is very lossy. And so you lose the database keys if you can't persist them with intent, if you don't intend to remember them.

7:50 And that is why, fundamentally, your childhood memories can be very accessible. But what happened last Thursday? You're sitting there and you're like, did we eat out or did we not eat out? Which day did we go to the movies? Right? It's not because you have a profound issue with memory. It's because your brain is desperately compressing information to make it useful to you and has dumped out those database keys. And when you go to the effort of remembering, you're literally retrieving the database keys to get the memory back. Forgetting is a useful technology for us. That's the point of that.

8:19 AI systems don't have any of that. They either accumulate or they purge, but they do not decay. And what I'm talking about is when I'm like, did I go to the movie? Oh, yeah. It was the movie. Who was that character? Oh, now I'm recovering the key and I'm able to get it back. The memory has decayed into a lossy approximation in the memory key, but I can recover it if I put effort into it. We have nothing like that in AI.
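
Editor's sketch: for builders who want to approximate the accumulate, purge, or decay distinction above, here is a minimal illustration of importance-weighted decay with recovery on recall. None of this comes from the talk. The `MemoryItem` fields, the exponential half-life formula, the 30-day default, the 0.2 threshold, and the `reinforce` helper are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    importance: float     # 0..1, set by whoever curates the memory (assumed scale)
    last_recalled: float  # time of the last deliberate recall, in days


def strength(item: MemoryItem, now: float, half_life_days: float = 30.0) -> float:
    """Importance-weighted strength that halves every `half_life_days` without recall."""
    age = max(0.0, now - item.last_recalled)
    return item.importance * 0.5 ** (age / half_life_days)


def reinforce(item: MemoryItem, now: float) -> None:
    """Deliberate recall resets the clock, loosely like recovering the 'database key'."""
    item.last_recalled = now


def retrievable(items: list[MemoryItem], now: float, threshold: float = 0.2) -> list[MemoryItem]:
    """Items above the threshold surface by default; weaker ones stay stored but quiet."""
    return [item for item in items if strength(item, now) >= threshold]
```

The design choice worth noticing is that nothing is deleted: a faded item drops out of default retrieval but can be brought back by an explicit `reinforce`, which is the part that plain accumulate-or-purge systems lack.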

That recoverable decay is a uniquely human technology, and it's funny, but we have to think about forgetting as a technology when we talk about memory.

8:51 Number three, the single context window assumption. Vendors often try to solve memory by making context windows bigger. But volume is not the issue. The structure is the problem. A million-token context window is not a usable million-token context window if it's full of unsorted context. That is worse than a tightly curated 10,000 tokens. The model still has to find what matters, parse the relevance, ignore the noise. You have not solved the problem by expanding the context window. You have simply made your problem more expensive, sometimes substantially more expensive. I know people who make calls and they don't budget the calls and they're like, "Why is my API bill high?" I'm like, your API bill is high because you're stuffing the context window and you're just kind of trying to throw queries against it. It does not work well and it also is very expensive. The real solution requires multiple context streams with different life cycles and retrieval patterns. It is hard. You have to design it. It breaks the mental model of "just talk to the AI." That is why there is no one-size-fits-all solution.

9:57 Issue number four is the portability problem. Every single vendor builds proprietary memory layers because they think in their pitch deck that memory is a moat. I get it. It makes sense on a pitch deck. ChatGPT memory, Claude recall, Cursor memory banks. These are not inherently interoperable. Users will invest time building up memory in a given system. And the model makers like that because it makes the switching cost real, and you can't port what ChatGPT knows about me to Claude, and your memory is locked in, and so on.

10:29 The problem here is a problem of the commons. This behavior set from vendors and model makers and tool builders encourages users to leave memory to the tool rather than encouraging them to build a proper context library. And I get it from a product design perspective, because how many users are going to really build a product context library? But if we reframe it and we say portability is a first-class problem, users should be inherently able to be multi-model. I think that's more relevant. And maybe from a consumer standpoint, you don't care, because you have 800 million users in ChatGPT, it dwarfs everything else, etc. One, that's not entirely true, because Gemini has, I think, closing in on half a million, or half a billion, now. But the other reason is that from a business perspective, you have to be multi-model. It is a liability to be single-model. And so if you're building business memory systems, you must solve the portability problem. And the issue is any given vendor is not incentivized to make that truly portable either. They want to make that proprietary to them. And then you have the same bottleneck, but now you're on a vendor who may not be as well funded as the model maker. And so it becomes a house of cards.

11:40 Number five, the passive accumulation fallacy. Most memory features assume you just use your AI normally and it will figure out what to remember. That is the default mental model of users. And so that's the assumption that memory features build around. But this fails because the system cannot distinguish a preference from a fact. It cannot easily tell project-specific from evergreen context. I've often seen that mixed up. It doesn't automatically know when old information is stale.
If you've ever wondered why ChatGPT or Claude or Perplexity comes back and talks about old AI models as if they are active today, that is the same issue. They can't tell when old information is stale, and it optimizes for continuity. It does not optimize for correctness. This is the keep-the-conversation-going issue. Useful memory fundamentally requires active curation. You have to decide what to keep, what to update, and what to discard. And that is work. And so vendors promise passive solutions because active curation, they are told, does not scale as a product. I think we have to start by framing that problem better, because it turns out passive accumulation doesn't solve for it either. And this is still a big enough problem that it costs us billions of dollars at the enterprise level, and it's extremely frustrating for users both personally and professionally. The answer cannot be "there is no answer" or "we'll fake the answer."

13:01 Finally, number six on the root cause side, then we're going to get to solve. It'll feel better. Memory is actually multiple problems. And that's part of why it's so hard. I hope you're getting that idea, right? When people say AI memory, what they really mean is any number of things. Preferences: how I like things done. That could be a key-value that's persistent. They could mean facts: what's true about particular things or entities. That can be structured, and it might need updates. They might mean knowledge, right? Domain expertise. And that can be parametric, right? That can be embedded in weights, but it might not be, right? And then what do you do? It can be episodic. So it could be conversational, temporal, ephemeral knowledge. It can also be procedural. Have we solved this before? Right?
If episodic memory is what we've discussed in the past, procedural memory is how we solved this problem in the past. And those are also different things. And so you have exemplars there, you have successes and fails in procedural memory. Every single memory type needs different system design to handle storage, retrieval, and update patterns. And if you feel like you're getting a headache here, you're not alone. This is why we don't have a good solve. And this is why I want to lay out in the next section principles for solve. But it starts with being honest about the problem. Treating this problem as one problem guarantees you are going to solve none of the real problems well. And that is why we have memory as a persistent issue, in fact a growingly worse issue, in the AI community.

14:22 Vendors fundamentally are treating this as a solve for infrastructure and not a solve for architecture. And so bigger windows and better embeddings and cross-chat search scale, but they don't solve structurally. And users keep expecting passive solutions because they're frankly sold passive solutions. There's an expectations issue here. "Just remember what matters" is not something that you can expect to work. But we're told that it will work. So if memory requires architecture and users want magic, the gap between what's promised, what's delivered, and what's needed has never been bigger. We have a memory wall of our own, beyond the chip level, in how we design our systems. And it won't get solved if we solve the wrong problem.

15:04 So let's say you've gone through all of this and you want to solve memory correctly.
I am going to give you principles that work whether you are using the chat as a sort of power user at home and you want to build something yourself, because this absolutely works for that, or whether you are designing larger systems, because it turns out that the principles for memory are fractal, because the problem is fractal. We have the same kinds of memory issues when we are power users individually in a chat as we do when we are designing agentic systems.

15:37 So, the principles that work. Number one. There's going to be eight of these. Settle in. It's going to be fun. Memory is an architecture. Memory is an architecture. It is not a feature. You cannot wait for vendors to solve this. I think you get this idea. We won't spend too long here. Every tool will have memory capabilities, but if you leave it to tools, they will solve different slices. You need principles that work across all of them. And you need to architect memory as a standalone that works across your whole tool set.

16:07 Principle two, you should separate by life cycle, not by convenience. So as an example, you need to separate personal preferences, which can be permanent, from project facts, which can be temporary, and those should be separated from session state or conversation state, which can be ephemeral. Mixing different life cycle states, mixing permanent with temporary with ephemeral, it just breaks memory. The discipline lies in keeping these apart cleanly. And again, this works if you're in chat. It works if you're designing agentic systems. If you have a permanent personal preference, it is possible.
It is as simple as a very disciplined system chat update, where you go into the system rules and the system prompt for ChatGPT and you say, "This is what you need to know about me. These are my personal preferences." And model makers are starting to make that more exposed because they want that. But they don't tell you how to use that properly. And when I observe how people actually use that "tell me about yourself," it is absolutely a mix of personal preferences and ephemeral stuff and project facts, because no one has taught them to use it better. And if you're designing agentic systems, it gets more complex, but it's the same separation of concerns. You have to separate out: what are the permanent facts in the situation here? What are project-specific facts? And what is session state?

17:25 Principle number three, you need to match storage to query pattern. So that means you're going to need multiple stores, because different questions require different retrieval. Now, in the chat situation that I gave you, ChatGPT can retrieve the memory if it's a system prompt kind of a thing, and it just calls it into the context window, and it's super simple, and most people would never think of it as memory, but that's what it is. If you're designing an agentic system, it is understanding the difference between, for example, "What is my style?", which could be a key-value because it's a written style of some sort; "What is the client ID?", which should be structured data or relational data; "What similar work have we done?", which could be semantic or vector storage data; and "What did we do last time?", which should be event logs. Those are four different types of data, right? You have key-value data, structured data, semantic data, event logs.
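
Editor's sketch: the four query patterns just listed could map onto four small stores roughly as follows. This is an illustration under stated assumptions, not the speaker's implementation. The `MemoryLayer` class, its method names, and the `clients` table are hypothetical, and the word-overlap scoring is a toy stand-in for real embedding search.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MemoryLayer:
    """One small store per query pattern (all names here are illustrative)."""
    preferences: dict[str, str] = field(default_factory=dict)         # key-value
    documents: list[str] = field(default_factory=list)                # semantic
    events: list[tuple[datetime, str]] = field(default_factory=list)  # event log
    db: sqlite3.Connection = field(
        default_factory=lambda: sqlite3.connect(":memory:"))          # structured

    def __post_init__(self) -> None:
        self.db.execute(
            "CREATE TABLE clients (name TEXT PRIMARY KEY, client_id TEXT)")

    def style(self) -> Optional[str]:
        # "What is my style?" is an exact key lookup.
        return self.preferences.get("style")

    def client_id(self, name: str) -> Optional[str]:
        # "What is the client ID?" is a relational query.
        row = self.db.execute(
            "SELECT client_id FROM clients WHERE name = ?", (name,)).fetchone()
        return row[0] if row else None

    def similar_work(self, query: str, top_k: int = 1) -> list[str]:
        # "What similar work have we done?" is a fuzzy ranking.
        # Jaccard word overlap stands in for vector similarity.
        q = set(query.lower().split())

        def score(doc: str) -> float:
            d = set(doc.lower().split())
            return len(q & d) / len(q | d) if q | d else 0.0

        return sorted(self.documents, key=score, reverse=True)[:top_k]

    def last_event(self) -> Optional[str]:
        # "What did we do last time?" reads the newest log entry.
        return max(self.events)[1] if self.events else None
```

Each question gets the retrieval style it needs: exact lookup, SQL, ranking, or a time-ordered read. Forcing all four through the ranking path is the single-store approach the talk warns about.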

Trying to do all of these in one storage pattern is going to fail. And that is why when people say, "We have our data lake and it's going to be a RAG," I'm like, why? Why is it going to be a RAG? Have you heard the word RAG repeated a hundred times like a magic spell for memory? It does not work that way. You need to match storage to the query pattern. Otherwise, you just have a very expensive data dump.

18:38 Principle number four, mode-aware context beats volume hands down. And so more context is not better context. Planning conversations need breadth. They need to have space for alternatives. They need to have space for comparables. Brainstorming conversations are similar to planning conversations. You need to be able to range. Execution conversations, execution workflows in agentic situations, they need precision. They need precise constraints. Retrieval strategy needs to match your task type. You cannot just sit there and think to yourself, okay, I'm going to have a brainstorming conversation and it's going to be incredibly precise, and just hope that it works. This is why I talk about prompting so much. Effectively, what is prompting doing? It is giving context that is mode-aware to an AI so that it can be in the right mode. And that's super effective for chat users. But guess what? If you're designing agentic systems, it is your responsibility to architect mode awareness into the system, so that it is aware that this is an execution environment, and that precision matters, and that it is audited and evaluated on precision.

19:48 Principle number five, you need to build portable as a first-class object. You need to build portable and not platform-dependent. Your memory layer needs to survive vendor changes. It needs to survive tool changes.
It needs to survive model changes. If ChatGPT changes their pricing, if Claude adds a feature, your context library should be retrievable regardless. And that is something that almost nobody can say right now. And the people who are doing it tend to be designing very large-scale agentic AI systems at the enterprise level. But this is a lesson that we all need to take with us. I think it is a best practice. It is sort of like keeping a go bag next to the door in case you need to get out, in case, I don't know, something happens to your house. You need to have something that is portable, that carries relevant memory, that you can use to have productive conversations with another AI. I fully admit there is not an out-of-box solution for this. There are people who are power users who configure Obsidian to do this, right, as a note-taking app, and they tie it into AI, and it becomes a portable, platform-independent way of handling this. There are people who use Notion for this. The thing that is a common trait is that they are obsessed with making sure the memory is configured correctly for them, and the AI has to come in and be queried correctly or called correctly to engage with a piece of the memory that matters, whether that is a key-value piece like "what's my style" or a semantic search like "what similar work have we done together." A good data structure accounts for that.

21:23 Principle number six, compression is curation. Do not upload 40 pages hoping the AI extracts what matters. I see people do this when they overload the context window and they ask for an analysis of a report. You need to do the compression work.
You need to, either in a separate LLM call or in your own work, write the brief, identify the key facts that matter, and state the constraints. This is where judgment lives. And if you don't delegate it, you will be happier with the precision and context awareness of the response. Memory is bound up in how we humans touch the work. There are ways to use AI to amplify and expand your judgment. You can use a precise prompt to extract information in a structured way from 40 pages of data and then, in a separate piece of work, figure out what to do with that data. But it remains on you to make sure that the facts are correct, that the constraints are real, and that the precision work you're asking AI to do with that data is the correct precision work. The judgment in compression is human judgment. It may be human judgment that you amplify with AI, but it remains human judgment.

22:32 Principle number seven, retrieval needs verification. So semantic search will recall well but fail on specifics, right? It will recall topics and themes well. You need to pair fuzzy retrieval techniques like RAG search with exact verification where facts must be correct. You should have a two-stage retrieval call path, right? Recall candidates and then verify against some kind of ground truth. This is especially important in situations where you have policy, or you have financial facts or legal facts that you need to validate. Something like this is exactly why there was a very prominent fine levied against a major consulting firm in the last two weeks. I think the fine came to close to half a million dollars, because they could not verify facts around court cases in a document that they prepared, and they hallucinated them and they didn't catch them.
Retrieval failed. Retrieval failed. And because the LLM is designed to keep the conversation going, it just inserted something plausible and nobody caught it. You need to be able to verify retrieval against ground truth. Now, if it's a small task, that might be the human at the other end of the chat, right? It just is a step that needs doing. If it's a large agentic system, it is the exact same fractal principle, but you need to do it in an automatic way using an AI agent for evals.

23:56 Principle number eight, memory compounds through structure. So random accumulation actually does not compound. It just creates noise. Just adding stuff doesn't compound. If we just added memories randomly the way we experience them in life, and we had no lossiness, no forgetting ability, we would not be able to function as people. Forgetting is a technology for us. In the same way that forgetting is a technology for us, structured memory is a technology for LLM systems. So evergreen context goes one place, versioned prompts go another place, tagged exemplars go another place. And at a small scale, yes, you can do this. People are doing this with Obsidian, with Notion, with other systems as individuals. And yes, you can scale this as a business. Same principle. You let each interaction build without degradation if you have structured memory. Otherwise, you just have random accumulation. Otherwise, you have the pile of transcripts you never got to, and you're like, well, this is data, we're logging it, it's probably good. It's going to be random accumulation. It creates noise. You're not going to have structured memory.

24:58 These are the principles that work. They work whether you are a power user with ChatGPT or a developer building agentic systems.
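
Editor's sketch: principle seven's two-stage call path (recall candidates, then verify against ground truth) might look roughly like this. It is an illustration only. The `Fact` record, the `recall` and `verify` functions, and the dictionary used as a system of record are assumptions, and word overlap again stands in for embedding-based search.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    key: str    # what the claim is about, e.g. "refund_window_days"
    value: str  # the claimed value
    text: str   # the passage the claim was extracted from


def recall(query: str, corpus: list[Fact], top_k: int = 3) -> list[Fact]:
    """Stage 1: fuzzy recall. Word overlap stands in for vector search."""
    q = set(query.lower().split())

    def overlap(fact: Fact) -> int:
        return len(q & set(fact.text.lower().split()))

    ranked = sorted(corpus, key=overlap, reverse=True)
    return [fact for fact in ranked[:top_k] if overlap(fact) > 0]


def verify(candidates: list[Fact],
           ground_truth: dict[str, str]) -> tuple[list[Fact], list[Fact]]:
    """Stage 2: exact check against a system of record.

    Returns (verified, rejected). A candidate whose key is missing from
    ground_truth is rejected, because it cannot be confirmed.
    """
    verified = [f for f in candidates if ground_truth.get(f.key) == f.value]
    rejected = [f for f in candidates if ground_truth.get(f.key) != f.value]
    return verified, rejected
```

Only the `verified` list should reach the prompt as fact. The `rejected` list is what a human reviewer or an automated eval would inspect, since those are the plausible-looking passages that fuzzy recall surfaced but the record does not support.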
Frankly, they are guideposts for you if you are evaluating vendors in the memory space. These are tool-agnostic principles. They're designed to scale with complexity and they're designed to give you keys that solve the memory problem because they make context persist reliably without the brittleness that we see with current AI systems.

And so my challenge to you as we wrap up this video: we've gone through root causes. We've gone through why memory is a hard problem. We've gone through eight principles for how to solve for this memory issue. Please take memory seriously. The reason it matters now is because if you solve memory now, you have an agentic AI edge. These systems are going to get cheaper and more powerful, but you can't assume they're magically going to solve for memory. As I said at the beginning, there's a chip-level issue here. It is a hard, hard problem. If they don't magically solve for it, if you take responsibility for memory and build it yourself in the way that works for you, you are starting the timer earlier than everybody else around you on getting memory that is functional across a long-term engagement with AI. Because you have to start to think, we're in year two of the AI revolution. Wouldn't it be great to have memory that goes back to year two when you are working with AI systems in 10 years, in 15 years, in 20 years? Everybody else is going to have memory that started much later and they're going to lose that discipline, that acceleration, that ability to manage deep work over time that AI is going to be capable of with proper memory structures. So there is a moment here for you to think about and put in place a memory structure that works. Don't lose the opportunity.
This is a complex one, but it's on you and me and all of us together to build memory systems that handle our own needs, whether that's personal needs or professional needs. I know you can do it. Drop in the comments how you're doing it because I think we should all crowdsource