Six Core Principles for Agentic AI

11m • ai-ml • deep-dive • intermediate

Key Points

State‑preserving (or “stateful”) intelligence is essential for AI agents, because retaining context across interactions enables efficient, coherent behavior and eliminates the need to resend redundant tokens.
Good agentic architecture hinges on robust context engineering; the new OpenAI Responses API exemplifies this by making context preservation a built-in feature.
Since LLMs are fundamentally probabilistic, engineers must impose “bounded uncertainty” by building wrappers that are as deterministic as possible (for example, fixing temperature to zero and rigorously standardizing inputs) so that the same request returns the same output as reliably as possible.
This shift requires a new evaluation mindset: teams need to monitor probabilistic metrics in production and treat deterministic guarantees as engineered layers atop the underlying stochastic core.
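
As a rough illustration of the first key point, the sketch below keeps conversation turns in a per-session store so that context survives a process restart and the caller only ever sends the new message. Everything here is an assumption made for illustration: `ConversationStore`, `handle_turn`, the JSON-file layout, and the `model` callable are hypothetical and are not taken from the video or from any vendor API.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class ConversationStore:
    """Hypothetical store: persists per-session turns to disk so context survives a restart."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id: str) -> Path:
        return self.root / f"{session_id}.json"

    def load(self, session_id: str) -> list[dict]:
        path = self._path(session_id)
        return json.loads(path.read_text()) if path.exists() else []

    def append(self, session_id: str, role: str, content: str) -> list[dict]:
        turns = self.load(session_id)
        turns.append({"role": role, "content": content})
        self._path(session_id).write_text(json.dumps(turns))
        return turns


def handle_turn(
    store: ConversationStore,
    session_id: str,
    user_message: str,
    model: Callable[[list[dict]], str],
) -> str:
    """The caller sends only the new message; earlier turns come from the store."""
    history = store.append(session_id, "user", user_message)
    reply = model(history)  # `model` is a stand-in for whatever LLM call is in use
    store.append(session_id, "assistant", reply)
    return reply


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    echo = lambda history: f"I have seen {len(history)} turn(s) so far"
    print(handle_turn(ConversationStore(root), "demo", "hello", echo))
    # A fresh store object on the same directory simulates a restarted process.
    print(handle_turn(ConversationStore(root), "demo", "still there?", echo))
```

The design choice worth noting is that state lives outside the worker process, so the worker itself can stay disposable while the agent's context does not.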

**Source:** [https://www.youtube.com/watch?v=kWeLc-Dda94](https://www.youtube.com/watch?v=kWeLc-Dda94)

**Duration:** 00:11:38

## Sections

- [00:00:00](https://www.youtube.com/watch?v=kWeLc-Dda94&t=0s) **Stateful Intelligence as Core Principle** - Emphasizes that AI agents must retain context across interactions, making stateful design essential for scalable, effective agentic systems.
- [00:07:31](https://www.youtube.com/watch?v=kWeLc-Dda94&t=451s) **Beyond Binary System Health** - The speaker explains that in multi-agent systems health isn’t simply “up” or “down” but spans many shades of gray, requiring nuanced measurement of agent handshakes, reasoning quality, and output degradation.
- [00:10:45](https://www.youtube.com/watch?v=kWeLc-Dda94&t=645s) **Six Principles for Scalable AI** - The speaker recaps six design principles for building reliable agentic AI systems that scale: stateful intelligence, bounded uncertainty, intelligent failure detection, capability-based routing, decision-quality health monitoring with detailed audit tracking, and continuous input validation.

## Full Transcript

0:00 We are going to talk about the six principles for AI systems that I wish everyone knew. I've been building agentic systems. I've worked with teams building agentic systems. What I see people miss is what we're covering here. And these are principles you can scale regardless of whether you're building a system for 100 agents or just one agent.

0:20 Principle number one: you need stateful intelligence. In other words, context preservation is a core principle of good AI architectures. And that is so different from what we learned when we were learning how to build systems, when we were taught that stateless services matter, which is a fancy way of saying you need to have a clean start with anything because it enables easy scaling. Now AI systems require context and learned behaviors, and those disappear on a restart. That is why the new OpenAI Responses API is stateful. It preserves context, and they deliberately added that in because it is so important to have context preservation as an architectural component of agentic workflows.

1:04 And I wish that people understood that, because everything else gets easier if you can retain context in a way that's intelligent. So much of good agentic architecture is just good context engineering and good context preservation. It's not resending the same tokens. You don't have to send them again. That's wasteful. So the first principle is stateful intelligence. You want to make sure that your intelligent systems recognize and preserve state in ways that are meaningful to the agent.

1:34 The second principle is bounded uncertainty. So we're used to deterministic systems. Traditional engineering has the same input producing the same output and very predictable testing, which is why most QA is before launch. In the new model, you have to bound uncertainty.
1:55 And so you have to essentially put wrappers that are as deterministic as possible on top of probabilistic cores. You have to do things like mess with the API and turn the temperature of the LLM down to zero and define your inputs extremely precisely, in the same sequence every single time, so that what you get back for order 1 2 3 when you make a query is always the same. That is a whole other level of engineering that we're not used to, right? Traditionally you didn't have to think about whether the program would respond to the same input with the same output every time. It just always did. You could worry about other things. We don't live in a deterministic world anymore. We have to engineer deterministic bridges on top of probabilistic cores. Our world is running on probabilistic cores now. And not enough people have fully realized that we need to bound uncertainty and that it's part of our fundamental role.

2:52 This changes how engineers evaluate. Engineers need to spend a lot more time, from a data science perspective, understanding what probabilistic metrics look like in production, not just deterministic metrics. We also need much more investment in post-production QA, because QA needs to be able to measure events occurring in production pipelines that may not be in line with expectations, that may be edge cases, that may break our expectations of what works and what doesn't. We need to move from the assumption that we are just building deterministic blocks to the assumption that we are working with probabilistic systems that need continued, sustained operation after we launch. And so we will have to go back in and we will have to measure. We will have to pay attention.
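
A minimal sketch of the wrapper idea described above, under stated assumptions: inputs are normalized into one canonical form (fixed field order, collapsed whitespace), the temperature is pinned to zero, and answers are cached by a hash of the canonical prompt. `ModelFn`, `canonical_prompt`, and `DeterministicWrapper` are hypothetical names, and the cache is an addition for illustration rather than something the talk prescribes. It is there because a temperature of zero reduces variation but may not remove it entirely on every hosted model.

```python
import hashlib
import json
from typing import Callable

# Hypothetical model interface: (prompt, temperature) -> generated text.
ModelFn = Callable[[str, float], str]


def canonical_prompt(template: str, fields: dict[str, str]) -> str:
    """Render fields in a fixed (sorted) order with collapsed whitespace."""
    normalized = {key: " ".join(str(value).split()) for key, value in fields.items()}
    body = json.dumps(normalized, sort_keys=True, ensure_ascii=False)
    return f"{template}\n{body}"


class DeterministicWrapper:
    """Pins temperature to zero and replays cached answers for identical canonical inputs."""

    def __init__(self, model: ModelFn) -> None:
        self._model = model
        self._cache: dict[str, str] = {}

    def ask(self, template: str, fields: dict[str, str]) -> str:
        prompt = canonical_prompt(template, fields)
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Only the first request for a given canonical prompt reaches the model.
            self._cache[key] = self._model(prompt, 0.0)
        return self._cache[key]


if __name__ == "__main__":
    wrapper = DeterministicWrapper(lambda prompt, temperature: f"{len(prompt)} chars at T={temperature}")
    first = wrapper.ask("Summarize the order", {"id": "123", "note": "rush  delivery"})
    second = wrapper.ask("Summarize the order", {"note": " rush delivery ", "id": "123"})
    print(first == second)
```

Two requests that differ only in field order or stray whitespace map to the same canonical prompt, which is the "same sequence every single time" discipline expressed in code.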
3:39 We will have to maintain, not just because there are deterministic bugs, but because our job is to continue to bound uncertainty as models drift over time, as perhaps we get different inputs over time, as we have different models being swapped out, as context emerges and changes in our context structures. All of those things shift how our production systems behave in ways that weren't true when we were in a deterministic output world. That's number two.

4:08 Number three: fail-fast design. We assumed in the past that if something crashed on error and had a clear failure mode, then we were doing our jobs, because we could immediately kill that microservice and restart it and we didn't have it slowly dragging down the rest of the system. Now we actually have failures that are harder to detect. AI can fail by hallucinating. AI can fail by drifting. It can still be functional but be completely wrong. This is not a failure mode we're used to. We need intelligent failure detection. We need the ability to monitor reasoning quality, not just system health. And how you measure that is going to depend on the kind of inference you want to build into your agentic system. But you've got to measure it. You've got to be able to detect failures that are not just catastrophic, program-didn't-run failures.

5:04 And you have to build your system not from the perspective of what would happen if the whole thing went down, but from the perspective of how the system can work well if it's difficult to detect a degradation in reasoning quality. You need to assume that you are moving from a fail-fast world to a subtle-failure world where the failure is going to be hard to detect. And so you need to think a lot about how you monitor quality in that world.
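
One hedged way to picture monitoring reasoning quality rather than process health: run content checks on every answer and watch the pass rate over a rolling window, so a system that is "up" but drifting still raises a flag. The specific check here (every number in the answer must appear in the source text) is only a stand-in, since the right measurement depends on the system. `QualityMonitor`, `numbers_are_grounded`, the window size, and the threshold are all hypothetical.

```python
import re
from collections import deque
from typing import Callable

# Hypothetical check signature: (source_text, answer) -> True if the answer passes.
Check = Callable[[str, str], bool]

_NUMBER = r"\d+(?:\.\d+)?"


def numbers_are_grounded(source: str, answer: str) -> bool:
    """Stand-in quality check: every number in the answer must also occur in the source."""
    return set(re.findall(_NUMBER, answer)) <= set(re.findall(_NUMBER, source))


class QualityMonitor:
    """Tracks the pass rate of content checks over the most recent responses."""

    def __init__(self, checks: list[Check], window: int = 50, min_pass_rate: float = 0.9) -> None:
        self.checks = checks
        self.results: deque[bool] = deque(maxlen=window)
        self.min_pass_rate = min_pass_rate

    def record(self, source: str, answer: str) -> bool:
        passed = all(check(source, answer) for check in self.checks)
        self.results.append(passed)
        return passed

    def pass_rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 1.0

    def degraded(self) -> bool:
        """True when the service still responds but too many answers fail their checks."""
        return self.pass_rate() < self.min_pass_rate


if __name__ == "__main__":
    monitor = QualityMonitor([numbers_are_grounded], window=4, min_pass_rate=0.75)
    source = "Invoice total is 120 for 3 items"
    monitor.record(source, "The total is 120.")
    monitor.record(source, "The total is 150.")  # fluent, well-formed, and wrong
    print(monitor.pass_rate(), monitor.degraded())
```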
5:34 And that's another huge shift for engineers, because they're used to very simple failure modes and to making sure that they have a really clean break, that there are no massive dependencies if something goes down, and that they can bring it back up. Not anymore. You can have things running in production that look successful by most deterministic metrics and that still don't work.

5:52 Next principle, this is number four. Uniform load distribution is the old way of doing things. We had identical nodes. They handle identical requests. We've built out systems like this, and different requests all get fed the same way because basically no matter what device you're on, you're getting the same experience. That is not true anymore. Different requests in an agentic system can mean dramatically different compute, hundreds of multiples of different compute. A high-inference compute request can be thousands and thousands and thousands of tokens, and you can serve a low-inference compute request in 1/100th of the space, because it's a very token-efficient request.

6:31 So you need to think not about identical nodes and how you uniformly distribute a load that you presume is consistent. Instead, think about capability-based routing. Think about how you route based on task complexity and the confidence that AI has in a particular problem space. Is AI going to have to burn a lot of tokens understanding the request? Is it complex? Then we need to route it differently than if AI understands what's going on and it's a very low-inference task. So again, we're changing the way we think. It's not about distributing massive, consistent load evenly across a bunch of identical nodes.
7:09 It's about understanding the differential capabilities of your nodes and making really, really smart choices about where you put the capabilities you have. Where do you burn the tokens with a smarter model? It's a new thing, right? We have to think about routing differently. We have to think about routing intelligently. There are people who are building agentic systems with reasoners just to solve this problem.

7:29 Principle number five is about binary health state. We're used to system up, system down. Again, I want to call out for you that it is not that simple in an agentic system. You can have system up, system down, and you can have a lot of complex states in between when you have a multi-agent system. I talked earlier about the idea that you can have a fail-fast design that gets to intelligent failure detection, and how you have to understand reasoning quality. This is sort of the next level of that. In this case, it's a multi-agent system, and you have to understand that the system can be up and partially functioning. The system can be up and not functioning. The system can be up and some of the handshakes between the agents aren't working, but you can still get most of the functionality there, though maybe there's degraded intelligence.

8:15 What I'm saying is that you've moved from a black-and-white world to a world where there are lots and lots of shades of gray, maybe 50 shades of gray, and you have to figure out what to do with measurement, with quality, with system health when it's that complex. And the more agents you add to the system, the more complex it is to measure system health.
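
The "shades of gray" idea can be sketched as a graded health classification instead of an up/down flag. The states below mirror the ones named in the talk (up and healthy, up but degraded, up but not functioning, down). The signal names, the thresholds, and the rule that the system is rated by its least healthy agent are assumptions chosen to keep the example small, not a recommended policy.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    DOWN = "down"
    UP_NOT_FUNCTIONING = "up_not_functioning"
    UP_DEGRADED = "up_degraded"
    HEALTHY = "healthy"


# Ordered from worst to best so states can be compared.
_SEVERITY = [Health.DOWN, Health.UP_NOT_FUNCTIONING, Health.UP_DEGRADED, Health.HEALTHY]


@dataclass(frozen=True)
class AgentSignals:
    """Hypothetical per-agent measurements, each rate expressed in the range 0..1."""

    process_up: bool
    handshake_success_rate: float
    output_quality: float


def classify(signals: AgentSignals, ok: float = 0.95, floor: float = 0.5) -> Health:
    """Map raw signals to a graded state; the thresholds are illustrative assumptions."""
    if not signals.process_up:
        return Health.DOWN
    worst = min(signals.handshake_success_rate, signals.output_quality)
    if worst < floor:
        return Health.UP_NOT_FUNCTIONING
    if worst < ok:
        return Health.UP_DEGRADED
    return Health.HEALTHY


def system_health(agents: dict[str, AgentSignals]) -> Health:
    """Simplifying assumption: rate the whole system by its least healthy agent."""
    return min((classify(s) for s in agents.values()), key=_SEVERITY.index)


if __name__ == "__main__":
    agents = {
        "planner": AgentSignals(True, 0.99, 0.97),
        "retriever": AgentSignals(True, 0.80, 0.96),  # some handshakes are failing
    }
    for name, signals in agents.items():
        print(name, classify(signals).value)
    print("system", system_health(agents).value)
```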
8:33 And so you are the one who has to track the outputs coming through and the quality of those outputs, and understand enough of the audit trace to see where the agents are breaking handshakes, where the agent reasoning traces aren't working well, where there's degradation in outputs, or maybe where there's context drift somewhere, to pin down what's going on. It is a much higher bar for auditability than it used to be.

8:56 The last principle I want to call out is input validation. Before, you validated once, right? If you were doing inputs, it was: we need to validate at the gateway, it goes into the microservice, and we're done. So far so good. These days it's not the same. These days you have to validate throughout the conversation state. You have continuous validation. You have to understand that AI behavior depends on accumulated context. And so you need to validate as you go, or else you're not going to know where you are going off the tracks. You need to think of it almost as a continuous edge. You need to think of each turn in the conversation as potentially a step that requires some validation of conversation state, so you know there was a checkpoint there and it worked. Otherwise it's very, very difficult to debug these systems.

9:44 If all of this sounds difficult, it should. It is much, much harder to design healthy agentic AI systems than it was to design traditional software. And we have had very, very few conversations about how traditional engineering principles do not work in the age of AI. I hope that this breakdown has helped you to see how some of the traditional principles we have are not entirely dead. If you're still building traditional software, these are still good principles, and many of our systems are hybrid systems.
10:14 And the post that I'm writing on this goes into this in much more depth. But you sometimes have to build systems that are both traditional deterministic software systems and AI systems. In fact, most of the ones that I build end up being hybrids. And so you have to be smart enough to take traditional principles like stateless services where they matter for deterministic software, and also take stateful intelligence, which is the AI principle, and apply that where it matters for designing agentic systems. We need new schools. We need new principles. We need new understanding of how AI systems truly scale. And I hope that this breakdown has helped you to get a sense of what some of those new principles are, of how to actually construct systems based on principles that scale, not just based on tactical tips.

11:00 So there you go. Those are my six principles for AI. Stateful intelligence. Bound that uncertainty. Make sure you have really intelligent failure detection. Assume you're going to have to route based on capability, not just uniform load distribution. Make sure that you have decision-quality and reasoning-pattern health states across multi-agent systems; you're going to have to have very detailed audit tracking. And then assume you have to validate your inputs throughout the conversation and not just as an input. If you put all of those together, you are going to be much more likely to get an agentic system that actually works. Best of luck.
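
To make the last principle in the recap concrete, here is a small sketch of validating conversation state at every turn rather than once at the gateway, and of remembering the last checkpoint that passed so a later failure can be localized. The validators (`no_empty_turns`, `max_turns`) and the `ValidatedConversation` class are hypothetical examples, and real systems would check whatever matters for their own accumulated context.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical validator signature: takes all turns so far, returns an error message or None.
Validator = Callable[[list[str]], Optional[str]]


def no_empty_turns(turns: list[str]) -> Optional[str]:
    return "empty turn" if any(not turn.strip() for turn in turns) else None


def max_turns(limit: int) -> Validator:
    def check(turns: list[str]) -> Optional[str]:
        return None if len(turns) <= limit else f"more than {limit} turns"

    return check


@dataclass
class ValidatedConversation:
    """Runs every validator after each turn and records the last checkpoint that passed."""

    validators: list[Validator]
    turns: list[str] = field(default_factory=list)
    last_good_turn: int = 0  # number of turns present at the last passing checkpoint
    errors: list[tuple[int, str]] = field(default_factory=list)

    def add_turn(self, text: str) -> bool:
        self.turns.append(text)
        problems = [msg for v in self.validators if (msg := v(self.turns)) is not None]
        if problems:
            # Keep the turn number with each error so debugging can start at the right place.
            self.errors.extend((len(self.turns), msg) for msg in problems)
            return False
        self.last_good_turn = len(self.turns)
        return True


if __name__ == "__main__":
    convo = ValidatedConversation([no_empty_turns, max_turns(3)])
    for text in ["book a flight", "to Lisbon", "   "]:
        print(repr(text), convo.add_turn(text))
    print("last good checkpoint:", convo.last_good_turn, "errors:", convo.errors)
```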