
Decision Agents Require Non-LLM Solutions

Key Points

  • Decision agents are crucial for autonomous, complex problem‑solving in agentic AI, but they must be built with technologies other than large language models (LLMs).
  • LLMs are unsuitable for decision agents because they are inconsistent, opaque, prone to fabricating explanations, and struggle to incorporate structured historical data.
  • Instead, established decision platforms or business rules management systems should be used, providing reliable automation that has been proven in industry for years.
  • These rule‑based systems deliver ruthless consistency, full transparency, and precise control over how each decision is made, ensuring every customer receives the same treatment every time.

Full Transcript

# Decision Agents Require Non-LLM Solutions

**Source:** [https://www.youtube.com/watch?v=mRkJTXDromw](https://www.youtube.com/watch?v=mRkJTXDromw)
**Duration:** 00:24:30

## Sections

- [00:00:00](https://www.youtube.com/watch?v=mRkJTXDromw&t=0s) **Limitations of LLMs for Decision Agents** - Large language models are unsuitable as autonomous decision agents because of their inconsistency, lack of transparency, and inability to reliably explain their choices.
- [00:03:27](https://www.youtube.com/watch?v=mRkJTXDromw&t=207s) **Challenges of LLMs in Decision Platforms** - Effective decision agents require low-code tools, expert-in-the-loop management, and embedded analytics; large language models hinder this by being inconsistent, opaque, and hard to justify to customers or regulators.
- [00:08:08](https://www.youtube.com/watch?v=mRkJTXDromw&t=488s) **Decision Platforms for Stateless Agents** - A business rules management system, featuring IDE and low-code editors tied to a version-controlled repository, enables the collaborative creation of stateless, side-effect-free decision agents.
- [00:12:26](https://www.youtube.com/watch?v=mRkJTXDromw&t=746s) **Packaging Decision Rules as Agents** - How to expose rule-based decision services via an MCP server, wrap them as agents, and manage dynamic updates without disrupting ongoing transactions.
- [00:22:12](https://www.youtube.com/watch?v=mRkJTXDromw&t=1332s) **A/B Testing Decision Rule Versions** - Storing multiple rule versions in a repository and using a decision agent to run A/B (champion/challenger) tests across user groups, analyzing logs to select the better set; real-time feedback loops are impractical for loan origination because repayment performance can only be assessed over an extended period.

## Full Transcript
So decision agents are an essential component in agentic AI if you're going to solve large, complex problems. The challenge is that if you're building agentic AI, you're going to have complex decisions that need to be made autonomously, and these decisions are not a great fit for large language models. Large language models are, of course, the key technology in agentic AI, but they're not a good fit for decision agents. So you need to build decision agents in your agentic framework, but you need to use a technology other than large language models. So why aren't large language models a good fit? Well, let's think about some of the things they are famous for. They're famous for inconsistency. They might do the same thing every day and then suddenly one day do something different. That's not great: if you're trying to make a decision, you really need people to be treated the same, not to vary day to day, minute by minute, just because the LLM feels like doing something different. They are notoriously black box. They are very bad at describing why they did something, and it turns out you often need to explain to people why you made a certain decision, why they didn't get the job, why they didn't get the loan. So you need some transparency in all of this, and large language models are not good at that, even when you ask them to explain themselves. They have a bit of a reputation for lying about how they decided what they decided. And the final one is that many business decisions have history behind them. You have a database of information that tells you what you should do, what might be fraudulent, what might be problematic. You need to be able to process that data and turn it into analytic insight, and large language models are not very good at that either.
So for all of these reasons, you're just not going to use a large language model to build a decision agent. Instead, you're going to use a decision platform or a business rules management system. Let's reiterate some of the key value propositions for this kind of technology. This is well-established automation technology that's been in use for a long time, so we know what the benefits are. So what are our requirements for a decision that we're going to get out of using one of these platforms? First and foremost, we're going to get consistency. If I use one of these platforms, it's going to make the same decision the same way every time. It gives me complete control over exactly how the decision is made, and once I've defined it, every customer who gets that decision made about them is going to get it made the same way. So I get ruthless consistency. The second thing they're really good at is transparency. Not only do I have a formal definition of how this works and what the steps and rules are that I'm following, I can explain that to someone, I can show it to someone, and I can log it. So I can have a completely transparent log of exactly how this decision was made; I have the transparency that I need for a decision. The third thing they give me is agility. Agility is important because the way I make a decision is subject to change without notice. Competitors change their behavior. The market changes. The regulations change. There's a court case. There are all sorts of drivers changing the way you make a decision. And if you can't change quickly, if you have to wait for new data or new documents, or you have to retrain something, that's going to take too long.
So you have to be able to respond more quickly, more actively. The other thing about decisions is that there's a tremendous amount of domain knowledge in them. Programmers often find it really hard to correctly build decision agents, so you really want to be able to engage the people who have the domain knowledge. That means you're going to need some kind of low-code environment: some way to engage experts in managing the behavior of a decision agent while still being able to manage it as a programmatic component in your agentic AI framework. And then lastly, you need a way of embedding the analytics we were talking about, where I can turn historical data into analytic insight and embed that insight in my decision, so that the decision is more precise, more accurate, more analytically driven. Now, all of these are classic benefits of using a decision platform. But let's reiterate why an LLM is a tough call on each of these. If I have an LLM, well, it's not really consistent. It's hard to make an LLM do the same thing every time. This is a feature, not a bug: that variation, that randomness, is part of what makes them so powerful, but it makes it very hard for them to be consistent. They're definitely not transparent; they're very opaque about how they did things. Even attempts to get them to explain themselves are problematic. And if I go to a customer or a regulator and say, hey, I have this black box that's been explained by this other black box, that doesn't really inspire confidence. They can also be hard to change. Their behavior is easy to get set up: you don't have to code it, you just provide information to it.
But it's then hard to change without presenting new data to it and retraining it. You can't just tell it to stop doing X or stop doing Y. If you've watched some of the news around attempts to block particular agents or make agents behave in a certain way: if you try to just code something in quickly, you get very, very strange behavior. They're also quite complex; they require quite high-level AI skills to build and manage. And as I said, they're just no good at structured data. They're not good at building predictive models out of historical data and using that data to improve the precision of your decision-making. They're good at reading documents and text; they're not good at structured data. So we're not going to use a large language model. We're going to have to use something else. So what are we going to use instead? What technology can we use? Let's go back to the scenario I talked about in the previous video, where a bank needed to lend money to a person. So I want to lend you money, and to do that, I have an agentic AI framework that manages that whole complexity. As part of that, I have two decision agents. One was an eligibility agent, to say: are you eligible for a loan? And the other was to say: can I actually lend you the money? Which is what banks call origination. You want to borrow this amount of money for this particular thing: can I lend it to you? If so, what's the rate? What's the price of this? So I have these two decision agents. Now, we've been building these kinds of autonomous agents using decision technologies for a really long time. There are a couple of things that need to be true of a decision agent. First of all, they need to be stateless and side-effect-free.
So what does stateless mean? It means that you want them just to respond to whatever data they're given at the moment they're given it. Here's the data, here's the decision. Here's the data, here's the decision. They don't remember state. That's why, if you remember, we had a workflow agent whose job it was to remember the state. The workflow tracks the state, gathers the data we need, and passes it back and forth to these agents. It says: okay, at this point in the process, I've got this set of data about this person, about this application, about this loan. Are they eligible, yes or no? And you get an answer back. And similarly with the origination decision. So the workflow is managing the state, managing all of that. That scales better, it keeps the decision agents simpler, and it makes it much easier to check that you're not using things like personal information or health information inside the decision when you don't need to. It's just a much cleaner interface. But why side-effect-free? Why is it important that your decision agents don't do anything, that they just make decisions? Well, you want to be able to reuse them. Think about eligibility. I might be using it in the context of a workflow for originating a loan; that's one use case for it. But I might have other processes, other workflows, that do other things: that send you letters, or tell a call center rep, or put you into a marketing campaign, and so on. So I still need to know if you're eligible, but I'm going to do something completely different if you are. By separating that, by not having the action be part of the decision agent, I get to reuse it in lots of different circumstances. So I have these stateless, side-effect-free agents. Okay, so how do I build one of these? What does that look like?
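The stateless, side-effect-free property described above can be sketched as a pure function: same input, same output, nothing read or written outside the call. This is a minimal Python sketch; the field names and thresholds are invented for illustration and are not from the video.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoanApplication:
    age: int
    annual_income: float
    requested_amount: float

def eligibility_decision(app: LoanApplication) -> dict:
    """Stateless and side-effect-free: the decision depends only on the
    data passed in, and nothing outside the function is touched.
    Thresholds here are illustrative, not real lending policy."""
    reasons = []
    if app.age < 18:
        reasons.append("applicant under 18")
    if app.annual_income < 20_000:
        reasons.append("income below minimum")
    if app.requested_amount > app.annual_income * 5:
        reasons.append("requested amount exceeds 5x income")
    # Returns a decision plus the reasons behind it; any action
    # (letters, campaigns) is taken elsewhere, by the workflow.
    return {"eligible": not reasons, "reasons": reasons}
```

Because the function takes action on nothing, the same decision can be reused by the loan-origination workflow, a marketing workflow, or a call-center workflow, each doing something different with the answer.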
What technology do I need to build a stateless, side-effect-free decision agent with these characteristics? Well, we use what's called a business rules management system or decision platform. Decision platforms are software stacks designed to build, historically speaking, decision services that can then be wrapped into decision agents. So what does a decision platform have? It has a number of software components. First and foremost, it's got a couple of editors: typically an IDE or technical editor, and a low-code editor, in which you can write business rules and decision logic. So you can lay out the actual rules, the logic that has to be followed to make a particular decision. Those two editors are generally linked to a single repository. This might be something like Git, but it might also be a more managed repository specialized for business rules and decision technology, so that you have version control and branching and all those things available to your low-code editor. This varies by platform, but they all have the concept of a repository in which you can do branching, versioning, and all the repository things you need to do to make sure you have a current version of the rules, do development work, have multiple people working, and all that good stuff. Now, once the logic is in this repository, and because the decision platform is focused only on decision-making logic that is stateless and side-effect-free, you can do a lot more testing and validation of that logic: you can validate that the logic is correct. So you often have a set of tools that look at the rules in the repository and validate them. Is the logic complete? Are you missing a criterion that you're not checking? Do you have overlapping ranges? All that kind of stuff.
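The kind of completeness check described above, finding gaps and overlapping ranges in a rule table, is possible precisely because the logic is declarative data rather than code. A rough sketch, assuming rules expressed as simple numeric ranges (an invented representation, not any particular platform's format):

```python
def validate_ranges(ranges):
    """Check a rule table's numeric ranges for gaps and overlaps,
    the kind of static validation a decision platform automates.
    Each range is a (low, high) pair, high-exclusive."""
    issues = []
    ordered = sorted(ranges)
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if hi1 > lo2:
            issues.append(f"overlap between {(lo1, hi1)} and {(lo2, hi2)}")
        elif hi1 < lo2:
            issues.append(f"gap between {hi1} and {lo2}")
    return issues
```

Real platforms check much more (unreachable rules, conflicting actions, missing criteria), but the principle is the same: rules managed as assets can be analyzed, where arbitrary code cannot.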
And it's much easier to check that in the context of a decision platform, because the logic is written in a more declarative, less programmatic way, and it's managed as a set of assets that can be checked. So you typically have a set of validation tools so that the logic you write is more robust. And then, obviously, you're going to need to test it. Testing tools can be as simple as a UI where you pass in a JSON object and see if you get the result you're expecting, like with Swagger or something like that. But they can also be a lot more sophisticated. Some of the decision platforms have very robust test suites, where you can load up very large numbers of test transactions, run them through, check the results against expected results, confirm you've passed all the tests, and so on, and do all of this in a low-code way, so that the non-programmers who are providing their domain expertise can also test it to make sure they haven't broken anything. Now, when it comes to decisions, testing is necessary but not sufficient. Because within those decisions, within those business rules, there are going to be thresholds: places where you make choices as a business or an organization as to what a threshold should be. There's no hard rule that this is a good threshold and that's a bad threshold, but the choice is going to make a difference. Take loans: how much am I willing to lend you to buy a boat? There isn't a right answer in the sense that I could write a test case for it, but the business could change that threshold, and it has an impact. I need to be able to track what that impact is. So generally we have some kind of impact tool that loads in a bunch of historical data and then runs a set of simulations. It's very similar to a test engine, but with a different perspective.
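The pass-in-JSON, check-expected-result testing described above can be sketched in a few lines. The test-case format and the stand-in `decide` function are invented for illustration; a real platform would run these against the deployed or in-repository rules.

```python
import json

# Hypothetical test cases, as the kind of JSON objects the transcript
# describes: pass the data in, check the expected result comes back.
TEST_CASES = json.loads("""
[
  {"input": {"age": 30, "income": 60000}, "expect": {"eligible": true}},
  {"input": {"age": 16, "income": 60000}, "expect": {"eligible": false}}
]
""")

def decide(data: dict) -> dict:
    # Stand-in for the deployed decision service.
    return {"eligible": data["age"] >= 18}

def run_suite(cases) -> list:
    """Run every case through the decision and collect mismatches."""
    failures = []
    for i, case in enumerate(cases):
        got = decide(case["input"])
        if got != case["expect"]:
            failures.append((i, case["expect"], got))
    return failures
```

Because the cases are plain data, a low-code UI can let domain experts add and rerun them without touching code, which is the point the transcript makes about non-programmers testing their own changes.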
Instead of saying this broke, this didn't break, it says: here's the difference. If you make that change to that rule, the results look like this; if you make this change to this rule, the results look like that. So you can see what the impact of a change is going to be before you make it. With a lot of these tools you have to deploy a test version before you can do this, but several actually allow you to do testing and simulation on rules you haven't deployed yet, that are just in your repository, and manage all of that essentially under the covers, so that you can do it inside your development environment. So they provide a lot of tools to make sure you have the rules correct before you deploy them. Now, once you have them correct, you do, in fact, need to deploy them. So you've got a deployment engine that deploys them as a service. Now I've got my decision service deployed, and it's going to execute those rules: it's got the code and the engine it needs, so when you pass in data, it's going to give you an answer. In this case, obviously, I'm going to expose it as an agent. So I've probably got some kind of MCP (Model Context Protocol) server that exposes these decision services as tools, and those can then be wrapped into an agent and exposed in my agent framework. So what are these agents going to do? The origination agent is going to say: here's my data packet. It comes in to my decision service, and I get a response back, which then goes back to my agent. So I can quickly package up my rules as decision services, reuse rules and logic across multiple services, deploy those services, and wrap them as agents using MCP.
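The impact tool described above, which replays historical data through two rule versions and reports the difference rather than a pass/fail, can be sketched like this. The rule versions and the income threshold are invented for illustration.

```python
def simulate_impact(historical_apps, rules_a, rules_b):
    """Run the same historical applications through two rule versions
    and summarize how the outcomes differ, before either is deployed."""
    approved_a = approved_b = changed = 0
    for app in historical_apps:
        a, b = rules_a(app), rules_b(app)
        approved_a += a          # bools count as 0/1
        approved_b += b
        changed += (a != b)      # decisions that flip under the change
    return {"approved_a": approved_a, "approved_b": approved_b,
            "decisions_changed": changed}

# Illustrative rule versions: the candidate raises the income threshold.
def current(app):
    return app["income"] >= 20_000

def candidate(app):
    return app["income"] >= 25_000
```

The output is not "this broke / this didn't break" but "this many decisions change if you move the threshold", which is what lets the business weigh a threshold change before making it.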
And now I've got a whole series of decision agents that I'm managing from this repository. The technology is really good at doing things like handling a rule change, updating the engine, and handling in-flight transactions so that an in-flight transaction doesn't get broken if you change the rules. All of that constant update is handled very effectively. So what this lets me do is build these rules, build these decision agents, in a very robust way, and then deploy them as services that support my agentic framework by being exposed as agents. This handles, if you like, most of what's going on in an agent. But if you think about these agents, this is all very prescriptive. This is really describing how I write rules: how do I describe the rules, the logic, that prescribes how this decision is made? But many decisions have a probabilistic component too. If it's likely that this is James, we'll do one thing; if it's not likely that it's James, if someone is impersonating James, we'll do something different. If it's likely that this is a legitimate transaction, we'll do certain things. These are probabilistic elements that are typically built using predictive analytics, using machine learning on my historical data. So in an agentic framework, what does that look like? Typically I'm also going to deploy these machine learning components. I might have a prediction, for instance, of fraud: how likely is it that this person is the person who's applying for this loan? I might have another one around credit risk: how likely are they to pay us back? And I might have a third one, which is payoff risk: how likely are they to pay us off early?
And all of these agents are used by my origination agent as part of the origination decision, so I need to be able to consume them. So how do I build those agents? Well, I'm going to use a machine learning platform, machine learning technology, to do that. Generally, with machine learning, you're going to do some kind of analysis. This might be supervised, in the sense that there is a human user who is directing it, or it might be unsupervised, where you're really just using the algorithm and letting it see what it finds out about your data. Which, of course, means you've got to have data. Generally for machine learning you have a lot of data, so you have multiple databases that have to be combined, merged, and managed. And you're going to do something called feature engineering: you're going to engineer a set of features. Features are predictive characteristics of one kind or another, things that seem interesting. They can be very simple: if you have a date of birth, you can come up with an age. They might classify something: which age range are you in, less than 20, 20 to 30, 30 to 40, and so on, because the range seems important. But they can get quite sophisticated. They can say things like: how often have you been more than 30 days late in the last 180 days on a payment for a bill? Well, that has to be calculated from all this data. So there's a lot of work not just to merge this data, but to calculate these features from it.
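The three example features above (age from date of birth, an age band, and late payments in a window) can be sketched directly. The record layout is invented for illustration; real feature engineering runs over merged databases, not a single dict.

```python
from datetime import date

def engineer_features(customer: dict, today: date) -> dict:
    """Compute the example features from the transcript out of raw data.
    `customer` is a hypothetical merged record with a date_of_birth and
    a list of payments, each with `due` and `paid` dates."""
    dob = customer["date_of_birth"]
    # Simple feature: age derived from date of birth.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    # Classification feature: which age band the customer falls into.
    if age < 20:
        band = "<20"
    elif age < 30:
        band = "20-30"
    elif age < 40:
        band = "30-40"
    else:
        band = "40+"
    # Sophisticated feature: payments more than 30 days late
    # among bills due in the last 180 days.
    recent = [p for p in customer["payments"] if (today - p["due"]).days <= 180]
    late = sum(1 for p in recent if (p["paid"] - p["due"]).days > 30)
    return {"age": age, "age_band": band, "late_over_30d_last_180d": late}
```

Each feature is a calculation over raw data, which is the work the transcript says has to happen before any algorithm runs.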
And then I'm going to feed that data and the features I've created into my analysis and run machine learning algorithms: neural networks, regression models, decision tree analytics, all sorts of different analytic techniques, to see if I can find patterns or classifications or make predictions based on the historical data I've got. If it's supervised, I'm telling it what I'm looking for: can you tell me which features will predict that this person will pay off the loan early? If it's unsupervised, I'm looking more for things like: is there anything unusual in here? What counts as an unusual pattern of data? Because that might be indicative of a new kind of fraud, for instance. The supervised analyses are generally driven by a data scientist or a machine learning engineer; the unsupervised ones are generally kicked off and allowed to do their own thing. Then I'm going to go ahead and deploy these as endpoints that can be consumed by these agents. Now, we used to do a lot of analytics in batch: we would run these kinds of analyses and then update the database with a bunch of scores. Today you're much more likely to deploy them as individual REST endpoints that I can pass a JSON object to, to score, and get a result back. And once I have an endpoint, I can use MCP again and deploy those as tools that I can make available to my analytic agent. I now have analytic agents talking to deployed endpoints, and those endpoints run, essentially, an algorithm that's been built from my historical data. They're not analyzing the historical data at runtime. What they're doing is using the results of that analysis to say: okay, here's a formula that takes this data and calculates a payoff risk for this customer.
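The point that the endpoint runs a formula learned offline, rather than analyzing historical data at runtime, can be made concrete with a tiny scoring function. The coefficients below are invented for illustration; in practice they would come from a model trained on historical data.

```python
import math

# Coefficients produced by an offline analysis (invented values).
# The endpoint only applies them; it never touches the training data.
COEFFS = {"intercept": -2.0, "age": 0.01, "late_payments": 0.6}

def payoff_risk_score(features: dict) -> float:
    """Apply a pre-learned logistic formula to one JSON-like feature
    record and return a score in (0, 1), as a scoring endpoint would."""
    z = (COEFFS["intercept"]
         + COEFFS["age"] * features["age"]
         + COEFFS["late_payments"] * features["late_payments"])
    return 1 / (1 + math.exp(-z))
```

This is the shape of the runtime call: pass a JSON object of features, get a probability back, with all the heavy analysis already baked into the coefficients.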
So I can see how likely it is this customer is going to pay off early and use that as part of my pricing. So I have all these analytic agents deployed into my agentic AI framework, and my decision service is going to consume the results of those predictions, those probabilistic models, as part of how it makes the business decision to originate you or not. Now, these two types of technologies, decision platforms and machine learning platforms, are quite separate from large language models. But that doesn't mean they can't be enhanced with large language models, and there are two areas in particular where we see a lot of work. One of them is the idea of using a large language model for ingestion. If I've got documents, if I've got brochures, if I've got a recorded conversation, it doesn't matter how I've recorded a bunch of data: large language models are really good at extracting the data I need from it. So if I've got an origination decision and it needs to know, for instance, details of the boat you want to borrow money for, and I've got a brochure about that boat, then I can ingest that using a large language model and feed it directly into my origination agent as input data. This gives me a tremendous opportunity to make it much, much easier to supply the data I need. These decision agents often need a lot of data, so being able to consume documents and turn them into data is very effective. The other place we've seen really good uses is in explaining results. If you think about it, when I invoke this decision agent, one of the things it's going to do is log how it made the decision.
It's going to create essentially a detailed log of how it made the decision. How much detail goes in there is up to you, but you can look quite precisely at how the decision was made. Which rules fired? How was the decision made? Now, that's great for you. It's great for long-term improvement, great for understanding how your engine worked, how your decision agent behaved. It's not necessarily great for explaining the decision to a human being, a call center rep or a customer. So one of the other use cases for LLMs is to take this log data and turn it into an explanation. So now I can explain how that decision was made, and I can ingest textual data that you give me. I can use these LLMs to make it easier to interact with my decision agent, both in and out. Now, there's one last step I wanted to add, which is: how do I make these things learn, if I want them to get better over time? What does that look like? How do I get my results on an upward trajectory? Well, there are a couple of things to say about that, and it really varies depending on the kind of agent you have. Many of the analytic agents will learn on their own behalf. The unsupervised ones in particular will take new data and continually update themselves as it comes in. As you run them, they make predictions, they make scores, and new data results in new scores, so they constantly change their algorithms. Typically you have some guardrails on that, so they can't change too much without telling someone. But you allow them to essentially run experiments on their own data and experiment internally, so that their predictions evolve. So those kinds of agents, agents built on unsupervised analytic techniques, are inherently learning.
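The log-to-explanation use of LLMs described earlier amounts to assembling the decision log into a prompt. A minimal sketch, stopping short of the actual model call; the log fields (`outcome`, `rules_fired`) are an invented structure, not a real platform's log format.

```python
def explanation_prompt(decision_log: dict) -> str:
    """Turn a rules-fired decision log into a prompt asking an LLM to
    produce a plain-language explanation for a rep or customer.
    The actual LLM call would consume this string."""
    fired = "\n".join(f"- {r}" for r in decision_log["rules_fired"])
    return (
        "Explain the following loan decision to a customer in plain "
        "language, without jargon.\n"
        f"Outcome: {decision_log['outcome']}\n"
        f"Rules that fired:\n{fired}"
    )
```

The key design point from the transcript holds here: the decision itself is made by the rules and is fully logged; the LLM only rephrases that log, so the explanation is grounded in what actually fired rather than in a black box's self-report.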
But other kinds of analytic agents don't learn quite the same way. You typically then have a data scientist who is doing new analysis with new data and proposing a new model. They might do this every month, every quarter, every week: they review the data up until yesterday, they see what data has changed since the last time they built the model, they rerun the model, and they see if the algorithm is noticeably different. If it is, they typically deploy a new endpoint, and that gets version-controlled just like any other code. So any analytic agent can learn. It might learn automatically, but it might also learn because data scientists are responsible for keeping it up to date over time. But what about decision agents? Decision agents don't really learn, right? The whole point of a decision agent is that it's concrete, that it's got this hardened definition of how it behaves, and you don't really want it randomly changing its behavior. So there are a couple of things you can do. First of all, in the rule repository, you can code multiple versions. You can put in: here's the old version of the rules, here's the new version of the rules. And then put in a rule that says some people get one, some people get the other, and I get to experiment to see which one works better. So I can run what's called A/B or champion/challenger testing by writing rules in my decision agent, so that it looks like one agent to the outside world, but it's running comparisons between these two versions so I can learn. Then I can have somebody look at the log, see which one works better, and close the loop by adding more rules back into the rule repository. The other thing I can do is start to think about the overall agent and how the overall agent works.
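The "some people get one version, some get the other" routing rule for champion/challenger testing can be sketched as a deterministic split. The split percentage and hashing scheme are illustrative choices, not from the video.

```python
import hashlib

def assign_version(customer_id: str, challenger_pct: int = 10) -> str:
    """Champion/challenger routing rule: hash the customer id into a
    bucket 0-99 and send a fixed percentage to the challenger rules.
    Deterministic, so the same customer always gets the same version
    and the logged results are comparable across calls."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "challenger" if bucket < challenger_pct else "champion"
```

From the outside this still looks like one decision agent; internally the rule picks a version, the version used goes into the decision log, and someone later compares the logs to decide which rule set wins.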
And this starts to get more involved. Because if I want to improve my origination agent, what does it mean to make better origination decisions? It means I lend money to people who pay me back. But they don't pay me back all at once, right? The whole point of a loan is that you might pay me back over many years, so I can't really tell how good you're going to be at paying it back until some time passes. I can't do a real-time feedback loop, because that's nonsense: the idea that I'm going to find out in real time whether this was a good loan decision is just silly. So I have to get a log of how I made the decision, log which scores and predictions I used and what version everything was, and store all of that. Then I need to wait some period of time, and then somebody needs to come back, look at all this data, given the log data, the versions of the analytics I used, and the results I got out of this origination decision as processed through my workflow, and actually do that analysis work. So that requires a process and structure that you can follow. You can do it with agentic AI, but you have to be a little more thoughtful about how you would do it. It's not enough just to rely on the individual agents to learn about their bit of the problem. Someone has to own the framework as a whole, the solution as a whole, and systematically learn from how well it works.