Learning Library


Balancing Human Control in AI Chatbots

Key Points

  • Generative AI dramatically accelerates chatbot development by letting large language models handle response generation, reducing the manual effort previously required for crafting conversational flows.
  • Traditional chatbots relied on intent classifiers trained with numerous examples, giving developers strict control over answers but struggling to scale beyond frequently asked questions.
  • As the variety of user queries expands, classifier‑based bots hit a point of diminishing returns, leading to misunderstandings and poor user experiences.
  • Retrieval‑augmented generation with LLMs eliminates the need for extensive classifier training, offering a more flexible way to answer both common and rare queries while raising new considerations about balancing human oversight and automated control.

Full Transcript

# Balancing Human Control in AI Chatbots

**Source:** [https://www.youtube.com/watch?v=DpD8QB-6Pc8](https://www.youtube.com/watch?v=DpD8QB-6Pc8)
**Duration:** 00:06:23

## Sections

- [00:00:00](https://www.youtube.com/watch?v=DpD8QB-6Pc8&t=0s) **Balancing Human Control with LLM Chatbots** - The speaker contrasts traditional intent-classifier chatbots, which require painstakingly crafted responses and offer strict control, with modern generative-AI chatbots that automate answer creation, and explores how to strike an effective balance between human oversight and LLM-driven automation.
- [00:03:01](https://www.youtube.com/watch?v=DpD8QB-6Pc8&t=181s) **RAG Overcomes Chatbot Diminishing Returns** - This section explains how intent training past a certain point yields diminishing returns and harms chatbot accuracy, and introduces retrieval-augmented generation as a simple two-step approach, searching a document repository and using an LLM to generate answers, that reliably handles both common and rare user queries.
- [00:06:08](https://www.youtube.com/watch?v=DpD8QB-6Pc8&t=368s) **Balancing Cache Speed and RAG** - Leveraging fast cache access for frequent questions while finding the right mix between curated replies and generative AI-powered retrieval-augmented generation helps create high-performance, user-delighting conversational AI quickly.

## Full Transcript
0:00 Generative AI makes it faster than ever before to create chatbots. 0:03 It used to take a lot of manual effort to craft conversational responses and flows, 0:08 but now LLMs promise to cut some of the time and effort out of the build process by doing more of the work for us. 0:14 If we say goodbye to handcrafted answers, is that a good trade-off? 0:17 Today we'll discuss how to balance human and LLM control while building effective chatbots.

0:23 Let's look at how we used to build chatbots before generative AI. 0:26 We used to train chatbots to understand natural language through classifiers. 0:31 So we might have something like, 0:33 "What time are you open?" 0:39 and we would craft an "hours" intent for this. 0:43 Classifiers are trained on examples, so we'd give multiple examples for this intent. 0:47 So maybe I would add "When do you close?" 0:53 and again, same intent. 0:56 And for this intent, I'd want to very carefully control the answer, and I would say something like, "We're open 8 a.m. to 8 p.m. every day." 1:05 I want people to really know what the answer is to that hours question.

1:10 I would also train additional intents. 1:12 So maybe I would have one like, 1:14 "How do I open an account?" 1:22 This will go to an "account" intent. 1:27 And again, I would very carefully control what happens from that. 1:30 Maybe I would open the account for them, maybe I would give them some steps to follow. 1:35 But again, the point is I had very strict control over what happened 1:39 when I got this intent, no matter how it was asked.

1:41 Let's imagine how this training scaled out. 1:46 So I could plot a curve 1:49 for the kinds of questions my chatbot received. 1:52 I would put 1:53 the questions on one axis 1:57 and the frequency of each question, the number of times I get it, on the other. 2:02 And when I do that, 2:03 it tends to look something like this. 2:07 So I've got a nice long curve here.
2:09 And at the top of the curve are the questions I get all the time, so I probably get that hours kind of question the most. 2:17 And then accounts, 2:20 I probably get that a lot, maybe not quite as much. 2:22 So let's say it's here, right? 2:24 It's still a very frequently asked question; 2:26 it's just that it's not the most frequently asked. 2:29 And then I'm going to have a nice long curve of questions here, and there'll be questions that I hardly ever get, 2:35 like "How do I use my gold card while traveling overseas?" 2:47 Maybe I only get this question one time.

2:50 And as I progress through this curve, it gets harder and harder to train the classifier to understand these things. 2:57 There's actually a point 2:59 somewhere along the curve, 3:00 we'll call it here, 3:02 where you've gone past the point of diminishing returns. 3:06 It gets so hard to train intents that your chatbot starts not understanding, 3:10 and you end up seeing a lot of responses like, 3:13 "Hey, I didn't understand that," or it starts answering the wrong question and just starts looking confused. 3:18 And this is a really poor experience for your users.

3:21 And so this is where generative AI comes in. 3:24 Using retrieval-augmented generation, we don't need to train any classifiers. 3:29 As long as the answer to our user's question is in the document repository used by the system, 3:34 the RAG system can answer any question. 3:37 The training is very generalized. 3:39 So the process would look something like this. 3:42 We've got a user here; they're asking questions to the chatbot. 3:47 The chatbot is sending that question to a document repository, 3:58 that repository retrieves some documents that help, 4:03 and the bot sends those documents and the question 4:10 to the LLM. 4:11 The LLM summarizes the answer. 4:14 So it's a very generalized process. 4:17 And we have a user question converted into a search query.
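The retrieve-then-generate flow just described can be sketched as below. The document store, the keyword-overlap scoring, and the `call_llm` stub are hypothetical placeholders; a real system would use a vector database for retrieval and an actual LLM API for generation.

```python
# Minimal sketch of the two-step RAG flow (hypothetical placeholders).

DOCUMENTS = [
    "Store hours: we're open 8 a.m. to 8 p.m. every day.",
    "To open an account, visit any branch with a photo ID.",
    "Gold card holders can use their card overseas wherever accepted.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Step 1: search the document repository (naive keyword overlap here)."""
    q_words = set(question.lower().split())
    scored = sorted(
        DOCUMENTS,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return "[LLM answer grounded in the retrieved documents]\n" + prompt

def rag_answer(question: str) -> str:
    """Step 2: augment the question with retrieved docs and generate."""
    docs = retrieve(question)
    prompt = (
        "Answer using only these documents:\n"
        + "\n".join(docs)
        + f"\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The two configuration points the speaker mentions next map directly onto `retrieve` (search tuning) and `rag_answer`'s prompt plus `call_llm` (generation tuning).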
4:20 The query returns some documents, 4:23 and then those retrieved documents are used by the LLM to augment a generated answer. 4:30 And so with this pattern, the LLM can answer both the very frequent 4:34 and the infrequent questions. 4:37 And there's a real beautiful simplicity in this pattern. 4:40 There are only two configuration points. 4:43 Number one: 4:45 the tuning of the search, the query, the retrieval process. 4:49 And number two: 4:51 the tuning of the answer generation process. 4:54 Two points, very simple, very generalized, no intents, 4:57 but you lose some degree of control.

4:59 Remember that when people ask me when my store is open, I had a really particular answer in mind 5:06 that I wanted to give. I want to make sure they got exactly this text, and in this pattern I don't have that exact control anymore. 5:12 The LLM can't give me that guarantee. 5:15 So what's the answer? 5:16 It's a hybrid approach. 5:18 So we're going to use a traditional classifier part of the time, 5:22 and we're going to use RAG the rest of the time.

5:25 So if I draw this out, 5:27 it starts out the same: 5:28 my user's asking a question to the bot, 5:31 but the bot's making a decision. 5:33 Is this a question I see all the time? 5:35 In which case I'm going to go with my intents 5:38 and their curated responses. 5:42 And if this is something I don't get that much, 5:44 I'm going to go with the RAG pattern that I've shown up here.

5:47 And if we look at our original long-tail curve, we can think of this 5:51 left-hand side 5:53 as a kind of a cache. 5:56 Those questions we get all the time, the bot can pull right out, 6:00 right out of its internal memory. 6:02 We're not going to the LLM, we're not doing any of these searches, we're not worrying about tokens and inference time and all those things. 6:09 So using this side, the cache, is very quick. 6:12 So find the balance between curated conversational responses and generative AI-powered RAG.
6:18 This will help you build effective conversational AI that delights your users as quickly as possible.