
Retrieval-Augmented Generation Enhances LLM Accuracy

Key Points

  • Large language models (LLMs) often give confident answers that can be factually incorrect, outdated, or lack supporting sources.
  • An anecdote about planetary moons illustrates two common LLM issues: no citation for the information and reliance on stale knowledge.
  • Retrieval‑Augmented Generation (RAG) addresses these problems by consulting an external content store—either open (e.g., the internet) or closed (e.g., curated documents)—to retrieve up‑to‑date, verifiable information.
  • By grounding responses in retrieved sources, RAG improves accuracy, reduces hallucinations, and ensures answers reflect the latest available data.

Full Transcript

Source: [https://www.youtube.com/watch?v=T-D1OfcDW1M](https://www.youtube.com/watch?v=T-D1OfcDW1M)
Duration: 00:06:26

Sections

  • [00:00:00](https://www.youtube.com/watch?v=T-D1OfcDW1M&t=0s) LLM Generation Lacks Sources — Marina Danilevsky uses a personal anecdote to show that pure large-language-model answers can be confidently incorrect, outdated, and without citations, highlighting the need for retrieval-augmented generation.
  • [00:03:05](https://www.youtube.com/watch?v=T-D1OfcDW1M&t=185s) Retrieval-Augmented Generation Explained — The speaker outlines how a RAG framework lets an LLM first fetch relevant documents from a content store before answering, enabling up-to-date, evidence-backed responses without retraining.
  • [00:06:20](https://www.youtube.com/watch?v=T-D1OfcDW1M&t=380s) Thank You and Subscribe Prompt — The speaker thanks viewers for learning about RAG and urges them to like the video and subscribe to the channel.
[0:00] Large language models. They are everywhere. They get some things amazingly right and other things very interestingly wrong. My name is Marina Danilevsky. I am a Senior Research Scientist here at IBM Research. And I want to tell you about a framework to help large language models be more accurate and more up to date: Retrieval-Augmented Generation, or RAG.

[0:22] Let's just talk about the "Generation" part for a minute. So forget the "Retrieval-Augmented". So the generation, this refers to large language models, or LLMs, that generate text in response to a user query, referred to as a prompt. These models can have some undesirable behavior. I want to tell you an anecdote to illustrate this.

[0:41] So my kids, they recently asked me this question: "In our solar system, what planet has the most moons?" And my response was, "Oh, that's really great that you're asking this question. I loved space when I was your age." Of course, that was like 30 years ago. But I know this! I read an article, and the article said that it was Jupiter and 88 moons. So that's the answer.

[1:06] Now, actually, there's a couple of things wrong with my answer. First of all, I have no source to support what I'm saying. So even though I confidently said "I read an article, I know the answer!", I'm not sourcing it. I'm giving the answer off the top of my head. And also, I actually haven't kept up with this for a while, and my answer is out of date. So we have two problems here. One is no source. And the second problem is that I am out of date. And these, in fact, are two behaviors that are often observed as problematic when interacting with large language models. They're LLM challenges.

[1:46] Now, what would have happened if I'd taken a beat and first gone and looked up the answer on a reputable source like NASA? Well, then I would have been able to say, "Ah, okay! So the answer is Saturn with 146 moons." And in fact, this keeps changing because scientists keep on discovering more and more moons. So I have now grounded my answer in something more believable. I have not hallucinated or made up an answer. Oh, by the way, I didn't leak personal information about how long ago it's been since I was obsessed with space.

[2:18] All right, so what does this have to do with large language models? Well, how would a large language model have answered this question? So let's say that I have a user asking this question about moons. A large language model would confidently say: OK, I have been trained, and from what I know in my parameters during my training, the answer is Jupiter. The answer is wrong. But, you know, we don't know. The large language model is very confident in what it answered.

[2:52] Now, what happens when you add this retrieval-augmented part here? What does that mean? That means that now, instead of just relying on what the LLM knows, we are adding a content store. This could be open like the internet. This can be closed like some collection of documents, collection of policies, whatever. The point, though, now is that the LLM first goes and talks to the content store and says, "Hey, can you retrieve for me information that is relevant to what the user's query was?" And now, with this retrieval-augmented answer, it's not Jupiter anymore. We know that it is Saturn.

[3:31] What does this look like? Well, first the user prompts the LLM with their question. They say, this is what my question was. And originally, if we're just talking to a generative model, the generative model says, "Oh, okay, I know the response. Here it is. Here's my response." But now in the RAG framework, the generative model actually has an instruction that says, "No, no, no. First, go and retrieve relevant content."
[4:08] "Combine that with the user's question, and only then generate the answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved content, together with the user's question. Now give a response. And in fact, now you can give evidence for why your response was what it was.

[4:30] So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before? So first of all, I'll start with the out-of-date part. Now, instead of having to retrain your model, if new information comes up (like, hey, we found some more moons; now it's Jupiter again, maybe it'll be Saturn again in the future), all you have to do is augment your data store with the new, updated information. So now the next time that a user comes and asks the question, we're ready. We just go ahead and retrieve the most up-to-date information.

[5:00] The second problem: source. Well, the large language model is now being instructed to pay attention to primary source data before giving its response, and in fact is now able to give evidence. This makes it less likely to hallucinate or to leak data, because it is less likely to rely only on information that it learned during training.

[5:21] It also allows us to get the model to have a behavior that can be very positive, which is knowing when to say, "I don't know." If the user's question cannot be reliably answered based on your data store, the model should say, "I don't know," instead of making up something that is believable and may mislead the user. This can have a negative effect as well, though, because if the retriever is not sufficiently good to give the large language model the best, most high-quality grounding information, then maybe a user's query that is answerable doesn't get an answer.

[5:57] So this is actually why lots of folks, including many of us here at IBM, are working the problem on both sides.
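The three-part prompt the speaker describes (instruction, retrieved content, user question) can be sketched in a few lines of Python. The function name and template wording here are illustrative assumptions, not from any particular RAG library:

```python
def build_rag_prompt(instruction: str, retrieved_passages: list[str], question: str) -> str:
    """Assemble the three-part RAG prompt: instruction + retrieved content + question."""
    # Number each passage so the model can cite its evidence by index.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved_passages))
    return (
        f"{instruction}\n\n"
        f"Retrieved content:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer (cite the passages you used):"
    )

prompt = build_rag_prompt(
    instruction=("Answer using only the retrieved content below. "
                 "If it does not contain the answer, say \"I don't know.\""),
    retrieved_passages=["NASA: Saturn has 146 confirmed moons, the most in the solar system."],
    question="In our solar system, what planet has the most moons?",
)
print(prompt)
```

The exact template text is a design choice; what matters is that the instruction and the retrieved evidence travel with the question, so the generator is steered toward the grounding data rather than its training-time parameters.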
[6:03] We are both working to improve the retriever, to give the large language model the best-quality data on which to ground its response, and also the generative part, so that the LLM can give the richest, best response finally to the user when it generates the answer.

[6:21] Thank you for learning more about RAG. Please like and subscribe to the channel. Thank you.
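The retrieve-then-generate loop described in the talk can be sketched end to end. The word-overlap "retriever" and the stubbed "generator" below are deliberately toy assumptions; the point is the control flow: retrieve first, abstain with "I don't know" when the store cannot support an answer, and update the store (rather than retraining) when facts change:

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: count query words that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def answer(query: str, store: list[str], min_score: int = 2) -> str:
    """Retrieve the best-matching document, then generate a grounded answer."""
    best = max(store, key=lambda d: score(query, d), default=None)
    if best is None or score(query, best) < min_score:
        return "I don't know."       # abstain rather than hallucinate
    return f"Based on: {best!r}"     # a real generator would now run on the RAG prompt

store = ["Saturn has 146 confirmed moons, the most of any planet in the solar system."]
print(answer("What planet has the most moons?", store))

# New discoveries? Append to the content store; no retraining needed.
store.append("Update: astronomers have confirmed additional moons around Saturn.")
```

A production retriever would use embeddings or a search index instead of word overlap, and the abstention threshold would be tuned; as the talk notes, a retriever that is too weak can wrongly withhold answers to answerable questions.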