
Retrieval-Augmented Generation Enhances LLM Accuracy

Key Points

  • Large language models (LLMs) often give confident answers that can be factually incorrect, outdated, or lack supporting sources.
  • An anecdote about planetary moons illustrates two common LLM issues: no citation for the information and reliance on stale knowledge.
  • Retrieval‑Augmented Generation (RAG) addresses these problems by consulting an external content store—either open (e.g., the internet) or closed (e.g., curated documents)—to retrieve up‑to‑date, verifiable information.
  • By grounding responses in retrieved sources, RAG improves accuracy, reduces hallucinations, and ensures answers reflect the latest available data.

Full Transcript

Source: [https://www.youtube.com/watch?v=T-D1OfcDW1M](https://www.youtube.com/watch?v=T-D1OfcDW1M)
Duration: 00:06:26

Sections

  • [00:00:00](https://www.youtube.com/watch?v=T-D1OfcDW1M&t=0s) LLM Generation Lacks Sources — Marina Danilevsky uses a personal anecdote to show that pure large-language-model answers can be confidently incorrect, outdated, and without citations, highlighting the need for retrieval-augmented generation.
  • [00:03:05](https://www.youtube.com/watch?v=T-D1OfcDW1M&t=185s) Retrieval-Augmented Generation Explained — The speaker outlines how a RAG framework lets an LLM first fetch relevant documents from a content store before answering, enabling up-to-date, evidence-backed responses without retraining.
  • [00:06:20](https://www.youtube.com/watch?v=T-D1OfcDW1M&t=380s) Thank You and Subscribe Prompt — The speaker thanks viewers for learning about RAG and urges them to like the video and subscribe to the channel.
[0:00] Large language models. They are everywhere. They get some things amazingly right and other things very interestingly wrong. My name is Marina Danilevsky. I am a Senior Research Scientist here at IBM Research. And I want to tell you about a framework to help large language models be more accurate and more up to date: Retrieval-Augmented Generation, or RAG.

[0:22] Let's just talk about the "Generation" part for a minute. So forget the "Retrieval-Augmented". So the generation, this refers to large language models, or LLMs, that generate text in response to a user query, referred to as a prompt. These models can have some undesirable behavior. I want to tell you an anecdote to illustrate this.

[0:41] So my kids, they recently asked me this question: "In our solar system, what planet has the most moons?" And my response was, "Oh, that's really great that you're asking this question. I loved space when I was your age." Of course, that was like 30 years ago. But I know this! I read an article, and the article said that it was Jupiter and 88 moons. So that's the answer.

[1:06] Now, actually, there's a couple of things wrong with my answer. First of all, I have no source to support what I'm saying. So even though I confidently said "I read an article, I know the answer!", I'm not sourcing it. I'm giving the answer off the top of my head. And also, I actually haven't kept up with this for a while, and my answer is out of date. So we have two problems here. One is no source. And the second problem is that I am out of date. And these, in fact, are two behaviors that are often observed as problematic when interacting with large language models. They're LLM challenges.

[1:46] Now, what would have happened if I'd taken a beat and first gone and looked up the answer on a reputable source like NASA? Well, then I would have been able to say, "Ah, okay! So the answer is Saturn with 146 moons." And in fact, this keeps changing because scientists keep on discovering more and more moons. So I have now grounded my answer in something more believable. I have not hallucinated or made up an answer. Oh, by the way, I didn't leak personal information about how long ago it's been since I was obsessed with space.

[2:18] All right, so what does this have to do with large language models? Well, how would a large language model have answered this question? So let's say that I have a user asking this question about moons. A large language model would confidently say: OK, I have been trained, and from what I know in my parameters during my training, the answer is Jupiter. The answer is wrong. But, you know, we don't know. The large language model is very confident in what it answered.

[2:52] Now, what happens when you add this retrieval-augmented part here? What does that mean? That means that now, instead of just relying on what the LLM knows, we are adding a content store. This could be open like the internet. This can be closed like some collection of documents, collection of policies, whatever. The point, though, now is that the LLM first goes and talks to the content store and says, "Hey, can you retrieve for me information that is relevant to what the user's query was?" And now, with this retrieval-augmented answer, it's not Jupiter anymore. We know that it is Saturn.

[3:31] What does this look like? Well, first the user prompts the LLM with their question. They say, this is what my question was. And originally, if we're just talking to a generative model, the generative model says, "Oh, okay, I know the response. Here it is. Here's my response." But now in the RAG framework, the generative model actually has an instruction that says, "No, no, no. First, go and retrieve relevant content."
[4:08] "Combine that with the user's question, and only then generate the answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved content, together with the user's question. Now give a response. And in fact, now you can give evidence for why your response was what it was.

[4:30] So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before? So first of all, I'll start with the out-of-date part. Now, instead of having to retrain your model, if new information comes up (like, hey, we found some more moons; now it's Jupiter again, maybe it'll be Saturn again in the future), all you have to do is augment your data store with the new, updated information. So now the next time that a user comes and asks the question, we're ready. We just go ahead and retrieve the most up-to-date information.

[5:00] The second problem: source. Well, the large language model is now being instructed to pay attention to primary source data before giving its response, and in fact is now able to give evidence. This makes it less likely to hallucinate or to leak data, because it is less likely to rely only on information that it learned during training.

[5:21] It also allows us to get the model to have a behavior that can be very positive, which is knowing when to say, "I don't know." If the user's question cannot be reliably answered based on your data store, the model should say, "I don't know," instead of making up something that is believable and may mislead the user. This can have a negative effect as well, though, because if the retriever is not sufficiently good to give the large language model the best, most high-quality grounding information, then maybe a user's query that is answerable doesn't get an answer.

[5:57] So this is actually why lots of folks, including many of us here at IBM, are working the problem on both sides.
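The three-part prompt the speaker describes (instruction, retrieved content, user question) can be sketched in a few lines of Python. The function name and template wording here are illustrative assumptions, not from any particular RAG library:

```python
def build_rag_prompt(instruction: str, retrieved_passages: list[str], question: str) -> str:
    """Assemble the three-part RAG prompt: instruction + retrieved content + question."""
    # Number each passage so the model can cite its evidence by index.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved_passages))
    return (
        f"{instruction}\n\n"
        f"Retrieved content:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer (cite the passages you used):"
    )

prompt = build_rag_prompt(
    instruction=("Answer using only the retrieved content below. "
                 "If it does not contain the answer, say \"I don't know.\""),
    retrieved_passages=["NASA: Saturn has 146 confirmed moons, the most in the solar system."],
    question="In our solar system, what planet has the most moons?",
)
print(prompt)
```

The exact template text is a design choice; what matters is that the instruction and the retrieved evidence travel with the question, so the generator is steered toward the grounding data rather than its training-time parameters.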
[6:03] We are both working to improve the retriever, to give the large language model the best-quality data on which to ground its response, and also the generative part, so that the LLM can give the richest, best response finally to the user when it generates the answer.

[6:21] Thank you for learning more about RAG. Please like and subscribe to the channel. Thank you.
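The retrieve-then-generate loop described in the talk can be sketched end to end. The word-overlap "retriever" and the stubbed "generator" below are deliberately toy assumptions; the point is the control flow: retrieve first, abstain with "I don't know" when the store cannot support an answer, and update the store (rather than retraining) when facts change:

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: count query words that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def answer(query: str, store: list[str], min_score: int = 2) -> str:
    """Retrieve the best-matching document, then generate a grounded answer."""
    best = max(store, key=lambda d: score(query, d), default=None)
    if best is None or score(query, best) < min_score:
        return "I don't know."       # abstain rather than hallucinate
    return f"Based on: {best!r}"     # a real generator would now run on the RAG prompt

store = ["Saturn has 146 confirmed moons, the most of any planet in the solar system."]
print(answer("What planet has the most moons?", store))

# New discoveries? Append to the content store; no retraining needed.
store.append("Update: astronomers have confirmed additional moons around Saturn.")
```

A production retriever would use embeddings or a search index instead of word overlap, and the abstention threshold would be tuned; as the talk notes, a retriever that is too weak can wrongly withhold answers to answerable questions.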