
Retrieval-Augmented Fine Tuning Explained

Key Points

  • Retrieval‑augmented fine‑tuning (RAFT) merges the strengths of traditional retrieval‑augmented generation (RAG) and fine‑tuning to better handle domain‑specific data.
  • Developed by UC Berkeley researchers, RAFT fine‑tunes a model to learn how to locate and use external documents during inference, improving RAG performance in specialized settings.
  • The method is likened to an open‑book exam where the student has also studied the material: unlike pure fine‑tuning (closed‑book) or pure RAG (untrained open‑book), RAFT equips the model with both memorized knowledge and effective retrieval skills.
  • RAFT's training data consist of triples—a query, a set of documents (relevant and irrelevant), and the correct answer—so the model learns to “fish” for information and generate accurate responses.
  • By teaching the model how to retrieve and synthesize external content, RAFT provides a durable, scalable solution for enterprise‑level LLM applications.
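Concretely, each training triple described above can be sketched as a small record. This is a minimal illustration; the field names and answer template are assumptions, not the exact schema from the RAFT paper:

```python
# Sketch of one RAFT training example (field names are illustrative).
# Each example pairs a query with a mixed set of relevant ("core") and
# irrelevant ("tangent") documents, plus a chain-of-thought answer that
# cites the relevant ones.
raft_example = {
    "query": "How much parental leave does IBM offer?",
    "documents": [
        {"id": "D1", "text": "Parental leave policy: ...", "core": True},
        {"id": "D2", "text": "Retirement accounts: ...", "core": False},
    ],
    # The target answer reasons over the cited core document.
    "answer": 'Document <D1> covers parental leave: "..." Therefore, ...',
}

# The core documents are the ones the answer should cite.
core_ids = [d["id"] for d in raft_example["documents"] if d["core"]]
```

A full training set is simply a list of such records, one per query.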

Full Transcript

# Retrieval-Augmented Fine Tuning Explained

**Source:** [https://www.youtube.com/watch?v=rqyczEvh3D4](https://www.youtube.com/watch?v=rqyczEvh3D4)
**Duration:** 00:06:53

## Sections

- [00:00:00](https://www.youtube.com/watch?v=rqyczEvh3D4&t=0s) **Hybrid Retrieval‑Augmented Fine‑Tuning** - The speaker explains how RAFT merges inference‑time document retrieval with training‑time knowledge embedding to boost LLM performance on specialized tasks, using an exam‑preparation analogy.
- [00:03:11](https://www.youtube.com/watch?v=rqyczEvh3D4&t=191s) **Teaching Models to Fish** - The speaker outlines the RAFT training method, which pairs queries with mixed sets of relevant (core) and irrelevant (tangent) documents to train a model to retrieve, filter out off‑topic information, and generate answers using chain‑of‑thought reasoning.
- [00:06:19](https://www.youtube.com/watch?v=rqyczEvh3D4&t=379s) **Chain-of-Thought Guidance Enhances Model Transparency** - The speaker explains that using chain‑of‑thought reasoning with explicit document citations improves a model's scalability, robustness, and traceability for enterprise applications.

## Full Transcript
0:00 When building gen AI applications, retrieval-augmented generation is often contrasted with fine tuning as two separate techniques for incorporating domain-specific data into LLM output. 0:14 Retrieval-augmented fine tuning is a hybrid approach that combines the best of both worlds and addresses many of the challenges surrounding LLM performance in specialized settings. 0:26 Originally developed by researchers at UC Berkeley, 0:30 RAFT uses a unique fine-tuning technique to improve RAG performance in specific domain contexts.

0:40 Now, with traditional RAG, we provide context to the model during inference 0:49 by using a retriever to search for relevant documents in a vector database, which we append to the prompt that we send to our LLM. 1:02 With fine tuning, we provide context to the model during training time by using a large labeled data set to bake specific knowledge into a pre-trained LLM.

1:19 So how can we combine both of these techniques to create retrieval-augmented fine tuning? 1:29 Let's use an analogy. 1:30 Let's say that using an LLM on enterprise-specific tasks is like studying for an exam. 1:35 Suppose that fine-tuning is like studying for a closed-book exam. 1:44 Since you can't use your notes, you have to memorize all the material in advance. 1:49 And if you study all the wrong stuff, you probably won't do so well, since you don't have access to new information. 1:56 In the same way, with fine-tuning, the model has to rely completely on the knowledge it learned during training in order to answer the user's question.

2:05 Now, RAG would be like taking an open-book exam 2:11 that you did not study for. 2:16 Because you knew you could use the book on exam day, you chose to skip all the lectures and not read the textbook. 2:22 So on test day, even though you have all the materials in front of you, there's still no guarantee that you'll actually know where to find all the information.
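The traditional RAG flow described here (retrieve, then append to the prompt) can be sketched in a few lines. The keyword-overlap retriever below is a toy stand-in for a real embedding model and vector database; all names are illustrative:

```python
# Toy RAG inference sketch: a stand-in retriever plus prompt assembly.
def retrieve(query, corpus, top_k=2):
    # Rank documents by word overlap with the query (a crude stand-in
    # for vector-similarity search against a vector database).
    words = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: len(words & set(d.lower().split())),
                  reverse=True)[:top_k]

def build_prompt(query, docs):
    # Retrieved documents are appended as context to the prompt sent to the LLM.
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Parental leave policy: employees receive paid parental leave.",
    "Retirement accounts: details of the 401(k) matching program.",
]
prompt = build_prompt("How much parental leave is offered?",
                      retrieve("parental leave", corpus, top_k=1))
```

Note that the model's answer quality here depends entirely on what `retrieve` happens to surface, which is exactly the weakness the exam analogy points at.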
2:32 In the same way, with RAG, the performance of the model is largely dictated by how well the retriever can pull relevant documents from the database. 2:43 Now, with RAFT, this is like 2:46 taking an open-book exam that you did study for. 2:54 This is the win-win situation, where you paid attention in all the lectures, read all the materials, and get to use the book on the test. 3:03 So RAFT is similar in that it teaches the model how to use RAG, or how to use external documents to generate an answer.

3:11 It's like the saying that goes: give a man a fish, and you feed him for a day, 3:16 but teach a man to fish, and you feed him for a lifetime. 3:19 In the same way, RAFT essentially teaches the model how to fish, or how to look for information and generate an answer, versus just giving it a fish, or giving it an answer.

3:31 To explain this more, let's dive into the implementation. 3:34 Since RAFT is a training technique, we need training data. 3:38 Each data point will consist of three things: a query, a set of documents, and an answer. Let's look at an example. 3:48 Let's say our query is: how much parental leave does IBM offer? 3:55 To generate an answer, we can search through two types of documents: core documents and tangent documents. 4:07 Core documents contain information that's relevant to the user query. 4:12 In our example, these could be documents on, say, paid leave or benefit eligibility. 4:17 Tangent documents, on the other hand, contain information that's irrelevant or off-topic to the user's query. 4:24 These could be documents on retirement accounts or internal code documentation.

4:30 From here, we create two types of document sets. 4:33 Set one 4:36 contains both core and tangent documents, and set two contains just tangent documents. 4:44 The reason why we include both is to simulate a real RAG use case, where the retriever may or may not pull any relevant documents from the database. 4:53 Finally, to generate our answer, we use chain-of-thought reasoning.
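The two document-set types just described (set one with core plus tangent documents, set two with tangent documents only) can be sketched as follows. The function name, sampling counts, and fixed seed are illustrative choices, not part of the RAFT recipe itself:

```python
import random

# Sketch of RAFT document-set construction. Set one mixes core (relevant)
# and tangent (irrelevant) documents; set two contains only tangent
# documents, simulating a retriever that finds nothing relevant.
def make_document_set(core_docs, tangent_docs, include_core,
                      n_tangent=2, seed=0):
    rng = random.Random(seed)
    docs = rng.sample(tangent_docs, k=min(n_tangent, len(tangent_docs)))
    if include_core:
        docs = list(core_docs) + docs
    rng.shuffle(docs)  # avoid the core docs always appearing first
    return docs

core = ["Paid leave policy ...", "Benefit eligibility ..."]
tangent = ["Retirement accounts ...", "Internal code documentation ..."]

set_one = make_document_set(core, tangent, include_core=True)   # core + tangent
set_two = make_document_set(core, tangent, include_core=False)  # tangent only
```

Shuffling matters: if the relevant document always sat in the same position, the model could learn the position rather than the skill of identifying relevance.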
4:59 To teach the model how to filter out tangent documents and focus on core ones, processing them step by step, in order to generate a correct answer. 5:15 We can use this framework to create a larger training data set that we can use to train the model using supervised fine tuning. 5:28 Now, because this framework is so adaptable, we can use a wide variety of different models and fine-tuning techniques to actually implement this in practice. 5:37 And with that, our model is now ready to ace the exam.

5:40 So there are three aspects of this training process that I want to highlight that are key to making this whole thing work. 5:46 One, the inclusion of tangent documents helps to teach the model how to pick out relevant documents from irrelevant ones, thus helping to increase accuracy on domain-specific questions. 5:59 Second, the creation of document sets that don't include any relevant documents at all, a.k.a. set two, 6:07 helps to teach the model when to rely on its intrinsic knowledge, or to say "I don't know," versus forcing an incorrect answer out of irrelevant RAG documents. 6:17 This helps to minimize hallucinations. 6:22 Third, guiding the model using chain-of-thought reasoning helps to minimize overfitting and increase transparency and traceability by encouraging the model to quote the specific documents from which it got the answer.

6:36 So as you can see, RAFT creates a model that's both highly scalable and highly robust for enterprise tasks. 6:44 So whether you found this video because you're studying for that closed-book exam or you're just curious about AI, I hope you learned something and enjoyed the video. 6:52 Thanks for watching.
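Putting the pieces together, a single RAFT example can be rendered as a prompt/completion pair for supervised fine-tuning, with a chain-of-thought completion that quotes the cited document; that quoting is what gives the trained model its traceability. The template below is an illustrative sketch (the leave figure is made up), not the exact format from the RAFT paper:

```python
# Sketch: format one RAFT example as a supervised fine-tuning pair.
def to_sft_pair(query, documents, cited_id, quote, conclusion):
    # Prompt: all retrieved documents (core and tangent) plus the question.
    context = "\n".join(f"<{d['id']}> {d['text']}" for d in documents)
    prompt = f"{context}\n\nQuestion: {query}\nAnswer:"
    # Completion: chain-of-thought target that cites and quotes the core
    # document before concluding, so the answer is traceable.
    completion = (f'The relevant evidence is in <{cited_id}>: "{quote}" '
                  f"Therefore, {conclusion}")
    return {"prompt": prompt, "completion": completion}

# Hypothetical example; the policy details are invented for illustration.
pair = to_sft_pair(
    query="How much parental leave does IBM offer?",
    documents=[
        {"id": "D1", "text": "Parental leave policy: 12 weeks paid."},
        {"id": "D2", "text": "Retirement accounts: 401(k) matching."},
    ],
    cited_id="D1",
    quote="12 weeks paid.",
    conclusion="the policy provides 12 weeks of paid parental leave.",
)
```

A list of such pairs is then fed to any standard supervised fine-tuning pipeline, which is why the speaker notes the framework works across many models and fine-tuning techniques.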