# LangChain Retrieval-Augmented Generation Demo

**Source:** [https://www.youtube.com/watch?v=cDn7bf84LsM](https://www.youtube.com/watch?v=cDn7bf84LsM)
**Duration:** 00:07:59

## Summary

- Erica introduces a Retrieval Augmented Generation (RAG) workflow using LangChain to give large language models up‑to‑date information that they weren’t trained on.
- She demonstrates the problem with a recent IBM‑UFC partnership announcement that an IBM Granite model couldn’t answer because its training data only goes up to 2021.
- The RAG solution involves (1) creating a knowledge base from current IBM.com pages, (2) using a retriever to fetch relevant documents, (3) feeding those documents to the LLM, and (4) prompting the LLM with the retrieved context.
- The tutorial shows how to set up the required watsonx credentials, install the necessary Python packages, and store the API key in a `.env` file for the notebook.
- Finally, she builds a vector store from a dictionary of 25 IBM URLs (including the UFC article) and uses it to retrieve top results that the LLM can incorporate into its answer.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cDn7bf84LsM&t=0s) **LangChain RAG Tutorial Overview** - Erica demonstrates a Python LangChain workflow that adds a knowledge base, retriever, and prompt to enable retrieval‑augmented generation for up‑to‑date answers, using an IBM‑UFC announcement as the example.
- [00:03:06](https://www.youtube.com/watch?v=cDn7bf84LsM&t=186s) **Preparing Documents for Vector Store** - The speaker demonstrates mapping URLs, loading and cleaning articles with LangChain, chunking the text, embedding it using IBM's Slate model, and saving the vectors in a local Chroma database.
- [00:06:12](https://www.youtube.com/watch?v=cDn7bf84LsM&t=372s) **Setting Up RAG for IBM Knowledge Base** - The speaker explains configuring a prompt template, helper function, and retrieval‑augmented generation chain to answer queries about the IBM‑UFC partnership and IBM’s watsonx.data and watsonx.ai services.

## Full Transcript
Hi, my name is Erica and I'm going to show you how to use LangChain for a simple RAG example In Python.
Large language models, LLMs, can be great for answering lots of questions,
but sometimes the models don't have the most up-to-date information
and can't answer some questions about recent events.
For example, I was reading this recent announcement about the UFC and IBM partnership
on IBM.com and wanted to ask an LLM about it.
But when I asked the IBM Granite model to tell me about the UFC announcement from November 14th, 2024,
it didn't know what I was talking about and mentioned it was trained on a limited data set up to only 2021.
How do I give this LLM the most up-to-date information so it can answer my question?
The answer is RAG, retrieval augmented generation.
Let me show you how it works.
Typically we have our user asking the question to the LLM, which generates a response.
But as you just saw, the LLM didn't have the right information, the context, to answer my question.
So we need to add something in the middle between the question and the LLM.
First, we'll add a knowledge base to include the content we want the LLM to read.
In this case, it'll be the most up to date content from IBM.com pages about some IBM products and announcements.
Second, we'll set up a retriever to fetch the content from the knowledge base.
Third, we'll set up the LLM to be fed the content.
Fourth, we'll establish a prompt with instructions so we can ask the LLM questions.
The top search results from search and retrieval will also be gathered here.
Once we've completed these four steps, we can start asking our questions about the content in our knowledge base.
Our query is searched for in our knowledge base vector store.
The top results are returned as context for the LLM.
And finally, the LLM generates a response.
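The flow just described can be sketched end-to-end in plain Python. This is a toy illustration only: the knowledge base is a list of strings, retrieval is naive keyword-overlap scoring, and the LLM is a stub, whereas the tutorial uses LangChain, Chroma, and watsonx for each of these pieces.

```python
# Toy RAG pipeline: knowledge base -> retriever -> prompt -> (stub) LLM.

knowledge_base = [
    "IBM and UFC announced a partnership on November 14th, 2024.",
    "watsonx.data connects to various data sources and manages metadata.",
    "watsonx.ai lets users build, deploy, and manage AI applications.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared query words (a stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine instructions, retrieved context, and the question (step four)."""
    return ("Answer using only the context below.\n"
            "Context:\n" + "\n".join(context) + "\nQuestion: " + query)

def stub_llm(prompt: str) -> str:
    """Placeholder for the Granite model: echo the first context line back."""
    return prompt.split("Context:\n")[1].split("\n")[0]

query = "Tell me about the UFC announcement from November 14th, 2024"
answer = stub_llm(build_prompt(query, retrieve(query, knowledge_base)))
print(answer)
```

The point of the sketch is the data flow: the query never goes to the model alone; it always travels together with whatever the retriever pulled from the knowledge base.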
I'll walk through all these steps again in the Jupyter Notebook, linked in the description to this video.
Before we can begin, we need to fetch an API key and project ID for our notebook.
You can get these credentials by following the steps in the video linked in the description below.
We also have a few libraries to use for this tutorial.
If you don't have these packages installed yet, you can solve this with a quick pip install,
and here we can import the packages.
Next,
save your watsonx project ID and watsonx API key in a separate .env file.
Make sure it's in the same directory as this notebook.
I have my credentials saved already, so I'll import those over
from my .env file and save them in a dictionary called credentials.
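In a notebook this loading is usually done with the python-dotenv package; as a rough sketch of what it amounts to, a .env file is just KEY=VALUE lines parsed into a dictionary (the variable names below are illustrative, not necessarily the notebook's exact ones):

```python
def load_env_file(path: str) -> dict[str, str]:
    """Minimal stand-in for python-dotenv: parse KEY=VALUE lines into a dict."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; split only on the first "=".
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values

# Write a sample env file, then read it back into a credentials dictionary.
with open("sample.env", "w") as f:
    f.write("WATSONX_APIKEY=your-api-key\nWATSONX_PROJECT_ID=your-project-id\n")

credentials = load_env_file("sample.env")
print(credentials)
```

Keeping credentials in a file outside the notebook means the API key never appears in the notebook's cells or output.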
Okay, now we can get started with the workflow.
First, we'll gather the information from some IBM.com URLs to create a knowledge base as a vector store.
Let's establish a URLs dictionary.
It's a Python dictionary that helps us map the 25 URLs from which we will be getting the content.
You can see at the top here, I have the article about the UFC and IBM partnership I asked about before.
Let's also set up a name for our collection,
Ask IBM 2024.
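The shape of that mapping might look like the following (the URLs and the collection-name string here are placeholders standing in for the 25 IBM.com pages and the exact name used in the notebook):

```python
# Map short document names to the IBM.com pages to load (placeholder URLs).
urls = {
    "ufc_announcement": "https://www.ibm.com/example/ufc-partnership",
    "watsonx_data": "https://www.ibm.com/example/watsonx-data",
    "watsonx_ai": "https://www.ibm.com/example/watsonx-ai",
    # ...plus the remaining entries, 25 in the actual notebook
}

# A collection groups related vectors in the vector database.
collection_name = "askibm_2024"

print(len(urls), collection_name)
```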
Next, let's load our documents using the LangChain web based loader for the list of URLs we have.
Loaders load in data from a source and return a list of documents.
We'll print the page content of a sample document at the end to see how it's been loaded.
It can take a little while for it to finish loading.
And here's a sample document. Based on the sample document,
it looks like there's a lot of whitespace and newline characters that we can get rid of.
Let's clean that up with this code.
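The cleanup can be as simple as collapsing runs of whitespace and newlines with a regular expression (a sketch; the notebook's exact cleaning code may differ):

```python
import re

def clean_text(text: str) -> str:
    """Collapse newlines and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

sample = "IBM and UFC\n\n\n   announce    a\npartnership"
print(clean_text(sample))  # -> IBM and UFC announce a partnership
```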
Let's see how our sample document looks now after we've cleaned it up.
Great. We've removed the whitespace successfully.
Before we vectorize our content,
we need to split it up into smaller, more manageable pieces known as chunks. LangChain's recursive character text splitter
takes a large text and splits it based on a specified chunk size, meaning the number of characters.
In our case, we're going to go with a chunk size of 512.
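Conceptually, splitting works like the sliding window below. This is a simplified fixed-size chunker for illustration; LangChain's RecursiveCharacterTextSplitter is smarter, preferring to break on paragraph and sentence boundaries before falling back to raw character counts.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly,
    so a sentence cut at a boundary still appears whole in one chunk."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

chunks = chunk_text("a" * 1200, chunk_size=512, overlap=64)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [512, 512, 304]
```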
Next, we need to instantiate an embedding model to vectorize our content.
In our case, we'll use IBM's Slate model,
And to finish off this step,
let's load our content into a local instance of the vector database using Chroma.
We'll call it vector store.
The documents in the vector store will be made up of the docs we just chunked and they'll be embedded using the IBM Slate model.
For step two, we'll set up our vector store as a retriever.
The retrieved information from the vector store (the content from the URLs)
serves as additional context that the LLM will use to generate a response later in step four.
Code-wise, all we need to do is set up our vector store as a retriever.
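Under the hood, a retriever ranks stored documents by the similarity of their embeddings to the query's embedding. The scoring it typically relies on looks roughly like this cosine-similarity top-k, shown here as a toy with hand-made 3-dimensional vectors standing in for real Slate embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Tiny fake "vector store": document text paired with a made-up embedding.
store = [
    ("UFC partnership announcement", [0.9, 0.1, 0.0]),
    ("watsonx.data overview",        [0.1, 0.9, 0.1]),
    ("watsonx.ai overview",          [0.0, 0.2, 0.9]),
]

def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k document texts whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(top_k([1.0, 0.0, 0.0]))  # query embedding closest to the UFC document
```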
For step three, we'll set up our generative LLM.
The generative model will use the retrieved information from step two to produce a relevant response to our questions.
First, we'll establish which LLM we're going to use to generate the response.
For this tutorial, we'll use an IBM Granite model.
Next we'll set up the model parameters.
The model parameters available, and what they mean can be found in the description of this video.
And finally, in this step, we instantiate the LLM using watsonx.
In step four we'll set up our prompt which will combine our instructions,
the search results from step two, and our question to provide context to the LLM we just instantiated in step three.
First, let's set up instructions for the LLM.
We'll call it template because we'll also set up our prompt using a prompt template, and our instructions.
Let's also set up a helper function to format our docs to differentiate between individual page content.
Finally, as part of this step, we can set up a RAG chain with our search results for my retriever.
Our prompt, our helper function and our LLM.
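In plain Python terms, the prompt assembly and the helper function amount to something like this (the instruction wording and the separator are illustrative; the notebook builds the equivalent with LangChain's PromptTemplate and chains it together with the retriever and LLM):

```python
# Template combining instructions, retrieved context, and the user's question.
template = (
    "Answer the question using only the context provided.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
)

def format_docs(docs: list[str]) -> str:
    """Join retrieved page contents with blank lines so individual
    documents stay distinguishable inside the prompt."""
    return "\n\n".join(docs)

docs = ["IBM and UFC announced a partnership.", "watsonx.data manages metadata."]
prompt = template.format(context=format_docs(docs),
                         question="Tell me about the UFC announcement")
print(prompt)
```

In the actual RAG chain, this formatting step sits between the retriever's output and the LLM's input, so every question automatically arrives at the model wrapped in fresh context.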
Finally, in steps five and six, we can ask questions about our knowledge base.
The generative model will process the augmented context along with the user's question to produce a response.
First, let's ask our initial question.
Tell me about the UFC announcement from November 14th, 2024.
On November 14th, 2024, IBM and UFC announced a groundbreaking partnership,
and it looks like the model was able to answer the question this time,
since it received the context from the UFC article we fed it.
Next, let's ask about watsonx.data
What is watsonx.data?
watsonx.data is a service offered by IBM that enables users
to connect to various data sources and manage metadata for creating data products.
Looks good.
And finally, let's ask about watsonx.ai
What does watsonx.ai do?
watsonx.ai is a comprehensive AI platform that enables users to build, deploy and manage AI applications.
It was also able to respond to our watsonx.ai question.
Feel free to experiment with even more questions about the IBM offerings
and technologies discussed in the 25 articles you loaded into the knowledge base.