# LangChain Retrieval-Augmented Generation Demo

**Source:** [https://www.youtube.com/watch?v=cDn7bf84LsM](https://www.youtube.com/watch?v=cDn7bf84LsM)
**Duration:** 00:07:59

## Summary

- Erica introduces a Retrieval Augmented Generation (RAG) workflow using LangChain to give large language models up‑to‑date information that they weren’t trained on.
- She demonstrates the problem with a recent IBM‑UFC partnership announcement that an IBM Granite model couldn’t answer because its training data only goes up to 2021.
- The RAG solution involves (1) creating a knowledge base from current IBM.com pages, (2) using a retriever to fetch relevant documents, (3) feeding those documents to the LLM, and (4) prompting the LLM with the retrieved context.
- The tutorial shows how to set up the required watsonx credentials, install the necessary Python packages, and store the API key in a `.env` file for the notebook.
- Finally, she builds a vector store from a dictionary of 25 IBM URLs (including the UFC article) and uses it to retrieve top results that the LLM can incorporate into its answer.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=cDn7bf84LsM&t=0s) **LangChain RAG Tutorial Overview** - Erica demonstrates a Python LangChain workflow that adds a knowledge base, retriever, and prompt to enable retrieval‑augmented generation for up‑to‑date answers, using an IBM‑UFC announcement as the example.
- [00:03:06](https://www.youtube.com/watch?v=cDn7bf84LsM&t=186s) **Preparing Documents for Vector Store** - The speaker demonstrates mapping URLs, loading and cleaning articles with LangChain, chunking the text, embedding it using IBM's Slate model, and saving the vectors in a local Chroma database.
- [00:06:12](https://www.youtube.com/watch?v=cDn7bf84LsM&t=372s) **Setting Up RAG for IBM Knowledge Base** - The speaker explains configuring a prompt template, helper function, and retrieval‑augmented generation chain to answer queries about the IBM‑UFC partnership and IBM’s watsonx.data and watsonx.ai services.

## Full Transcript
Hi, my name is Erica and I'm going to show you how to use LangChain for a simple RAG example In Python.
Large language models, LLMs, can be great for answering lots of questions,
but sometimes the models don't have the most up-to-date information
and can't answer some questions about recent events.
For example, I was reading this recent announcement about the UFC and IBM partnership
on IBM.com and wanted to ask an LLM about it.
But when I asked the IBM Granite model to tell me about the UFC announcement from November 14th, 2024,
it didn't know what I was talking about and mentioned it was trained on a limited data set up to only 2021.
How do I give this LLM the most up-to-date information so it can answer my question?
The answer is RAG, retrieval augmented generation.
Let me show you how it works.
Typically we have our user asking the question to the LLM, which generates a response.
But as you just saw, the LLM didn't have the right information, the context, to answer my question.
So we need to add something in the middle between the question and the LLM.
First, we'll add a knowledge base to include the content we want the LLM to read.
In this case, it'll be the most up to date content from IBM.com pages about some IBM products and announcements.
Second, we'll set up a retriever to fetch the content from the knowledge base.
Third, we'll set up the LLM to be fed the content.
Fourth, we'll establish a prompt with instructions so we can ask the LLM questions.
The top search results from search and retrieval will also be gathered here.
Once we've completed these four steps, we can start asking our questions about the content in our knowledge base.
Our query is searched for in our knowledge base vector store.
The top results are returned as context for the LLM.
And finally, the LLM generates a response.
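The flow just described can be sketched end-to-end in plain Python. This is a toy illustration only: the knowledge base is a list of strings, retrieval is naive keyword-overlap scoring, and the LLM is a stub, whereas the tutorial uses LangChain, Chroma, and watsonx for each of these pieces.

```python
# Toy RAG pipeline: knowledge base -> retriever -> prompt -> (stub) LLM.

knowledge_base = [
    "IBM and UFC announced a partnership on November 14th, 2024.",
    "watsonx.data connects to various data sources and manages metadata.",
    "watsonx.ai lets users build, deploy, and manage AI applications.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared query words (a stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine instructions, retrieved context, and the question (step four)."""
    return ("Answer using only the context below.\n"
            "Context:\n" + "\n".join(context) + "\nQuestion: " + query)

def stub_llm(prompt: str) -> str:
    """Placeholder for the Granite model: echo the first context line back."""
    return prompt.split("Context:\n")[1].split("\n")[0]

query = "Tell me about the UFC announcement from November 14th, 2024"
answer = stub_llm(build_prompt(query, retrieve(query, knowledge_base)))
print(answer)
```

The point of the sketch is the data flow: the query never goes to the model alone; it always travels together with whatever the retriever pulled from the knowledge base.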
I'll walk through all these steps again in the Jupyter Notebook, linked in the description to this video.
Before we can begin, we need to fetch an API key and project ID for our notebook.
You can get these credentials by following the steps in the video linked in the description below.
We also have a few libraries to use for this tutorial.
If you don't have these packages installed yet, you can solve this with a quick pip install,
and here we can import the packages.
Next,
save your watsonx project ID and watsonx API key in a separate .env file.
Make sure it's in the same directory as this notebook.
I have my credentials saved already, so I'll import those over
from my .env file and save them in a dictionary called credentials.
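In a notebook this loading is usually done with the python-dotenv package; as a rough sketch of what it amounts to, a .env file is just KEY=VALUE lines parsed into a dictionary (the variable names below are illustrative, not necessarily the notebook's exact ones):

```python
def load_env_file(path: str) -> dict[str, str]:
    """Minimal stand-in for python-dotenv: parse KEY=VALUE lines into a dict."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; split only on the first "=".
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values

# Write a sample env file, then read it back into a credentials dictionary.
with open("sample.env", "w") as f:
    f.write("WATSONX_APIKEY=your-api-key\nWATSONX_PROJECT_ID=your-project-id\n")

credentials = load_env_file("sample.env")
print(credentials)
```

Keeping credentials in a file outside the notebook means the API key never appears in the notebook's cells or output.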
Okay, now we can get started with the workflow.
First, we'll gather the information from some IBM.com URLs to create a knowledge base as a vector store.
Let's establish a URLs dictionary.
It's a Python dictionary that helps us map the 25 URLs from which we will be getting the content.
You can see at the top here, I have the article about the UFC and IBM partnership I asked about before.
Let's also set up a name for our collection,
Ask IBM 2024.
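The shape of that mapping might look like the following (the URLs and the collection-name string here are placeholders standing in for the 25 IBM.com pages and the exact name used in the notebook):

```python
# Map short document names to the IBM.com pages to load (placeholder URLs).
urls = {
    "ufc_announcement": "https://www.ibm.com/example/ufc-partnership",
    "watsonx_data": "https://www.ibm.com/example/watsonx-data",
    "watsonx_ai": "https://www.ibm.com/example/watsonx-ai",
    # ...plus the remaining entries, 25 in the actual notebook
}

# A collection groups related vectors in the vector database.
collection_name = "askibm_2024"

print(len(urls), collection_name)
```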
Next, let's load our documents using the LangChain web based loader for the list of URLs we have.
Loaders load in data from a source and return a list of documents.
We'll print the page content of a sample document at the end to see how it's been loaded.
It can take a little while for it to finish loading.
And here's a sample document. Based on the sample document,
it looks like there's a lot of whitespace and newline characters that we can get rid of.
Let's clean that up with this code.
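The cleanup can be as simple as collapsing runs of whitespace and newlines with a regular expression (a sketch; the notebook's exact cleaning code may differ):

```python
import re

def clean_text(text: str) -> str:
    """Collapse newlines and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

sample = "IBM and UFC\n\n\n   announce    a\npartnership"
print(clean_text(sample))  # -> IBM and UFC announce a partnership
```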
Let's see how our sample document looks now after we've cleaned it up.
Great. We've removed the whitespace successfully.
Before we vectorize our content,
we need to split it up into smaller, more manageable pieces known as chunks. LangChain's recursive character text splitter
takes a large text and splits it based on a specified chunk size, meaning the number of characters.
In our case, we're going to go with a chunk size of 512.
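Conceptually, splitting works like the sliding window below. This is a simplified fixed-size chunker for illustration; LangChain's RecursiveCharacterTextSplitter is smarter, preferring to break on paragraph and sentence boundaries before falling back to raw character counts.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly,
    so a sentence cut at a boundary still appears whole in one chunk."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

chunks = chunk_text("a" * 1200, chunk_size=512, overlap=64)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [512, 512, 304]
```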
Next, we need to instantiate an embedding model to vectorize our content.
In our case, we'll use IBM's Slate model,
And to finish off this step,
let's load our content into a local instance of the vector database using Chroma.
We'll call it vector store.
The documents in the vector store will be made up of the docs we just chunked and they'll be embedded using the IBM Slate model.
For step two, we'll set up our vector store as a retriever.
The retrieved information from the vector store (the content from the URLs)
serves as additional context that the LLM will use to generate a response later in step four.
Code-wise, all we need to do is set up our vector store as a retriever.
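Under the hood, a retriever ranks stored documents by the similarity of their embeddings to the query's embedding. The scoring it typically relies on looks roughly like this cosine-similarity top-k, shown here as a toy with hand-made 3-dimensional vectors standing in for real Slate embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Tiny fake "vector store": document text paired with a made-up embedding.
store = [
    ("UFC partnership announcement", [0.9, 0.1, 0.0]),
    ("watsonx.data overview",        [0.1, 0.9, 0.1]),
    ("watsonx.ai overview",          [0.0, 0.2, 0.9]),
]

def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k document texts whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(top_k([1.0, 0.0, 0.0]))  # query embedding closest to the UFC document
```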
For step three, we'll set up our generative LLM.
The generative model will use the retrieved information from step two to produce a relevant response to our questions.
First, we'll establish which LLM we're going to use to generate the response.
For this tutorial, we'll use an IBM Granite model.
Next we'll set up the model parameters.
The model parameters available, and what they mean can be found in the description of this video.
And finally, in this step, we instantiate the LLM using watsonx.
In step four we'll set up our prompt which will combine our instructions,
the search results from step two, and our question to provide context to the LLM we just instantiated in step three.
First, let's set up instructions for the LLM.
We'll call it template because we'll also set up our prompt using a prompt template, and our instructions.
Let's also set up a helper function to format our docs to differentiate between individual page content.
Finally, as part of this step, we can set up a RAG chain with our search results for my retriever.
Our prompt, our helper function and our LLM.
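In plain Python terms, the prompt assembly and the helper function amount to something like this (the instruction wording and the separator are illustrative; the notebook builds the equivalent with LangChain's PromptTemplate and chains it together with the retriever and LLM):

```python
# Template combining instructions, retrieved context, and the user's question.
template = (
    "Answer the question using only the context provided.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
)

def format_docs(docs: list[str]) -> str:
    """Join retrieved page contents with blank lines so individual
    documents stay distinguishable inside the prompt."""
    return "\n\n".join(docs)

docs = ["IBM and UFC announced a partnership.", "watsonx.data manages metadata."]
prompt = template.format(context=format_docs(docs),
                         question="Tell me about the UFC announcement")
print(prompt)
```

In the actual RAG chain, this formatting step sits between the retriever's output and the LLM's input, so every question automatically arrives at the model wrapped in fresh context.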
Finally, in steps five and six, we can ask questions about our knowledge base.
The generative model will process the augmented context along with the user's question to produce a response.
First, let's ask our initial question.
Tell me about the UFC announcement from November 14th, 2024.
On November 14th, 2024, IBM and UFC announced a groundbreaking partnership,
and it looks like the model was able to answer the question this time,
since it received the context from the UFC article we fed it.
Next, let's ask about watsonx.data
What is watsonx.data?
watsonx.data is a service offered by IBM that enables users
to connect to various data sources and manage metadata for creating data products.
Looks good.
And finally, let's ask about watsonx.ai
What does watsonx.ai do?
watsonx.ai is a comprehensive AI platform that enables users to build, deploy and manage AI applications.
It was also able to respond to our watsonx.ai question.
Feel free to experiment with even more questions about the IBM offerings
and technologies discussed in the 25 articles you loaded into the knowledge base.