
Agentic Retrieval Augmented Generation with LangChain

Key Points

  • The tutorial introduces **agentic Retrieval Augmented Generation (RAG)**, using IBM’s Granite 3.0 8B Instruct model as the reasoning engine, but any LLM can be swapped in.
  • After installing required packages and loading API credentials from a .env file, a **prompt template** is created to let the LLM receive multiple questions and generate responses.
  • A basic query (“What sport is played at the US Open?”) is answered correctly, while a newer query (“Where was the 2024 U.S. Open?”) fails because the model’s training data predates the event, highlighting the need for external knowledge sources.
  • A **knowledge base** is built by listing relevant URLs, loading their content with LangChain’s web loader, chunk‑splitting the text, embedding it via IBM’s Slate model through watsonx.ai, and storing the vectors in an open‑source Chroma DB vector store with a retriever tool.
  • A **structured chat prompt** combines system, human, and tool sections, instructing the agent to display its thought process, the tools invoked (e.g., the RAG retriever), and the final answer, enabling multi‑tool reasoning for complex queries.

Full Transcript

# Agentic Retrieval Augmented Generation with LangChain

**Source:** [https://www.youtube.com/watch?v=Y1PaM3edYoI](https://www.youtube.com/watch?v=Y1PaM3edYoI)
**Duration:** 00:06:42

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Y1PaM3edYoI&t=0s) **Agentic Retrieval Augmented Generation Tutorial** - A walkthrough of setting up an agentic RAG pipeline: installing packages, configuring credentials, using IBM's Granite model as a reasoning engine, testing its limits, and building a web-source knowledge base with LangChain to enable up-to-date answers.
- [00:03:06](https://www.youtube.com/watch?v=Y1PaM3edYoI&t=186s) **Building a RAG-Enabled Agent Prompt** - A step-by-step walkthrough of creating a structured chat prompt (system, human, and tool sections), adding LangChain conversation memory, configuring the agent executor, and demonstrating how the agent uses a RAG tool to answer queries.
- [00:06:16](https://www.youtube.com/watch?v=Y1PaM3edYoI&t=376s) **Agent Memory Updates & Tool Discretion** - The agent successfully used its RAG tool to fetch and store information, but avoided unnecessary tool calls when it already possessed the needed knowledge, such as France's capital.

## Full Transcript
0:00 Hi, I'm Anna, and this is how to perform agentic retrieval augmented generation. 0:05 In other words, agentic RAG.

0:07 We'll need a few packages for this tutorial. 0:09 Make sure to install the following libraries. 0:13 Next, we'll import the following packages. 0:18 Then we input our API key and project ID. 0:21 We've set up our credentials using a .env file.

0:26 For this tutorial we're using IBM's Granite 3.0 8B Instruct model, but you're free to use any AI model of your choice. 0:35 The purpose of these models is to serve as the reasoning engine that decides which actions to take.

0:43 We'll set up a prompt template to ask multiple questions, and now we can set up a chain with our prompt and our LLM. 0:53 This allows the LLM to produce a response.

0:57 Let's test our agent's response to a basic question like "What sport is played at the US Open?" 1:06 Our agent responded correctly. 1:08 Let's try something a little harder, like "Where was the 2024 U.S. Open?" 1:16 The LLM is unable to answer this question. 1:19 The training data used for this model is from before the 2024 U.S. Open happened, and without the appropriate tools, the agent doesn't have access to this information.

1:31 The first step in creating our knowledge base is listing the URLs we will be getting content from. 1:37 Here is a list of URLs summarizing IBM's involvement in the 2024 U.S. Open. Let's put them in a list called URLs.

1:46 Next, we'll load the URLs as documents using LangChain's web-based loader. 1:51 We'll also print a sample document to see how it loaded. 1:57 Looks good.

1:58 In order to split the data in these documents into chunks that can then be processed by the LLM, we can use a text splitter.

2:07 The embedding model that we are using is an IBM Slate model, through the watsonx.ai embedding service. 2:14 Let's initialize it.
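The chunking step the transcript mentions can be illustrated without LangChain. Below is a minimal sketch of fixed-size splitting with overlap, the same idea behind LangChain's character-based text splitters (the `chunk_size` and `chunk_overlap` parameter names are borrowed from that API; the sample document is invented):

```python
def split_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    # Slide a fixed-size window over the text; consecutive chunks share
    # `chunk_overlap` characters so a sentence cut at a boundary keeps context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Invented sample document standing in for a loaded web page.
doc = "IBM partnered with the US Open to deliver AI-generated match summaries. " * 5
chunks = split_text(doc, chunk_size=100, chunk_overlap=20)
print(len(chunks), len(chunks[0]))
```

Each chunk is then embedded separately, so keeping the chunks small enough for the embedding model's context window matters more than the exact splitting strategy.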
2:19 In order to store our embedded documents, we will use Chroma DB, an open-source vector store. 2:26 To access information in the vector store, we must set up a retriever.

2:33 Let's define the get IBM US Open context function and tool our agent will be using. 2:38 The tool's description helps the agent know when to call it. 2:44 This tool can be used for routing questions to our vector store if they're related to IBM's involvement in the 2024 US Open.

2:52 Next, let's set up a new prompt template to ask multiple questions. 2:58 This template is more complex. 2:59 It's known as a structured chat prompt and can be used for creating agents that have multiple tools available.

3:06 Our structured chat prompt will be made up of a system prompt, a human prompt, and our RAG tool. 3:13 First, we'll set up the system prompt. 3:15 This prompt tells the agent to print its thought process, the tools that were used, and the final output.

3:23 In the following code, we're establishing the human prompt. 3:26 This prompt tells the agent to display the user input, followed by the intermediate steps taken by the agent as part of the agent scratchpad.

3:36 Next, we'll establish the order of our newly defined prompts in the prompt template. 3:43 Now let's finalize our prompt template by adding the tool names, descriptions, and arguments using a partial prompt template.

3:52 An important feature of AI agents is their memory. 3:56 Agents are able to store past conversations and past findings in their memory to improve the accuracy and relevance of their responses going forward. 4:06 In our case, we'll be using LangChain conversation buffer memory as a means of memory storage.

4:14 And now we can set up a chain with our agent scratchpad, memory, prompt, and the LLM. 4:20 The agent executor class is used to execute the agent.

4:25 We're now able to ask the agent questions. 4:28 Remember how the agent was previously unable to provide us with the information related to our queries?
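The executor loop described here, in which the agent records a thought, optionally invokes a tool, and logs the observation in a scratchpad, can be sketched in plain Python. This is a simplified stand-in for LangChain's agent executor, not its actual API: the snake_case tool name, the canned tool output, and the keyword routing rule (which replaces the LLM's own decision) are all illustrative assumptions.

```python
def us_open_context_tool(query: str) -> str:
    # Hypothetical stand-in for the tutorial's retriever tool;
    # a real implementation would query the vector store.
    return "The 2024 US Open was held in New York."

TOOLS = {
    "get_ibm_us_open_context": {
        "func": us_open_context_tool,
        "description": "Questions about IBM and the 2024 US Open.",
    },
}

def run_agent(query: str) -> tuple[str, list[str]]:
    # The scratchpad records intermediate steps, mirroring the
    # thought / action / observation trace a structured chat agent prints.
    scratchpad = [f"Thought: do I need a tool for {query!r}?"]
    if "us open" in query.lower():  # toy routing rule standing in for the LLM's decision
        name = "get_ibm_us_open_context"
        observation = TOOLS[name]["func"](query)
        scratchpad.append(f"Action: {name}")
        scratchpad.append(f"Observation: {observation}")
        answer = observation
    else:
        scratchpad.append("Thought: I can answer from my own knowledge.")
        answer = "Answered without tools."
    return answer, scratchpad

answer, trace = run_agent("Where was the 2024 US Open?")
print(answer)
```

Calling `run_agent("What is the capital of France?")` takes the no-tool branch, which is the same discretion behavior the video demonstrates at the end.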
4:35 Now that the agent has its RAG tool available to use, let's try asking the same questions again. 4:41 Let's start with "Where was the 2024 US Open?"

4:51 Great. 4:52 The agent used its available RAG tool to return the location of the 2024 U.S. Open. 4:58 We even get to see the exact document that the agent is retrieving its information from.

5:04 Now, let's try something harder. 5:06 This time, our query will be about IBM's involvement in the 2024 US Open. 5:18 Again, the agent was able to successfully retrieve the relevant information related to our question. 5:24 Additionally, the agent is successfully updating its knowledge base as it learns new information and experiences new interactions, as seen in the history output.

5:35 Now, let's test whether the agent can determine when tool calling isn't necessary to answer the user query. 5:42 We can test this by asking the RAG agent a question that is not about the US Open, like "What is the capital of France?" 5:52 As seen in the agent executor chain, the agent recognized that it had the information in its knowledge base to answer this question without using any of its tools.

6:03 And that's it. 6:04 In this tutorial, we created a RAG agent using LangChain in Python with watsonx.ai. 6:10 The LLM we worked with was IBM's Granite 3.0 8B Instruct model. 6:16 The AI agent was successfully able to retrieve relevant information via its RAG tool, update its memory with each interaction, and output responses. 6:25 It's also important to note the agent's ability to discern whether tool calling is appropriate for each specific task. 6:32 For instance, when the agent already had the relevant information needed to answer a question about the capital of France, it didn't use any tool calling for question answering.
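The memory behavior noted in the conclusion, where each exchange is appended to a history that informs later responses, is the idea behind LangChain's conversation buffer memory. A minimal sketch follows; the class and method names here are illustrative, not LangChain's API.

```python
class BufferMemory:
    """Toy conversation buffer: keeps every turn verbatim, the way a
    conversation buffer memory keeps the full chat history."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, user: str, agent: str) -> None:
        # Record one completed exchange.
        self.turns.append((user, agent))

    def as_prompt_prefix(self) -> str:
        # Rendered history is prepended to the next prompt so the
        # model can use earlier answers as context.
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

memory = BufferMemory()
memory.save("Where was the 2024 US Open?", "It was held in New York.")
memory.save("What is the capital of France?", "Paris.")
print(memory.as_prompt_prefix())
```

A buffer like this grows without bound; production memory classes typically window or summarize older turns to stay within the model's context limit.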