
Agentic Retrieval-Augmented Generation Pipeline

Key Points

  • Retrieval‑augmented generation (RAG) improves LLM answers by pulling relevant documents from a vector database and feeding them as context to the model.
  • Traditional RAG pipelines query a single database and call the LLM only once to generate a response.
  • An “agentic” RAG approach treats the LLM as an autonomous agent that can decide which of multiple vector stores to query and even choose the response format (e.g., text, chart, code) based on the query.
  • By adding an agent, the system can route specific questions—such as internal policy inquiries—to a dedicated internal documentation store, while broader industry‑knowledge queries are sent to a public knowledge base, enhancing relevance and accuracy.
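Beyond routing, the third key point notes that the agent can also choose the response format. A minimal sketch of that decision, where keyword matching stands in for the LLM's classification (the function name and cue words are illustrative assumptions, not from the video):

```python
# Sketch of an agent choosing a response format (text, chart, or code).
# In a real agentic pipeline an LLM would make this call; the keyword
# matching below is only a self-contained stand-in for that classification.

def choose_format(query: str) -> str:
    """Stand-in for the LLM agent deciding how to render the answer."""
    q = query.lower()
    if any(cue in q for cue in ("chart", "plot", "graph", "trend")):
        return "chart"
    if any(cue in q for cue in ("code", "snippet", "function", "script")):
        return "code"
    return "text"  # default: a plain text answer

print(choose_format("Plot the trend of remote work adoption"))    # chart
print(choose_format("Show a code snippet for cosine similarity")) # code
print(choose_format("What is our holiday policy?"))               # text
```

The same pattern extends to any downstream decision the agent should make before the final generation call.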

Full Transcript

# Agentic Retrieval-Augmented Generation Pipeline

**Source:** [https://www.youtube.com/watch?v=0z9_MhcYvcY](https://www.youtube.com/watch?v=0z9_MhcYvcY)
**Duration:** 00:05:29

## Sections

- [00:00:00](https://www.youtube.com/watch?v=0z9_MhcYvcY&t=0s) **Understanding Retrieval Augmented Generation** - The speaker reviews the RAG pipeline, explaining how a vector database supplies context to an LLM, and proposes extending the LLM’s role to meta‑tasks such as selecting which database to query and determining the appropriate response format.
- [00:03:12](https://www.youtube.com/watch?v=0z9_MhcYvcY&t=192s) **Context‑Aware Routing in Agentic RAG** - The speaker explains how an LLM‑driven agent parses a user’s query, determines its context, and routes it to either internal documentation, a general knowledge database, or a failsafe response when the question falls outside the available vector stores, enabling flexible applications in support and legal tech.

## Full Transcript
0:00 So we all know what retrieval augmented generation is.
0:04 But let's just do a quick refresher.
0:06 Retrieval augmented generation is a powerful and popular
0:10 pipeline that enhances responses from a large language model.
0:14 It does this by incorporating relevant data retrieved from a vector database,
0:18 adding it as context to the prompt, and sending it to the LLM for generation.
0:22 What this does is it allows the LLM to ground its response in concrete and accurate information,
0:28 and that improves the quality and reliability of the response.
0:31 Let me quickly sketch it out.
0:34 So let's say we have a user
0:37 or an application, even.
0:41 And they send a query.
0:44 Now, without retrieval-augmented generation,
0:47 this query is going to go and get itself interpolated into a prompt.
0:55 And from there
0:57 that's going to hit the LLM.
1:01 And that's going to generate an output.
1:07 To make this RAG,
1:09 we can add a vector database.
1:12 So instead of just going directly and getting itself
1:14 interpolated into the prompt, it's going to hit this vector DB.
1:17 And the response from that vector DB is going to be used as context
1:21 for the prompt.
1:23 Now, in this typical pipeline, we call the LLM only once,
1:27 and we use it solely to generate a response.
1:30 But what if we could leverage the LLM not just for responses,
1:34 but also for additional tasks, like deciding which vector database to query
1:39 if we have multiple databases, or even determining the type of response to give?
1:43 Should it answer with text, generate a chart, or even provide a code snippet?
1:48 And that would all be dependent on the context of that query.
1:52 So this is where the agentic RAG pipeline
1:57 comes into play.
1:58 In agentic RAG, we use the LLM as an agent, and the LLM goes beyond just generating a response.
2:05 It takes on an active role and can make decisions that will improve
2:09 both the relevance and accuracy of the retrieved data.
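The traditional pipeline sketched above (query → vector store → context → one LLM call) can be expressed as a short program. The embedding, retriever, and LLM below are toy stand-ins, not a real library API; the word-overlap "embedding" is an assumption made so the example runs on its own:

```python
# Minimal sketch of a traditional RAG pipeline: one vector store, one LLM call.

def embed(text: str) -> set[str]:
    """Toy 'embedding': a bag of lowercase words (real systems use dense vectors)."""
    return set(text.lower().split())

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query and return the top k."""
    ranked = sorted(documents, key=lambda d: len(embed(d) & embed(query)), reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API request)."""
    return f"[LLM answer grounded in prompt: {prompt[:60]}...]"

def rag_answer(query: str, documents: list[str]) -> str:
    # 1. Retrieve relevant context from the vector store.
    context = "\n".join(retrieve(query, documents))
    # 2. Interpolate query and context into the prompt.
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    # 3. Call the LLM exactly once to generate the response.
    return call_llm(prompt)

docs = [
    "Remote work policy: employees may work remotely during holidays with approval.",
    "Expense policy: submit receipts within 30 days.",
]
print(rag_answer("What is the remote work policy during holidays?", docs))
```

Note the single `call_llm` at the end: everything before it is plain retrieval, which is exactly what the agentic variant below relaxes.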
2:13 Now, let's explore how we can augment the initial process
2:16 with an agent and a couple of different sources of data.
2:20 So instead of just one single source,
2:22 let's add a second.
2:25 And the first one can be, you know,
2:28 internal documentation, right?
2:31 And the second one can be general industry knowledge.
2:39 Now, in the internal documentation we're going to have things
2:42 like policies, procedures, and guidelines.
2:44 And the general knowledge base
2:45 will have things like industry standards, best practices, and public resources.
2:51 So how can we get the LLM to use the vector database
2:54 that contains the data that would be most relevant to the query?
2:58 Let's add that agent into this pipeline.
3:05 Now, this agent can intelligently decide which database
3:08 to query based on the user's question, and the agent isn't making a random guess.
3:12 It's leveraging the LLM's language-understanding capabilities
3:17 to interpret the query and determine its context.
3:21 So if an employee asks, "What's the company's policy on remote work
3:24 during the holidays?", it would route that to the internal documentation,
3:28 and that response will be used as context for the prompt.
3:31 But if the question is more general, like "What are the industry standards
3:35 for remote work in tech companies?",
3:38 the agent's going to route that to the general knowledge database,
3:40 and that context is going to be used within that prompt. Powered by an LLM
3:45 and properly trained, the agent analyzes the query and, based on its understanding
3:50 of the content and the context, decides which database to use.
3:54 But they're not always going to ask questions that are
3:57 genuinely relevant to any of the stuff that we have
4:00 in our vector DB.
4:01 So what if someone asks a question that is just totally out of left field,
4:05 like who won the World Series in 2015?
4:08 What the agent can do at that point is it could route it to a failsafe.
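The routing step the speaker walks through can be sketched as follows. In a real system the agent would ask the LLM which store fits the query; the cue words here (and the names `ROUTES`, `route`, `answer`) are self-contained, illustrative stand-ins for that language understanding:

```python
# Sketch of agentic routing: each query goes to internal documentation,
# general industry knowledge, or a failsafe when neither store applies.
# A real agent would let the LLM classify the query; keywords stand in here.

ROUTES = {
    "internal_docs": ("policy", "procedure", "guideline", "company"),
    "general_knowledge": ("industry", "standard", "best practice", "public"),
}

def route(query: str) -> str:
    """Stand-in for the LLM agent picking a vector store (or the failsafe)."""
    q = query.lower()
    for store, cues in ROUTES.items():
        if any(cue in q for cue in cues):
            return store
    return "failsafe"  # nothing in our stores covers this query

def answer(query: str) -> str:
    store = route(query)
    if store == "failsafe":
        return "Sorry, I don't have the information you're looking for."
    # In a full pipeline we would now retrieve from `store` and prompt the LLM.
    return f"Retrieving context from {store} for: {query}"

print(answer("What's the company's policy on remote work during the holidays?"))
print(answer("What are the industry standards for remote work in tech companies?"))
print(answer("Who won the World Series in 2015?"))
```

The three example queries mirror the transcript: the policy question routes to internal documentation, the industry question to general knowledge, and the World Series question falls through to the failsafe.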
4:15 So because the agent is able
4:16 to recognize the context of the query,
4:21 it could recognize that it's not a part of the two databases that we have,
4:25 route it to the failsafe, and return back,
4:29 "Sorry, I don't have the information you're looking for."
4:32 This agentic RAG pipeline can be used in customer support systems and legal tech.
4:37 For example, a lawyer can source
4:39 answers to their questions from, like, their internal briefs
4:42 and then, in another query, just get stuff from public case law databases.
4:46 The agent can be utilized in a ton of ways.
4:49 Agentic RAG is an evolution in how we enhance the RAG pipeline by moving
4:53 beyond simple response generation to more intelligent decision making.
4:58 By allowing an agent to choose the best data sources
5:01 and potentially even incorporate external information
5:04 like real-time data or third-party services,
5:07 we can create a pipeline that's more responsive, more accurate, and more adaptable.
5:13 This approach opens up so many possibilities for applications
5:16 in customer service, legal tech, health care,
5:19 virtually any field, as AI technology continues to evolve.
5:23 We will see AI systems that truly understand context
5:26 and can deliver amazing value to the end user.