
Debunking Agentic AI and RAG Myths

Key Points

  • Agentic AI and Retrieval‑Augmented Generation (RAG) have become buzzwords, but popular myths—like “agentic AI is only for coding” and “RAG is always the best way to add fresh data”—are overstated.
  • The suitability of RAG (or any AI approach) is highly context‑dependent; there is no universal “always best” answer.
  • Agentic AI describes multi‑agent workflows that continuously loop through perceiving the environment, consulting memory, reasoning, acting, and observing outcomes, with agents communicating and using tools at the application level.
  • The most prevalent current implementation of agentic AI is in coding assistants/copilots, where distinct agents (architect, implementer, reviewer) collaborate like a miniature development team while a human remains the overall conductor.
  • While coding dominates today’s use cases, the discussion points to broader enterprise scenarios where agentic AI could be applied beyond software development.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=fB2JQXEH_94](https://www.youtube.com/watch?v=fB2JQXEH_94)
**Duration:** 00:09:48

## Sections

- [00:00:00](https://www.youtube.com/watch?v=fB2JQXEH_94&t=0s) **Debunking Agentic AI & RAG Myths** - The speaker critiques hype around "agentic AI" and Retrieval‑Augmented Generation, explains what each term actually means, and stresses that their suitability depends on context while outlining basic multi‑agent workflow components.
- [00:03:12](https://www.youtube.com/watch?v=fB2JQXEH_94&t=192s) **Orchestrating Agentic AI with RAG** - The speaker explains how enterprise AI agents act like conductors: routing queries, invoking tools via the Model Context Protocol, and mitigating hallucinations by employing Retrieval‑Augmented Generation's offline indexing and online retrieval steps.
- [00:06:47](https://www.youtube.com/watch?v=fB2JQXEH_94&t=407s) **Strategic Ingestion and Context Engineering** - The speaker outlines the two‑phase RAG workflow: careful data curation and conversion during ingestion, followed by hybrid semantic‑keyword retrieval and relevance re‑ranking to create a concise, prioritized context for the LLM.

## Transcript
I think it's fair to say that some of the most used AI buzzwords in recent times have been, well, one of them is certainly agentic AI, and... let me guess another one, right? Probably RAG. Yeah. Retrieval augmented generation. And with those buzzwords has come plenty of hype and preconceived notions. Preconceived notions like how the primary use case for agentic AI today is coding. Exactly. Or that RAG is always the best way to incorporate specific, up-to-date information into a model's context window. Wait, so are we saying that these things are not the case? Oh, Cedric. You know, this is where we wheel out the consultant's default answer, right? Well, I guess I do. Is RAG always the best option? Well, here it comes. It depends. It depends. There you go. You know, I spent seven years as a technical consultant, and no matter what the question, a good old "it depends" always seems to work. Well, I have an idea. How about we explain what it depends on? Right. So, let's start off by explaining what these terms agentic AI and RAG really mean. And then you can give your practitioner viewpoint on where these buzzy technologies are actually going to be put into action.

Now, AI multi-agent workflows perceive their environment, make decisions and execute actions towards achieving a goal. And all of this happens with minimal human intervention. Architecturally, these components form a loop. The first thing on the loop might be to perceive. Once they've perceived their environment, they can consult memory, they can reason, they can act along a particular path, and then they can go through the final stage, which is to observe what happened, and round and round we go in a loop. The key here is that each agent operates at the application level. They're making decisions, they're using tools and they can communicate with each other.
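The perceive, consult-memory, reason, act, observe cycle described here can be sketched as a plain control loop. This is a minimal illustration under stated assumptions, not a real framework: every name below is a hypothetical stand-in, and a real agent would call an LLM to reason and real tools to act.

```python
# Sketch of one agentic loop: perceive -> consult memory -> reason -> act -> observe.

def run_agent(goal, env, memory, max_steps=5):
    """Drive the loop until the goal shows up in the environment or the budget runs out."""
    for _ in range(max_steps):
        observation = env["state"]                     # perceive the environment
        context = memory + [observation]               # consult memory
        action = f"act[{len(context)}]:{observation}"  # reason: choose the next action (stubbed)
        env["state"] = action                          # act on the environment
        memory.append(observation)                     # observe the outcome, update memory
        if goal in env["state"]:                       # goal reached: stop, with minimal oversight
            break
    return env["state"]
```

For example, `run_agent("ticket", {"state": "ticket"}, [])` perceives the state, acts on it once, records the observation in memory, and stops as soon as the goal string appears in the environment.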
Now Martin, that's great. But if I had to pick the most common use case for agentic AI, I think it has to be coding agents, right? Yeah, you mean like code assistants and copilots? Precisely. And these are examples of agents that can help plan and architect new ideas, that can help write code straight to our repository, and even help review the code that we've generated, with minimal human guidance and by using LLMs that have larger context windows with reasoning capabilities. This kind of looks like a mini developer team, where you have maybe an architect agent that plans out the feature. And then we've got the implementer that's going to come along and actually write the code. And then we've got the reviewer that checks out that code, and then maybe sends some feedback in a loop like this. Exactly. And this agentic pattern still needs human intervention. But our job is to be more of a conductor of an orchestra, right, than to play a single instrument.

Now, let's also think about another use case for agentic AI. Think about enterprises with the need to handle support tickets or HR requests. Or, for example, customers who have some particular query, where specialized agents can autonomously filter and route that query to the right agent, which is then able to use tool calling in order to use services or an API, using some type of protocol like the Model Context Protocol, which standardizes the interaction between our LLMs and the tools that we use every day. Cool. So instead of using a chat window with an LLM to kick off an action, agents can be responsive in their own environment. Exactly. But there is a challenge, right? Because without reliable access to external information, these agents can quickly hallucinate, or they can make misinformed decisions.
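The architect, implementer, reviewer pattern above can be sketched as a feedback loop between three functions. This is a toy illustration: each "agent" here is a plain function standing in for an LLM-backed agent, and the review check is deliberately trivial.

```python
# Sketch of the architect -> implementer -> reviewer loop with feedback.

def architect(feature):
    return f"plan: add {feature}"                   # plan out the feature

def implementer(plan):
    return f"code for ({plan})"                     # write code from the plan

def reviewer(code):
    approved = "plan:" in code                      # toy review: did the code follow a plan?
    return approved, "LGTM" if approved else "rework: no plan reference"

def dev_team(feature, max_rounds=3):
    """Run plan -> implement -> review, feeding review comments back into the plan."""
    plan = architect(feature)
    for _ in range(max_rounds):
        code = implementer(plan)
        approved, feedback = reviewer(code)
        if approved:                                # the human conductor takes it from here
            return code, feedback
        plan = f"{plan} | revise: {feedback}"       # feedback loops back to the planner
    return code, feedback
```

The loop bounds the number of review rounds, mirroring how the human stays in charge as conductor rather than letting the agents iterate forever.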
And one way we can limit those misinformed decisions is with retrieval augmented generation, or RAG. Right. And RAG is essentially a two-phase system, because you've got an offline phase where you ingest and index your knowledge, and an online phase where you retrieve and generate on demand. And the offline part is pretty straightforward. We start off with some documents. These could be Word files, they could be PDFs, whatever. And we're going to break them into chunks and create vector embeddings for each chunk using something called an embedding model. Now, these embeddings get stored in a special type of database called a vector database. So, now you have a searchable index of your knowledge.

And when a query hits the system, perhaps a prompt from the user, that's where the online phase kicks in. The prompt goes to a RAG retriever, which takes the user question, turns it into vector embeddings using the same embedding model, and then performs a similarity search in your vector database. That's going to return the top-K most relevant document chunks, perhaps 3 to 5 passages that are most likely to contain the answer. And that is what is going to be received by the large language model at the end of this.

Wow, Martin! This is really powerful. But when we start to scale things up with more data from our organization, or perhaps allow more users to start using this RAG application, this is where it gets really tricky. Because the more documents or tokens our large language model retrieves, the harder it is for the LLM to recall that information, in addition to increased costs for our AI bill and longer wait times.
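The two phases just described can be sketched end to end. This is a minimal sketch with a toy bag-of-words "embedding" standing in for a real embedding model; the function names and chunk size are illustrative assumptions, not a production pipeline.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Offline phase: break documents into chunks and index each chunk's vector.
def ingest(documents, chunk_size=8):
    index = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            index.append((chunk, embed(chunk)))
    return index

# Online phase: embed the query with the same model, run a similarity
# search over the index, and return the top-k chunks for the LLM.
def retrieve(query, index, k=3):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Note that the same `embed` function is used in both phases, exactly as the transcript stresses: query and document vectors must come from the same embedding model for the similarity search to be meaningful.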
And if we actually plot this out roughly, accuracy against the number of tokens retrieved by our RAG application, adding more can sometimes give a marginal increase in performance or accuracy, but beyond a point it can result in degraded performance because of noise or redundancy. So, maybe not everything should be dumped into the context of an LLM with RAG.

But going back to Martin's point about the two phases of RAG, let's start with ingestion. Because we need to be really intentional about our data curation, using perhaps open-source tools like Docling that can help us do document conversion to get it ready for our RAG application. That means starting from, for example, PDFs and converting them into machine-readable and LLM-readable types like Markdown, with their associated metadata. And this means not just the text from our PDFs and documents or spreadsheets, but also tables, graphs, images, pages that are truncated, and much, much more. So here we can enrich our data before we write it to that vector database or similar storage.

But after ingestion, the next step is retrieval, also known as context engineering. Context engineering, as the name implies, allows us to shape the context we hand the LLM in a RAG application into a compressed and prioritized result. This starts with hybrid recall from our databases. If the user is asking, "Hey, what is agentic AI?" we use both the semantic meaning of the question and a keyword search, specifically in this example for "agentic AI". Now, when we do the recall and get those top-K chunks back, as Martin mentioned, we re-rank them for relevance to prioritize them for our LLM. When we get this back, we can also combine chunks.
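The hybrid recall, re-ranking, and chunk-combining steps can be sketched as follows. This is an illustrative sketch: the semantic scores are passed in as plain numbers standing in for vector-similarity results, and the source-tag merge key is a hypothetical convention.

```python
# Sketch of context engineering: blend semantic + keyword signals, re-rank,
# then merge related chunks into one coherent context block.

def hybrid_rank(query, chunks, semantic_scores, alpha=0.5):
    """Blend semantic similarity with keyword overlap, then re-rank by the blend."""
    q_terms = set(query.lower().split())
    scored = []
    for chunk, sem in zip(chunks, semantic_scores):
        overlap = len(q_terms & set(chunk.lower().split())) / len(q_terms)
        scored.append((alpha * sem + (1 - alpha) * overlap, chunk))
    scored.sort(reverse=True)                       # most relevant chunks first
    return [chunk for _, chunk in scored]

def merge_related(chunks, key=lambda c: c.split(":")[0]):
    """Group chunks that share a source tag so the LLM sees one block per source."""
    groups = {}
    for chunk in chunks:
        groups.setdefault(key(chunk), []).append(chunk)
    return [" ".join(group) for group in groups.values()]
```

The `alpha` weight is the design knob here: higher values favor the semantic score, lower values favor exact keyword matches like "agentic AI".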
So if these two chunks are related, we'll put them together, so that at the end of the day, when we provide the context and the question to our LLM, we have one single coherent source of truth. This results in higher accuracy, faster inference and cheaper AI costs.

Now that sounds great. And speaking of costs, I hear that local models can power RAG and agentic AI. Is that the case? Yes, the rumors are true, because instead of paying for a hosted LLM, lots of developers have already been serving open-source models with open-source tools like vLLM or llama.cpp. And this allows us to keep the same API as a proprietary model, but with the added benefits of data sovereignty, keeping everything on premises, and of tweaking our model runtime, for example the KV cache, to get big improvements that can speed up our RAG or agentic AI applications.

Yeah, so that is AI agents with the help of RAG, a winning combination. Always, right? Well, maybe not always, but, you know, of course, it depends.
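"Same API as a proprietary model" works because both vLLM and llama.cpp's server can expose an OpenAI-compatible HTTP endpoint. A minimal standard-library sketch of building such a request; the base URL and model name below are placeholder assumptions, so substitute whatever your local server actually reports.

```python
import json
from urllib import request

def chat_request(prompt, base_url="http://localhost:8000/v1", model="local-model"):
    """Build an OpenAI-style chat completion request for a locally served model.

    base_url and model are assumptions: use the host, port, and model
    identifier that your vLLM or llama.cpp server is configured with.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually call a running local server:
# with request.urlopen(chat_request("What is RAG?")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape matches the hosted API, swapping between a proprietary endpoint and an on-premises one is mostly a matter of changing `base_url`, which is the data-sovereignty point made above.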