Debunking Agentic AI and RAG Myths
Key Points
- Agentic AI and Retrieval‑Augmented Generation (RAG) have become buzzwords, but popular myths—like “agentic AI is only for coding” and “RAG is always the best way to add fresh data”—are overstated.
- The suitability of RAG (or any AI approach) is highly context‑dependent; there is no universal “always best” answer.
- Agentic AI describes multi‑agent workflows that continuously loop through perceiving the environment, consulting memory, reasoning, acting, and observing outcomes, with agents communicating and using tools at the application level.
- The most prevalent current implementation of agentic AI is in coding assistants/copilots, where distinct agents (architect, implementer, reviewer) collaborate like a miniature development team while a human remains the overall conductor.
- While coding dominates today’s use cases, the discussion points to broader enterprise scenarios where agentic AI could be applied beyond software development.
Sections
- Debunking Agentic AI & RAG Myths - The speaker critiques hype around “agentic AI” and Retrieval‑Augmented Generation, explains what each term actually means, and stresses that their suitability depends on context while outlining basic multi‑agent workflow components.
- Orchestrating Agentic AI with RAG - The speaker explains how enterprise AI agents act like conductors—routing queries, invoking tools via a model‑context protocol, and mitigating hallucinations by employing Retrieval‑Augmented Generation’s offline indexing and online retrieval steps.
- Strategic Ingestion and Context Engineering - The speaker outlines the two‑phase RAG workflow—careful data curation and conversion during ingestion, followed by hybrid semantic‑keyword retrieval and relevance re‑ranking to create a concise, prioritized context for the LLM.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=fB2JQXEH_94](https://www.youtube.com/watch?v=fB2JQXEH_94) · **Duration:** 00:09:48
- [00:00:00](https://www.youtube.com/watch?v=fB2JQXEH_94&t=0s) Debunking Agentic AI & RAG Myths
- [00:03:12](https://www.youtube.com/watch?v=fB2JQXEH_94&t=192s) Orchestrating Agentic AI with RAG
- [00:06:47](https://www.youtube.com/watch?v=fB2JQXEH_94&t=407s) Strategic Ingestion and Context Engineering
I think it's fair to say that some of the most used AI buzzwords in recent times have
been, well, one of them is certainly agentic AI, and... let me
guess another one, right? Probably RAG. Yeah. Retrieval augmented generation. And with those
buzzwords has come plenty of hype and preconceived notions. Preconceived notions like
how the primary use case for agentic AI today is coding. Exactly. Or that RAG is always the best
way to incorporate specific, up-to-date information into a model's context window. Wait, so
are we saying that these things are not the case? Oh, Cedric. You know, this is where we wheel out
the consultant's default answer, right? Well, I guess I do. Is RAG always the best
option? Well, here it comes. It depends. It depends. There you go. You know, I spent seven
years as a technical consultant, and no matter what the question, a good old "it depends," that
always seems to work. Well, I have an idea. How about we explain what it depends on? Right. So,
let's start off by explaining what these terms agentic AI and RAG really mean. And then you can
get your practitioner viewpoint out where these buzzy technologies are actually going to be put
into action. Now, AI multi-agent workflows, they perceive their environment, they make
decisions and they execute actions towards achieving a goal. And all of this happens with
minimal human intervention. Now, architecturally, these components, they kind of form a loop. So, the
first thing on the loop might be to perceive. And once they've perceived their
environment, they can consult memory, they can
reason, they can act along a particular path,
and then they can go through the final stage, which is to observe what happened, and kind of
round and round we go in a loop. The key here is that each agent operates at the application
level. They're making decisions, they're using tools and they can communicate with each other.
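The perceive → memory → reason → act → observe loop described here can be sketched in a few lines of Python. Everything below is illustrative (the `Agent` class, the toy counter environment, and the stub `reason` step are invented for the example); a real agent would call an LLM to reason and real tools to act:

```python
# Minimal sketch of the agent loop: perceive -> memory -> reason -> act -> observe.
# A real agent would prompt an LLM in reason() and invoke real tools in act().

class Agent:
    def __init__(self, goal):
        self.goal = goal
        self.memory = []            # past observations the agent can consult

    def perceive(self, environment):
        return environment["state"]

    def reason(self, perception):
        # Stub decision: step toward the goal unless we are already there.
        return {"action": "noop" if perception == self.goal else "step"}

    def act(self, decision, environment):
        if decision["action"] == "step":
            environment["state"] += 1
        return environment["state"]

    def run(self, environment, max_steps=10):
        for _ in range(max_steps):
            perception = self.perceive(environment)   # perceive
            decision = self.reason(perception)        # consult memory + reason
            outcome = self.act(decision, environment) # act
            self.memory.append(outcome)               # observe what happened
            if outcome == self.goal:
                break                                 # goal reached, exit loop
        return environment["state"]

env = {"state": 0}
print(Agent(goal=3).run(env))   # loops until the state reaches the goal
```

The point of the sketch is the shape of the loop, not the toy environment: each pass through `run` is one perceive/reason/act/observe cycle, with `memory` accumulating outcomes between cycles.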
Now Martin, that's great. But if I had to pick the most common use case for agentic AI, I think it has
to be coding agents, right? Uh, yeah. You mean like, uh, like code assistants and copilots? Precisely.
And these are examples of agents that can help plan and architect new ideas that can
help write code straight to our repository, and even help review the code that we've generated—with
minimal human guidance and by using LLMs that have larger context windows with reasoning
capabilities. This, this kind of looks like a, like a mini developer team, like where you have maybe a,
an architect agent that kind of plans out the feature. And then we've got the
implementer that's going to come along and actually write the code. And then we've got the
reviewer that checks out that code, and then maybe send some feedback in a loop like this.
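As a toy illustration of that architect / implementer / reviewer loop, here is a sketch where stub functions stand in for LLM-backed agents (all the names and the one-round feedback rule are invented for the example, not from the talk):

```python
# Toy architect -> implementer -> reviewer pipeline. Each "agent" is a stub
# standing in for an LLM call; the loop repeats until the reviewer approves.

def architect(feature):
    return f"plan: split '{feature}' into small functions"

def implementer(plan, feedback=None):
    code = f"code for ({plan})"
    if feedback:
        code += f" revised for ({feedback})"
    return code

def reviewer(code):
    # Approve once the implementer has addressed the previous feedback.
    if "revised" in code:
        return None            # no objections
    return "add tests"         # feedback sent back around the loop

def run_team(feature, max_rounds=3):
    plan = architect(feature)
    feedback = None
    for _ in range(max_rounds):
        code = implementer(plan, feedback)
        feedback = reviewer(code)
        if feedback is None:   # reviewer approved; a human still merges
            return code
    return code

print(run_team("user login"))
```

The `max_rounds` cap mirrors the human-as-conductor point: the loop is bounded, and a person decides what ultimately ships.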
Exactly. And this agentic pattern still needs human intervention. But our job is to be more of a
conductor of an orchestra, right, than to play a single instrument. Now, let's also think about
another use case for agentic AI. Think about enterprises with the need to handle support
tickets or HR requests. Or, for example, customers with a particular query, where specialized
agents can autonomously filter and route that query to the right agent, which can
then use tool calling to reach services or an API through some type of
protocol like Model Context Protocol, which standardizes the interaction between our LLMs and
the tools that we use every day. Cool. So instead of using a chat window with an LLM to kick off an
action, agents can be responsive in their own environment. Exactly. But, but there is a
challenge, right? Because without reliable access to external information, these agents, they can
quickly hallucinate, or they can make misinformed decisions. And one way we can limit
those misinformed decisions is with retrieval augmented generation or
RAG. Right. And RAG is essentially a two-phase system because you've got an offline phase where
you ingest and index your knowledge, and an online phase where you retrieve and generate on demand.
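A minimal sketch of those two phases, using a toy bag-of-words embedding in place of a real embedding model and a plain Python list in place of a vector database (all names here are illustrative):

```python
# Sketch of RAG's two phases. A toy word-count "embedding" stands in for a
# trained embedding model, and a list of (vector, text) pairs stands in for
# a vector database.

import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts. Real systems use a trained model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# --- Offline phase: ingest, chunk, embed, index ---
chunks = [
    "agentic AI loops through perceive, reason, act and observe",
    "RAG retrieves relevant chunks before generation",
    "vector databases store embeddings for similarity search",
]
vector_db = [(embed(c), c) for c in chunks]   # the searchable index

# --- Online phase: embed the query, similarity-search, return top-K ---
def retrieve(query, k=2):
    q = embed(query)
    ranked = sorted(vector_db, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]   # top-K chunks for the LLM context

print(retrieve("how does RAG retrieve chunks?"))
```

Note that the same `embed` function is used at ingestion time and at query time, which is exactly the constraint mentioned later in the transcript: the retriever must use the same embedding model that built the index.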
And the offline part, it's pretty straightforward. So, we start off with, well, let's start it over
here. We're going to start with some documents. So, these are your documents. That could be Word files, it
could be PDFs, whatever. And we're going to break them into chunks and create vector
embeddings for each chunk using something called an embedding model. Now,
these embeddings, they get stored into a special type of database called
a vector database. So, now you have a searchable index of your knowledge. And when a query
hits the system—so we've got perhaps here a prompt from the user—that's where the
online phase kicks in. So, the prompt goes to a RAG retriever, and that takes
the user question and it turns it into vector embeddings using the same embedding model. And
then it performs a similarity search in your vector database. Now, that's going to return back
to you the top K most relevant document chunks, perhaps 3 to 5 passages that are most likely to
contain the answer. And that is what is going to be received by the large language
model at the end of this. Wow, Martin! And this is really powerful. But when we start to scale things
up with more data, right, from our organization, or perhaps allow more users to start using
this RAG application, this is where it gets really tricky. Because the more documents or tokens that
our large language model is going to retrieve, well, the harder it is for the LLM to recall that
information, in addition to increased costs on our AI bill and longer wait times. And if we actually plot
this out roughly, when we talk about accuracy and the amount of tokens retrieved by our RAG
application, well, the more we add sometimes can have a marginal increase in performance or
accuracy, but beyond a point can result in degraded performance because of noise or redundancy.
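One common mitigation implied by that curve is to cap how much retrieved text reaches the model. A sketch, approximating tokens by word counts (a real system would use the model's tokenizer and a tuned budget):

```python
# Cap the retrieved context at a token budget, keeping the most relevant
# chunks first. Token counts are approximated by word counts here; real
# systems count with the model's own tokenizer.

def trim_to_budget(ranked_chunks, max_tokens=50):
    kept, used = [], 0
    for chunk in ranked_chunks:        # assumed sorted most-relevant first
        cost = len(chunk.split())
        if used + cost > max_tokens:   # stop before overflowing the budget
            break
        kept.append(chunk)
        used += cost
    return kept

ranked = ["short relevant passage"] * 10 + ["long tail passage " * 20]
print(len(trim_to_budget(ranked)))     # the long low-ranked chunk is dropped
```

Because chunks arrive ranked by relevance, trimming from the tail discards the passages most likely to be noise or redundancy, which is the failure mode the plot above describes.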
So, maybe not everything should be dumped into the context of an LLM with RAG. But going back to
Martin's point about the two phases of RAG, let's start to talk about ingestion. Because we need to
be really intentional about our data curation, using perhaps open-source tools like Docling
that can help us do document conversion to get it ready for our RAG
application. That means converting from, for example, PDF formats to machine-readable and LLM-readable
types like Markdown, with their associated metadata. And this means not just the text from
our PDFs and documents or spreadsheets, but also tables, graphs, images, pages that are
truncated and much, much more. So here we can enrich your data before we write it to that
vector database or a similar storage. But after ingestion, the next step is retrieval or also
known as context engineering. So, context engineering, as the name implies, allows us to
form our context for the LLM for RAG applications into a compressed and
prioritized result. So, this starts with hybrid recall from databases, right? So, if the user is
asking, "Hey, what is agentic AI?" what we're going to do is use both the semantic meaning of
our question, but also do keyword search, specifically in this example, for agentic AI. Now,
when we do the recall to get that information from our database, what we're also going to do
when we get those top K chunks, as Martin mentioned, is re-rank them for relevance, right, to
prioritize them for our LLM. When we get this back, well, we can also combine
chunks. So if two chunks are related, we'll piece them together, so that at the
end of the day, when we provide the context and the question to our LLM, we have one single
coherent source of truth. This results in higher accuracy, faster inference and cheaper AI cost. Now
that sounds great. And speaking of costs, I hear that local models can power
RAG and agentic AI. Is that, is that the case? Yes, the rumors are true, because instead of paying for
a proprietary LLM, lots of developers have already been serving open-source models with open-source tools
like vLLM or llama.cpp. And this allows us to maintain the same API as
a proprietary model but have the added benefit of data sovereignty—so, keeping everything on premise—and
tweaking our model runtime's KV cache in order to get big improvements that could
speed up our RAG or agentic AI applications. Yeah, so that is AI agents with the
help of RAG, a winning combination. Always, right? Well, maybe not
always, but, you know, of course, it depends.
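As a closing illustration, the hybrid recall, re-ranking, and chunk-merging steps from the context-engineering discussion can be sketched like this (the scoring functions are simplified stand-ins invented for the example, not a real hybrid retriever):

```python
# Sketch of context engineering: score chunks with a keyword signal plus a
# (stubbed) semantic signal, re-rank, then merge the winners into one
# coherent context string for the LLM.

def keyword_score(query, chunk):
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q)        # fraction of query words found verbatim

def semantic_score(query, chunk):
    # Stand-in for embedding similarity, kept self-contained: checks whether
    # the last query word appears anywhere in the chunk.
    return 1.0 if query.lower().split()[-1] in chunk.lower() else 0.0

def hybrid_rerank(query, chunks, alpha=0.5):
    scored = [(alpha * semantic_score(query, c)
               + (1 - alpha) * keyword_score(query, c), c) for c in chunks]
    scored.sort(reverse=True)         # most relevant first
    return [c for _, c in scored]

def merge(chunks):
    # Piece related chunks together into one single source of truth.
    return "\n".join(chunks)

docs = [
    "agentic AI means agents that perceive, reason and act",
    "RAG grounds answers in retrieved documents",
]
context = merge(hybrid_rerank("what is agentic AI", docs)[:2])
print(context.splitlines()[0])
```

The `alpha` weight is the knob between the two recall signals: a real deployment would tune it, and would replace `semantic_score` with an actual embedding similarity.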