
Agentic AI vs Mixture of Experts

Key Points

  • An agentic AI workflow uses a planner agent to assign tasks to specialized agents (A, B, C), whose results are collected by an aggregator to produce the final output.
  • The “mixture of experts” architecture replaces the planner with a router that dispatches input to parallel expert models, then merges their token streams into a single result.
  • Agentic workflows rely on LLM‑driven agents equipped with perception, memory (working and long‑term), and domain‑specific tools (e.g., data querying, analysis, visualization) to act autonomously toward a goal.
  • Mixture‑of‑experts systems focus on parallel processing of the same input by multiple expert models, emphasizing speed and scale rather than explicit decision‑making or memory handling.
  • Although architecturally distinct, both paradigms represent cutting‑edge AI designs and can be combined—for example, using a router to select agentic modules or integrating expert outputs into an agent’s reasoning loop.

Full Transcript

# Agentic AI vs Mixture of Experts

**Source:** [https://www.youtube.com/watch?v=4-FH09AMsp0](https://www.youtube.com/watch?v=4-FH09AMsp0)
**Duration:** 00:09:15

## Sections

- [00:00:00](https://www.youtube.com/watch?v=4-FH09AMsp0&t=0s) **Agentic Workflow vs Mixture of Experts**: The speaker contrasts the hierarchical planner-agent-aggregator structure of agentic AI workflows with the parallel router-expert-merge design of mixture-of-experts models, highlighting their similarities and differences.
- [00:03:06](https://www.youtube.com/watch?v=4-FH09AMsp0&t=186s) **Specialized Agents and MoE Loop**: The speaker explains how domain-specific agents (data, analysis, visualization) operate in a perception-memory-reason-action-observation cycle at the application level, and contrasts this with a mixture-of-experts neural architecture that routes inputs via a gating network to multiple specialized model experts.
- [00:06:16](https://www.youtube.com/watch?v=4-FH09AMsp0&t=376s) **Multi-Agent Incident Response with Experts**: An enterprise security workflow uses a planner agent to dispatch alerts to specialized agents, including an LLM powered by a mixture-of-experts router that dynamically selects specific expert submodels for each token batch, to diagnose lateral movement and recommend actions.

## Full Transcript
If you're familiar with AI multi-agent workflows, you might have seen some form of this architecture before. At the top, we provide some input to an agentic workflow, and that ultimately flows down to produce some output at the end. If we look into the boxes, at the top you would typically have a planner agent that's responsible for distributing work to the agents within the workflow. Then there are the agents themselves: let's call them agent A, agent B, and agent C. Each one of these does specialized work; each is a specialist in a particular task. Once an agent has done its work, the results flow down to the aggregator, and that aggregator agent prepares a response. That's how we get our output. So that is an agentic AI workflow.

But in AI there's another architecture that we've had for quite a while and that's gaining popularity: mixture of experts. At a high level, it has a very similar-looking workflow. Instead of a planner, we start with a router, which receives the input and dispatches the work. Then we have a series of experts: expert A, expert B, and expert C. These work in parallel. At the bottom, we have a merge component that reassembles the processed tokens into a single stream.

So it kind of begs the question: what's the difference between these two things, AI agents and mixture of experts? The answer is: quite a lot, actually. But something they both have in common is that they are very much part of frontier AI models today. So let's discuss what they do and how they can be used together.
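The planner, specialists, aggregator flow just described can be sketched in a few lines of Python. This is an illustration only: the agent names, the semicolon-split request, and the round-robin planner are assumptions standing in for LLM-backed agents, not anything from the video.

```python
# Minimal sketch of the planner -> specialists -> aggregator flow.
# Plain functions stand in for LLM-backed agents.

def agent_a(task: str) -> str:        # specialist A
    return f"A handled: {task}"

def agent_b(task: str) -> str:        # specialist B
    return f"B handled: {task}"

def agent_c(task: str) -> str:        # specialist C
    return f"C handled: {task}"

def planner(request: str) -> list[tuple[str, callable]]:
    """Break the request into (subtask, agent) assignments."""
    subtasks = [t.strip() for t in request.split(";")]
    agents = [agent_a, agent_b, agent_c]
    return [(t, agents[i % len(agents)]) for i, t in enumerate(subtasks)]

def aggregator(results: list[str]) -> str:
    """Collect every specialist's result into one response."""
    return " | ".join(results)

def agentic_workflow(request: str) -> str:
    assignments = planner(request)
    results = [agent(task) for task, agent in assignments]
    return aggregator(results)

print(agentic_workflow("fetch data; analyze trends; draw chart"))
# → A handled: fetch data | B handled: analyze trends | C handled: draw chart
```

A real planner would itself be an LLM deciding which agent fits each subtask; the round-robin assignment here just keeps the shape of the flow visible.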
In an AI multi-agent workflow, the agents perceive their environment, make decisions, and execute actions toward achieving a goal, and all of this happens with minimal human intervention. The agents typically use LLMs that have been given specific roles, tools, and contexts. Agentic AI workflows are usually composed of modular components. One module might be the perception module: that's how the agent senses or ingests information from its environment or from user input. There's also typically a component for memory, the knowledge store. That memory can be working memory for remembering the current context, or long-term memory for knowledge accumulation over time, like domain facts or user preferences. And then there's an assortment of specialized agents that excel at specific domains. For example, we might have a data agent that knows how to query databases and clean data, an analysis agent that's trained on business intelligence, and maybe a visualization agent that creates charts and graphs.

Architecturally, these components form a loop with distinct stages. First the agents perceive; then they consult some form of memory (remember, the memory component); from there they reason, and they act based upon that reasoning; and finally they observe what happens as a result of that action. Then round and round we go in this loop. The key here is that each of these agents operates at the application level: they're making decisions, they're using tools, and they can communicate with each other.
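The perceive, consult memory, reason, act, observe loop can be sketched as a tiny class. Again this is a sketch, not an API from the video: the method names and string-based "decisions" are assumptions standing in for real observations, LLM reasoning, and tool calls.

```python
# Sketch of one agent's perceive -> memory -> reason -> act -> observe loop.
# Strings stand in for real observations, tool calls, and LLM reasoning.

class Agent:
    def __init__(self) -> None:
        self.working_memory: list[str] = []      # current context
        self.long_term_memory: set[str] = set()  # knowledge accumulated over time

    def perceive(self, event: str) -> str:
        # Ingest and normalize input from the environment or the user.
        return event.lower().strip()

    def reason(self, obs: str) -> str:
        # Consult memory before deciding what to do.
        return f"recall:{obs}" if obs in self.long_term_memory else f"handle:{obs}"

    def act(self, decision: str) -> str:
        # Execute the decision (a real agent would call a tool here).
        return f"done:{decision}"

    def observe(self, obs: str, result: str) -> None:
        # Watch what happened, then update both memories.
        self.working_memory.append(result)
        self.long_term_memory.add(obs)

    def step(self, event: str) -> str:
        obs = self.perceive(event)
        decision = self.reason(obs)
        result = self.act(decision)
        self.observe(obs, result)
        return result

agent = Agent()
print(agent.step("Login Alert"))  # → done:handle:login alert
print(agent.step("login alert"))  # → done:recall:login alert (memory kicked in)
```

The second call shows the memory stage doing real work: the same observation now takes the "recall" branch because the first pass through the loop stored it in long-term memory.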
The mixture of experts, on the other hand, operates at the architecture level. MoE is a neural network design that splits a model into multiple experts. Let me draw three experts here, although in reality there will be a lot more. Each of these experts specializes in a part of the input space. Then there's a gating network at the top that routes the input to the different experts in this mixture-of-experts architecture. The input goes through this before it gets to the next layer, coming into the merge component. That component receives all of the responses from the experts that were invoked and performs mathematical operations to combine the output tensors from these different experts into a single representation that continues through the rest of the model.

One of the big advantages of MoE is sparsity, because only the active experts' parameters contribute to that input's computation. Take an LLM example: the IBM Granite 4.0 Tiny Preview model uses 64 different experts in its architecture and has around 7 billion total parameters. But of those, only about 1 billion are active at inference time. That makes it a pretty memory-efficient language model, capable of running on a single, fairly modest GPU.

In an MoE model, these experts are not separate AI agents; they're specialized neural network components within the same model. So let's consider a use case where a multi-agent workflow and a mixture-of-experts model show up in the same system. Imagine an enterprise incident response workflow, with a security analyst who's going to start things off.
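Before the use case, the gate, route, and merge steps just described can be sketched numerically. This toy layer uses plain Python lists instead of tensors, four hand-written "experts" instead of learned sub-networks, and made-up gate weights; a real MoE gate is a trained network, so treat this purely as a sketch of sparse top-k routing.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts, then merge their outputs."""
    # Gating network: score every expert for this input, then normalize.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    probs = softmax(scores)
    # Sparsity: keep only the top_k experts; the rest stay inactive.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Merge: weighted sum of the active experts' outputs.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        out = [o + probs[i] * yi for o, yi in zip(out, y)]
    return out, top

# Four tiny "experts": each is just a function of the input vector.
experts = [
    lambda x: [2 * v for v in x],   # expert 0: scale up
    lambda x: [v + 1 for v in x],   # expert 1: shift
    lambda x: [-v for v in x],      # expert 2: negate
    lambda x: [v * v for v in x],   # expert 3: square
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

out, active = moe_layer([3.0, 1.0], experts, gate_weights, top_k=2)
print(active)  # → [0, 1]: only 2 of 4 experts were activated
```

With top-2 of 4 experts, only half the expert functions run for a given input. Scaled up, that is the same sparsity that lets the Granite model described above activate roughly 1 billion of its 7 billion parameters per token.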
The analyst starts things off by providing an alert bundle as input, and maybe a short natural-language question, like: is this lateral movement, and if it is, what should I do about it? That goes into our agentic workflow, which has a number of components. First there's a planner agent, the component that breaks up the request, spins up the agentic workflow, and passes the work along to the specialized agents. We might have a log triage agent that parses the raw telemetry, and a threat intel agent that processes indicators, and so forth as we go down the workflow.

Now, that log triage agent could actually be implemented as an LLM that uses mixture of experts as its architecture. As the tokens stream in to the mixture-of-experts gating network, or the router if you like, it looks at each micro-batch of text and decides on the fly which handful of experts inside the model should handle it. Only those experts are used: perhaps the batch goes through to expert one and maybe expert two, but none of the other experts; we just leave those alone. So it might use just two experts out of a total of 64, activating only those for that particular micro-batch. The forward pass, then, touches just a fraction of the overall parameters. The selected experts process their slice of the representation in parallel, and the results come back down to the merge function, which mathematically stitches those outputs back together before the next transformation layer. So perhaps this log triage agent is a 7-billion-parameter LLM, but only about 1 billion parameters are active during inference.
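The routing arithmetic in this example can be checked with a short sketch. The shared and per-expert parameter counts below are invented values chosen only so the totals line up with the figures quoted in the transcript; they are not the actual Granite parameter layout.

```python
# Toy accounting for sparse activation in an MoE forward pass.
TOTAL_EXPERTS = 64
ACTIVE_EXPERTS = 2            # the "handful" chosen per micro-batch
SHARED_PARAMS = 0.806e9       # assumed: attention, embeddings, etc. (made up)
PARAMS_PER_EXPERT = 0.0968e9  # assumed per-expert size (made up)

total = SHARED_PARAMS + TOTAL_EXPERTS * PARAMS_PER_EXPERT
active = SHARED_PARAMS + ACTIVE_EXPERTS * PARAMS_PER_EXPERT
print(f"total:  {total / 1e9:.1f}B parameters")            # → total:  7.0B parameters
print(f"active: {active / 1e9:.1f}B parameters per batch")  # → active: 1.0B parameters per batch
```

The point of the sketch is the ratio, not the absolute numbers: with 2 of 64 experts active, the forward pass touches roughly 1 billion of the 7 billion total parameters, so most of the model sits idle on any given micro-batch.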
So agents route tasks across the workflow: they decide the next step, maybe call a tool, update shared memory, and so on. A mixture of experts routes tokens inside a single model, deciding which internal parameter slices light up for the next few milliseconds of compute. Stack them well, and you get workflows that reason broadly and specialize deeply.