Europe's Mistral Medium 3: AI Contender

Key Points

  • Europe’s AI landscape may not lead in building the largest models, but it can “define the rules of the road,” offering a strategic advantage despite trailing the U.S. and China.
  • Mistral’s new Medium 3 model claims 8× lower operating costs and on‑premises deployment capability, positioning “medium is the new large” for enterprises seeking more affordable, locally‑hosted AI.
  • Critics note that Mistral’s focus on larger‑scale models (up to 70 B parameters) leaves a gap in the open‑source ecosystem, which still heavily relies on smaller models (e.g., 3–8 B) for broad developer adoption.
  • The episode also touches on broader industry moves, including a major chip shipment to Saudi Arabia, fresh benchmarking releases, and an experimental initiative involving AI‑generated advertisements.

# Europe's Mistral Medium 3: AI Contender

**Source:** [https://www.youtube.com/watch?v=kjRHpuXGmH0](https://www.youtube.com/watch?v=kjRHpuXGmH0)
**Duration:** 00:36:35

## Sections

- [00:00:00](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=0s) **Untitled Section**
- [00:03:02](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=182s) **Mistral's Quiet Open-Source Resurgence** - The discussion weighs Mistral's medium‑sized, open‑weight models, performance focus, and low inference costs against its lower public profile, questioning if the company remains competitive in the rapidly evolving open‑source AI landscape.
- [00:06:09](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=369s) **Local AI Trends and Model Shrinking** - The speaker warns that launching AI firms in Europe is risky due to heavy regulation, predicts a shift toward region‑specific, language‑optimized models and rapid reductions in model size as performance improves, and cites Mistral's strategy as an example of adapting to these economic and technological pressures.
- [00:09:15](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=555s) **Debating AI Geopolitics and Model Performance** - A speaker challenges US‑centric narratives by highlighting European AI breakthroughs such as Mistral's strong performance, questioning memory biases, and criticizing limited access to advanced AI resources.
- [00:12:19](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=739s) **Saudi AI Factories Power Scale** - Panelists examine NVIDIA's partnership with Saudi investors to deploy hundreds of thousands of GPUs and a 500 MW data‑center capacity, comparing it to conventional ≈20 kW racks to illustrate the challenges of scaling AI infrastructure.
- [00:15:24](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=924s) **Saudi Arabia's AI Power Play** - The speaker notes Saudi Arabia's aggressive acquisition of AI chips, compute capacity, and language models to become a global AI contender, while warning that success will also require building an open ecosystem, skilled talent, and responsible development frameworks.
- [00:18:30](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=1110s) **Sovereign Wealth Funds Powering Digital Infrastructure** - The speaker explains how nations like Singapore deploy sovereign wealth fund capital to build fiber‑optic networks and data‑center hubs, converting oil money into hardware assets such as GPUs to attract tech businesses and establish regional dominance.
- [00:21:35](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=1295s) **Emerging Health Benchmarks and Fragmentation** - The speakers discuss OpenAI's new Health Bench and IBM's IT Bench, then debate how the growing number of specialized benchmarks may fragment evaluation standards and complicate model selection for health applications.
- [00:24:39](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=1479s) **Rethinking State‑of‑the‑Art AI** - The speakers critique the relevance of universal "state‑of‑the‑art" claims and benchmark charts, arguing that the diversity of AI use cases demands tailored, self‑created evaluations instead of generic marketing hype.
- [00:27:44](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=1664s) **Fine‑Tuned Models vs Benchmark Marketing** - The speaker argues that specialized, fine‑tuned models will outpace general ones, questions how future releases will prove superiority beyond standard benchmark charts, and highlights the diminishing marketing impact of benchmark bragging in favor of concrete performance demonstrations.
- [00:30:53](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=1853s) **Shifting Benchmarks Toward Specialized Agents** - The speaker argues that evaluation should move from broad, generic models to narrow, task‑specific agents—allowing deeper assessment—and then transitions to a forthcoming anecdote about an Amazon Prime Video AI hook.
- [00:34:00](https://www.youtube.com/watch?v=kjRHpuXGmH0&t=2040s) **AI-Driven Hyper-Personalized Advertising Concerns** - The speakers discuss worries about intrusive, AI-generated contextual ads and speculate on Amazon's new native advertising feature that lets AI create seamless, hyper‑personalized placements.

## Full Transcript
0:00Mistral is France's national champion in AI. Will it make Europe a global contender 0:04for the technology in the years to come? 0:06Chris Hay is a Distinguished Engineer and CTO of Customer Transformation. 0:09Chris, welcome back to the show. 0:10What do you think? 0:11The US is just following in Europe's footsteps. 0:15All right, great. 0:16Volkmar Uhlig is Vice President, AI Infrastructure Portfolio Lead. 0:19Uh, Volkmar, what do you think? 0:20I think the judgment is still out, but I hope for the best. 0:23Alright, great. 0:24And Kaoutar El Maghraoui is a Principal Research Scientist and Manager for 0:28Hybrid Cloud Platform, uh, Kaoutar. 0:30Welcome back. 0:31What's your take? 0:32I think Europe may not win the race to build the biggest model, like what's 0:35happening in the US and China, but it's, it has a, a big, uh, opportunity 0:39to still define the rules of the road. 0:41All right? 0:41All that and more on today's Mixture of Experts. 0:50I am Tim Hwang and welcome to Mixture of Experts. 0:52Each week, MoE brings together a world-class team of researchers, 0:55engineers, and product leaders to discuss and debate the biggest 0:58news in artificial intelligence. 1:00As always, we have a ton to talk about. 1:02We're gonna talk about a big shipment of chips to Saudi Arabia. 1:04Some new releases in the world of benchmarking and a new 1:07experiment around AI generated ads. 1:09But first I really wanted to talk a little bit about Mistral 1:13Medium 3, which was a launch that happened, uh, just a few weeks back. 1:17Uh, it's part of a, a class of models that they've been working on for some time. 1:21Um, they tout 8x lower costs and really kind of the opportunity to 1:25do on-premises deployment with this 1:27Mistral Medium series of models, and their tagline is "medium is the new large." 1:32Um, and so I guess Chris, maybe I'll start with you.
1:35You know, we haven't talked about Mistral on the show for quite a while, and I think 1:39one of the reasons I wanted to bring them back up again was obviously there's this 1:42new release, but it kind of offers the question of like, is Mistral still kind 1:45of a contender in this open source space? 1:48Um, and curious about how you size that up. 1:50I think so I love Mistral first of all, and to come back to my earlier point, 1:54right, the, the Mistral team were the original folks that came up with the Llama 2:00models in the first place, Llama one. 2:01So they are great innovators. 2:04I love the Mistral models. 2:06I especially love the Mistral 7B from a few years ago. 2:09Um, and I think the new Mistral Medium 2:12is great, but maybe the criticism I would give is that have we 2:17had that sort of 7, 8B model or even a 3B type model from them? 2:22They've been focusing on like what the, the Mistral Medium 3 is probably 2:26what, a 70 billion parameter model. 2:28Even their small model 2:29really 2:30recently was a 24 billion parameter model. 2:33So when we think about the world that most open source developers 2:37live in, it is the world of a Llama. 2:39It's the world of Hugging Face, and therefore you need the smaller models. 2:43So by not having those smaller models out there, that maybe just sort of 2:47fell out of our thinking a little bit, but what they're doing in the lab. 2:52Mistral Medium 3 is a fabulous model. 2:54It's super fast, it's, it really is state of the art. 2:57But again. 2:58Another thing I would criticize is they haven't put a reasoning 3:01model with that as well. 3:02So it is as good as it is. 3:04We've also moved on to reasoning models and I think they just 3:08need to kind of push that. 3:08But I am, I'm confident they're gonna do some great stuff and, uh, and, and 3:13we're gonna see a resurgence of them. 3:14Yeah, for sure.
3:15Uh, Kaoutar I'm curious if you agree, I mean, I think Chris is making a sort 3:18of interesting point, which is like, 3:20Medium is maybe still too large for a lot of what's happening in, uh, in enterprise. 3:25Um, you know, it kind of sounds like at least the story that Chris is telling is 3:28almost that they, they kind of like are sort of missing the boat on where the kind 3:32of current competition is and where the current heat is in the open source space. 3:35Do you think that becomes a problem for them going forwards? 3:38I, I think so, actually. 3:39Um, you know, I think 3:42Mistral, you know, their strategy has been, you know, these open weights. 3:46They also focus on high performance. 3:48Also the focus on the low inference costs. 3:50They, they all have been great and while, you know, they haven't been 3:54making as many headlines as OpenAI or Anthropic right now, I think their 3:58quiet consistency in releasing these 4:01strong open models is worth noting. 4:04So the Medium 3, for example, ranks competitively on standard leaderboards. 4:08And I think their commitment to, um, the open source community is a very 4:13rare stance in today's increasingly closed, you know, ecosystem. 4:17So the 4:18the question is, you know, has really Mistral faded, uh, from the race? 4:22Or are they quietly building the foundations for long-term impact 4:26in the open weight AI ecosystem? 4:28Which I think, you know, I think they're heading that, that that way. 4:31Of course, like, uh, Chris mentioned, you know, they, they need to fix, you 4:35know, the reasoning, uh, aspect of their models, but I think they'll get there. 4:39Yeah, for sure. 4:41Volkmar, I think one of the interesting parts about Mistral for 4:43me is that it feels like, you know, 4:45countries increasingly have like their big AI champion, right? 4:49So there's like DeepSeek in China and the US arguably OpenAI.
4:52But a number of companies, um, and I think for a long time Mistral really 4:55was kind of like the big hope of Europe and certainly of France on the idea that 5:00they would have a national champion that would be able to kind of like lift many boats 5:03and help build sort of an AI, sort of, 5:06kind of industry and sort of leadership for kind of the, the continent. 5:10I'm curious if you kind of buy that as a thesis. 5:12I know you said in your opening remarks that you said that you wish them all the 5:15best, which I guess the more critical way of saying that is you don't know if 5:18they much have like a very strong hope. 5:20Um, but curious if you wanna talk a little bit more about that. 5:22Okay. 5:22That's a loaded question. 5:25Um, if you look where GPUs get deployed, there's a very strong 5:30concentration in the States and in, like, 5:34a couple of countries which are trying, like we are talking about 5:37this with the Saudis, um, there's, you know, there's investment going 5:42on, but if you look at the majority of GPU deployment, it still happens 5:46in very, very few places in the world. 5:48Like, you know, the 50,000, a hundred thousand GPU clusters are 5:52pretty much only in the States. 5:54And so I think there is a, there's a concentration of 5:58capital and a concentration of skills, uh, which is clearly 6:02not in Europe's advantage. 6:04Uh, Europe is really good at writing regulations right now, 6:07but they're regulating something which they cannot build themselves. 6:09And I think this is a real big danger. 6:11And so I think companies like starting an AI company today in Europe, um, 6:18it's kind of insane, whereas like, you know, it's an under-understood 6:22market and already overregulated and so I think Mistral is kind of 6:27like, you know, walking the line. 6:29I think in general what we are going to see is that, um, companies 6:33are going to focus on their local markets.
6:35Um, if you look just from a language perspective, all these models are very 6:39English focused and then all the other languages are almost translations. 6:43And so I, and you know, the Chinese said, okay, we wanna have a Chinese first model. 6:47Um, and so I think there will be kind of local champions. 6:52Uh, I think also that, um, 6:54we see a trend, you know, what you're saying, 6:56like medium is the new large, I would say, and small is the new medium. 7:00And every six months that's the case, right? 7:03So if you look at the capabilities of the models, what we could only 7:06do in huge models is now moving into something you can run on your laptop. 7:11But it took like two or three iterations. 7:13And so I think Mistral is just adapting to that general trend, you 7:18know, that the technology is now at a point that I don't need a 70 B model 7:22anymore to get that type of performance. 7:24Um, and it's just, you know, the nature of, of the beast. 7:27I think also the smaller models are just much more economic. 7:30And so there is just economic pressure, right? 7:33The moment you deploy this at scale and you don't have money to burn 7:36anymore, then you need to actually look at what the cost footprint is. 7:39And I think that's probably a reaction of Mistral as well. 7:42Like in some ways, like you have to be close to where the clusters are 7:45to build the talent and expertise. 7:47Yeah. 7:48I, I think the, the capital follows the talent. 7:50The talent follows the capital, and so you, you are in, in a world where 7:55Europe just doesn't have deployments. 7:57So the people who are really good, they come here. 7:59I mean, you need to go somewhere where someone is willing to pay for 50,000 GPUs. 8:04That's not happening in Europe. 8:05Yeah. 8:05I think I, I second Volkmar's point, you know, Europe, actually 8:10today where it's lagging is, you know, the compute infrastructure. 8:13So it's really, uh, the.
8:15continent, 8:15what we see is it's really short of, you know, on the sovereign AI compute, 8:21there is no European equivalent of the A100, H100-scale clusters that 8:25you find in the US and even in China. 8:27Uh, there's also, I think, a lack in the VC funding and the, so the 8:31startups in Europe today, they're struggling to access the scale of 8:34funding that fuels Silicon Valley and the Chinese tech ecosystems. 8:39That's also like the, these are like very, uh, you know, uh, 8:43things that are kind of slowing their progress in the 8:46foundation model development. 8:48So, you know, we, it doesn't have like a direct equivalent 8:51to OpenAI or Google DeepMind. 8:53So Mistral, you know, even if it's, you know, they, they're the champion, 8:57but like, you know, I think, as Volkmar pointed out, you know, access 9:00to compute is, is very important. 9:02So we need to have both. 9:03Chris, do you buy this? 9:04I mean, I think my note of skepticism is like, 9:07we live in the 21st century, right? 9:08Like there's the internet, like the idea that you have to be kind of 9:11proximate to all the compute in order to build a, a strong AI industry. 9:15That I guess, I know it works a little bit against my intuitions, 9:17but Chris, do you wanna jump in? 9:19I don't buy any of that in the slightest. 9:21Have we all got short term memories or something? 9:23What were we saying about China? 9:25What, six months ago we're like, oh, US is the greatest. 9:28Well, they don't, China doesn't have access to any H100s, and then 9:32DeepSeek comes along and we all go, 9:34uh, well, okay. 9:35Uh, whoops. 9:37And I think that's the exact same case. 9:39I mean, let's look at what Mistral has done there. 9:41Right? 9:41We're sitting arguing about medium, but it's one of the strongest models 9:45that is out there that's non-reasoning. 9:47So they've shown that in the benchmarks. 9:49It is a great model if you go and use it. 9:51It is a fantastic model, right?
9:53And, and again, I'm guessing at the 70 billion parameter 9:56model, but that's phenomenal. 9:57What they've done is outperforming the Llama 4 Maverick model, for 10:00example, and we're saying, Europe, 10:02you're useless. 10:03Sorry. 10:03You just created one of the best models, and again, let's go back 10:07to being Europe for a second. 10:08Who is leading Google's Gemini model? 10:10Demis Hassabis, right? 10:12Where does he come from? 10:13Not America, right? 10:14If we look at OpenAI, who started that from an architectural point of view? 10:18Ilya Sutskever. 10:19Where did he come from? 10:21Not America, right? 10:22So there is lots of European innovation coming through. 10:26European companies are building stuff and they don't have access to AI clusters. 10:32Just right, the first part. 10:33Maybe not the second 10:34part, for the most part. 10:35Do you wanna get into? Yeah, you just called out individuals and if you 10:38look at the, uh, you know, the top companies founded in the, in the Bay 10:41area, that is most, like 50% or so is not from Americans like American born. 10:47That's a non-argument because what you are saying is like, my 10:50first citizenship matters. 10:52It's like, no, it matters where you actually build your company. 10:55Okay, but, but let's take the Llama models, which were built by France, right? 11:00That that was that team that were doing that. 11:02So innovation is coming from Europe, right? 11:06Hang on. 11:07Two things are the businesses, is that innovation? 11:11Europe has the brain power. 11:13Europe has no money to actually fund it, and that's why it's 11:16all funded in the United States. 11:18If you look at the amount of 11:20capital, which is spent in the US on new technology and the 11:24amount of capital spent in Europe. 11:26There's nothing spent in Europe. 11:28So, and therefore we, I'm very happy we got a full brain drain. 11:32Let's take all these people and make it happen in the United States.
11:36But we literally had this argument six months ago about DeepSeek, right? 11:40So I'm telling you that's a 11:42different story. 11:42Completely different 11:43story. I don't think 11:44it is. 11:44I think so. 11:45I do not think it is, 11:45yes, because 11:46in one case you have a wall and people cannot get out, 11:49and the other case you don't. 11:51No. 11:51The DeepSeek, they have the GPUs, but they probably didn't have the latest, 11:55you know, or top scale GPUs, but they were very clever in, how do we, you 12:00know, take those limits and even act at the PTX level of NVIDIA, you know, 12:04of the CUDA, so they can basically overcome the limitations that they have. 12:09So I think, you know, I think there are two things here we're talking about. 12:12There is the brain power and there there is, you know, the 12:14infrastructure and the deployments. 12:16So the research, the R&D, yeah, it could come anywhere. 12:20But then when you wanna deploy these things at scale and see the 12:22business value, that's I think where, where we see the lags. 12:30So this is a very nice segment or a nice segue into our next segment, actually. 12:34So, uh, it's another story I wanted to kind of bring up and have 12:37the panel react to, but I think it'll actually be a continuation 12:39of this discussion in some ways. 12:41Um, super interesting news coming out of Saudi Arabia. 12:44Uh, this week, uh, NVIDIA announced that it will be collaborating 12:47with the Saudi investment funds to build what they call AI factories. 12:52Um, and they are projecting sort of the deployment of several hundred thousand, 12:58uh, advanced processors in Saudi Arabia over, you know, the next amount of time. 13:02Um, and they're promising sort of a data center capacity of 13:06as much as quote 500 megawatts. 13:09Um, the first step of this is gonna be a shipment of 18,000, 13:12uh, GB300 Grace Blackwell chips.
13:15Um, and I think maybe Volkmar, just to maybe turn it to you first, 13:18because I know this is your world. 13:20Can you gimme a sense of, like, 13:21what is 500 megawatts compared to where we are today? 13:25There are two, there are two viewpoints on this. 13:28That's a lot and it's nothing. 13:30Okay. 13:31Um, if you take, um, um, if you take the latest announced NVIDIA rack, 13:38that's, um, so first a, a typical rack in a data center is about 20 kilowatts. 13:44If you're really trying to push the envelope, it's 30, 35, but 13:47that's kind of on the edge. 13:48So 20 kilowatts. 13:50If you take 500 megawatts, you do the math, it's 13:53thousands of racks. 13:55Now, if you look on the flip side of what NVIDIA announced, uh, with 13:59the next generation rack, which is, you know, a petaflop in a 14:03rack, it's 600 kilowatts per rack. 14:07Okay? 14:07So if you take your 500 megawatts, then you can fit, you know, 14:11800 of these racks in, 14:13uh, in a data center. 14:14Now, 800 racks is still a very large installation, but like if you look in the, 14:19uh, like it's not 20,000 racks, right? 14:22So we are in a world where it's sufficient, a sufficiently large 14:26deployment to actually make a dent. 14:28Um, uh, 14:30for a whole country that's probably not enough. 14:32Kaoutar, one of the things we were talking about a little bit earlier, I 14:35guess was kind of Volkmar's thesis that like the talent follows the capital. 14:38And so I guess maybe you could draw a comparison where you say, 14:41well, Europe's got the talent, but it doesn't have the capital. 14:43This seems to be a case where, you know, the Gulf states they, they 14:46have the capital and now the question is whether or not they're gonna be 14:49able to bring sort of the talent to really build out a much broader AI 14:53kind of industry.
14:54I'm curious about how you think the prospects are of, you know, these kinds 14:57of deployments being the thing that sort of allows you to sort of trigger sort 15:01of global leadership in the technology. 15:03Yeah, that, that's a very interesting point here. 15:05It's like what we're we're seeing in Europe, you know, talent is there, 15:09the capital is lagging. 15:10I think what, what's happening in Saudi Arabia? 15:13So the deal that's happening, you know, with the NVIDIA and AMD and the US 15:18you know, it, it's marking a, a major milestone in the rise of what we 15:22call the sovereign AI infrastructure. 15:24So Saudi Arabia here is not just buying chips there. 15:27They're taking the kind of a stake and they're trying to claim, 15:31you know, uh, basically their place in the future of compute. 15:35So, uh, and they're, you know, with, you know, the Grace Blackwell deployments, 15:41they're also trying to commit, you know, these Arabic language LLMs and, 15:45uh, nearly like, I think about two gigawatts of AI power on the horizon. 15:49So it's not just a regional experiment, I think it's a global 15:52power play that they're trying to play here. 15:55And um, I think from the US side, this is also reflecting a clear 15:58shift in the AI export strategy, with China restricted here. 16:03Saudi Arabia and the Gulf nations are emerging as, you 16:06know, the US's preferred strategic partners for high-end AI. 16:10But the 16:11the issue here is, you know, these sovereign AI efforts, they need more than 16:15just the silicon, more than just the compute. 16:18So they'll need also the open ecosystem, the talents, the responsible development 16:23frameworks to really, truly compete. 16:26So I think we'll, we'll have to see, you know, what Saudi Arabia is doing. 16:30You know, they're getting this huge infrastructure, they're 16:33building this huge infrastructure.
16:35But I think the ecosystem and then the talent, you know, they 16:38still need to work on that. 16:40Well, Chris, maybe I'll bring you in. 16:41I mean, 'cause I think you were a Europe booster, um, and I kind of curious about 16:45like if you think you are kind of a, a Gulf States booster now, you know, 16:49given seeing these investments like. 16:51You know, will they be able to bring the talent like in the future? 16:53Will a European researcher say, well, I could go work for a company in the US or 16:56I could go work for a company in the Gulf. 16:58Like do you think those dynamics will start to happen as we see these 17:01deployments get bigger and bigger and bigger in say, Saudi Arabia? 17:04I think they already have the talent. 17:06So I've spent quite a bit of time in Saudi Arabia with, on a bunch of these companies 17:09and they're heavily investing in AI. 17:12And again, even if we go back a couple of years, um. 17:16Just after the Llama models came out, the first ones, what was the 17:19most popular model At that point? 17:21It was the Falcon models. 17:22And where did they come from? 17:24Saudi Arabia. 17:25So they already have the, um, they already have the talent within the 17:32region, um, between Saudi and UAE who have been sort of developing, 17:37uh, some of these models in region. 17:39And I think therefore. 17:42We're gonna see talent flock towards that infrastructure, you 17:45know, to Volkmar's point earlier. 17:47Um, so I, I think you're gonna start to see stuff coming out of them. 17:51And, and back to my earlier point, I think even with that amount of 17:55infrastructure, when you place constraints, people get more creative. 17:59So I think they're gonna do interesting things and, and then. 18:02Again, maybe taking sort of language first models as well. 18:06You're gonna start to get different flavors as opposed 18:08to an English first model. 
18:09Again, it was brought up earlier, so I think we're gonna start to see 18:12different flavors of models that may have different and newer capabilities, 18:16which will become a plus to the overall world ecosystem of models. 18:20So I, I am, I'm a positive on that and I think it's a, I think it's a good thing. 18:24If you look 20 years back, we had a similar 18:28uh, kind of investment cycle. 18:30And this was, uh, affected the massive build out of internet infrastructure. 18:34So this was fiber optics in the ground data centers, you 18:38know, computers connected. 18:39And so I think there's a certain repeat like those nations because 18:44they, you know, have a, have a different investment philosophy. 18:48Uh, like look at Singapore. 18:49Singapore made the decision. 18:51We wanna be the data center hub and the fiber optic hub. 18:53Of, of that region and they put poured billions of dollars in it, 18:57and that attracted a lot of business. 19:00I think those nations are because of, you know, central command structure of very, 19:05very advantage, uh, advant, um, advantaged in, uh, pushing those types of like 19:12large scale infrastructure investments out of, let's say a sovereign wealth fund and 19:17saying, okay, we put this capital to work. 19:20Um, they usually are more challenged if it's a, you know, pure IP play. 19:25Like, okay, we need to build some software, uh, because, you know, then 19:29they don't have a competitive advantage. 19:31So I think there, they're all playing to their strength here 19:33saying, okay, we take, you know. 19:35Oil money and convert it into GPUs, uh, and thereby creating that, 19:40that suction sound of getting, you know, people into the region. 19:43So I think it's a, it's a very natural play for those types of, um, uh, like 19:49regional players of trying to establish, you know, dominance in a new emerging 19:55field, which has this, oh, and by the way, we need to put 20, 30, $40 billion down. 
20:00Um, you know, which, you know, you don't do, like, out of normal, private, 20:05you know, funding; you need someone like the US. 20:08Right. It's like government scale. 20:10Exactly. 20:10It's government scale and that's exactly where they can shine and 20:13that's where you're seeing it. 20:14Yeah, for sure. 20:14And maybe a final question on this before we move to the next topic. 20:17I mean, Volkmar is someone who's, you know, very much 20:19in the infrastructure game. 20:20My friend was making the argument to me too recently that, you know, in 20:23some ways actually it's possible that, you know, Saudi Arabia might actually 20:26be advantaged even against the US on data center build out, um, because 20:31of, for example, like the ability to access and move energy assets around is 20:36just something that they, they're not gonna face the same kind of, you know, 20:38permitting issues and construction issues 20:40that you have in the US. Do you buy that as kind of advantage that they have for No, 20:43I don't. Okay. 20:44Because we have 50 states and you have, I mean, I, I moved out of California 20:48to Texas and it's totally different. 20:51Permitting is so much easier. 20:52So I think we will see a similar thing like, you know, they're 20:55playing it on a, on a country level. 20:57We will play this on a regional level. 20:59And if you look at the US, there is pretty much just a network ring, which goes 21:03all through pretty much all states. 21:05And so you can put your data centers in Arizona, or Oregon, or Texas, 21:10so you go where the power is cheap for these types of build outs. 21:13Yeah. I like that take. 21:14It's like we have Saudi, it's, it's called Texas. 21:16Exactly. 21:16We also have the oil. 21:17Yeah. Right. 21:18Yeah. 21:18So exactly. 21:25Well, great. 21:25I'm gonna move us on to our next topic.
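As a quick check on the rack arithmetic from the infrastructure segment above (a 500 MW facility, conventional racks around 20 kW, and the announced roughly 600 kW next-generation racks), here is a minimal sketch in Python; the helper name `racks_for` is just illustrative:

```python
# Back-of-envelope rack math from the discussion: divide total
# facility power by per-rack power draw to get a rack count.
FACILITY_MW = 500  # the quoted 500 megawatts of data-center capacity

def racks_for(facility_mw: float, rack_kw: float) -> int:
    """Number of racks a facility can power at a given per-rack draw."""
    return int(facility_mw * 1000 / rack_kw)

conventional = racks_for(FACILITY_MW, 20)   # ~20 kW conventional rack
next_gen = racks_for(FACILITY_MW, 600)      # ~600 kW next-generation rack

print(conventional)  # 25000 conventional racks ("thousands of racks")
print(next_gen)      # 833 of the dense racks (the "800 of these racks")
```

This matches both halves of the point being made: at 20 kW per rack, 500 MW is tens of thousands of racks, while at 600 kW per rack it shrinks to roughly 800, large but "not 20,000 racks."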
21:27Uh, moving us a little bit away from the world of, um, you know, chips and 21:31national competition to something a little bit more kind of close to home. 21:35Two sort of interesting releases that happened fairly recently. 21:37One of them was from OpenAI. 21:39Um, this, uh, benchmark they released called Health Bench, uh, 21:42which is specifically a curated set of about 5,000 conversations. 21:46So the interactions between AI models and users or clinicians, and the idea 21:51is to kind of create standard benchmarks for AI's use in the health domain. 21:56Um, there's also a really interesting benchmark that came out of IBM 21:58called IT Bench, which is looking at sort of benchmarking on agents. 22:03Um, and I guess Kaoutar maybe I'll turn it to you. 22:05I mean, I think the funny thing that I have when a new model releases now, 22:08whether it be Mistral Medium or what have you, is that they always go, like, 22:12they say this like, we're very good against all these benchmarks, and 22:15then there's a list of like 50 benchmarks and it's like very difficult 22:19to tell what it actually means. 22:21There's like lots and lots of benchmarks, but it's like very, very 22:23difficult to say, okay, like I'm gonna deploy this in the health space. 22:27You know, is this actually a good model for me? 22:29Um, and I guess I'm kind of curious. 22:30I wanted to kind of put to you the idea that in the future, like these benchmarks 22:35might end up becoming a lot more fragmented than they are right now. 22:38Right? 22:39Where you might imagine, you know, we say, oh, we're gonna release a 22:41model, but it's gonna be specifically for health applications, and then 22:44here are the benchmarks for it. 22:46Do you think that's where we're headed, or we're gonna kind of keep developing out 22:48this ever more comprehensive benchmark 22:51suite, I suppose, for every single model that comes out? 22:54Yeah.
22:54 I think this, you know, race, you know, towards, you know, building these AI 22:58 models and benchmarking against them, 23:00 it's gonna continue. 23:01 But now that we're entering the age of agentic AI, you know, the 23:07 benchmarking, it still seems like, you know, it's in the chatbot era. 23:12 So if you want, you know, to deploy, like, trustworthy, for example, AI agents, I 23:16 think we need new evaluation frameworks that combine general reasoning metrics 23:21 with domain task, uh, completion. 23:23 So 23:24 think of it like, you know, this general benchmark tests, you know, 23:28 the IQ, but, you know, sector-specific ones test the job performance. 23:33 So I think ultimately the future of benchmarking lies in 23:36 these hybrid evaluation stacks. 23:39 So general foundation models, you know, that are tested, uh, you know, 23:42 across standard reasoning tasks, but also we need stress tests 23:47 in realistic operational settings, like, you know, the examples that you mentioned 23:51 from OpenAI, the HealthBench, or the IT Bench. Those are very domain 23:55 specific, and we need more of those. So we need, you know, kind of the 23:59 general evaluation stacks and frameworks, 24:02 but we also need, you know, the stress testing 24:05 for these realistic, uh, and operational settings. 24:09 And those will become, I think, industry standards. 24:11 Not because they're broad, but because they're really real. 24:14 So, uh, so I think the, the new wave that we're seeing with the, uh, 24:19 you know, the OpenAI HealthBench and the IBM's, 24:24 uh, IT agent benchmark, you know, these really 24:28 are raising critical questions. 24:29 Are we really measuring the right things? 24:32 And, you know, as I see, you know, these models shifting from static chat 24:35 bots to dynamic agents, you know, the traditional benchmarks need to change. 24:39 Yeah.
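The "hybrid evaluation stack" Kaoutar describes, a general IQ-style score combined with domain-specific job-performance scores, can be pictured as a simple weighted report card. A minimal sketch; the benchmark names, scores, and weighting below are invented for illustration and are not from the episode.

```python
# Toy "hybrid evaluation stack": a general reasoning score plus
# domain-specific task-completion scores, blended per deployment target.
# All numbers and benchmark names here are hypothetical.

GENERAL = {"reasoning": 0.82, "math": 0.74}  # "tests the IQ"
DOMAIN = {
    "health": {"triage_accuracy": 0.91, "safety_refusals": 0.88},
    "it_ops": {"incident_resolution": 0.67},
}

def hybrid_score(domain: str, w_general: float = 0.4) -> float:
    """Weighted blend of general reasoning and domain task completion."""
    g = sum(GENERAL.values()) / len(GENERAL)          # average general score
    d_scores = DOMAIN[domain]
    d = sum(d_scores.values()) / len(d_scores)        # average domain score
    return w_general * g + (1 - w_general) * d

if __name__ == "__main__":
    for dom in DOMAIN:
        print(f"{dom}: {hybrid_score(dom):.3f}")
```

The point of the blend is that two models with the same general score can rank very differently once the domain half of the stack is weighted in.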
24:39 What I like about that, and Chris, I'm curious about your comment on this, is, 24:43 you know, like, does the idea of state-of-the-art even make sense anymore? 24:47 Like it kind of feels like, well, I don't know. 24:49 It just, there's so many use cases now for AI that it'll be very 24:52 difficult to imagine that one model is state-of-the-art across all applications. 24:58 And so are we kind of maturing our thinking here? 25:00 Like does it, I don't know, Chris, do you buy that, like, state of the art 25:03 actually doesn't make any sense because of the number of applications? 25:07 I don't think state of the art makes any sense. 25:10 I think benchmarks and every, it's all marketing hype, isn't it? 25:14 We are the greatest at this. 25:16 And look, we are topping this chart here. 25:18 Yeah. You need the chart 25:19 that shows that you're ahead on all the benchmarks. 25:21 That's what you do. 25:21 Exactly. 25:22 E every, every single, even Mistral did that. 25:25 Right? 25:25 Every single model provider releases their chart. 25:28 And they beat everybody else, right? 25:30 And, and they, and they select the models that they don't want to pit against. 25:35 So it's like, I'm selecting this, this, this, and this. 25:37 Look, I lead on all of this and I win at this benchmark on that one, et cetera. 25:41 Because it's gotta look like you've got the greatest model ever. 25:45 And, and I understand that, but I truly think, don't let 25:50 somebody else tell you, uh, 25:52 what model is good for your use case, right? 25:55 So if you need a benchmark, go create a benchmark for yourself. 25:59 I'm in this domain, these are the sort of things I'm gonna do. 26:02 I'm gonna test for it, and I'll create my own evals, and then I'll make sure 26:06 that I'm using that model for the purpose and tasks that I want. 26:10 Now, don't get me wrong, benchmarks are kind of useful in some regards 26:14 because that allows you to basically go,
26:16 yeah, this model's probably around the right level and therefore I can 26:20 go and try it on this different stuff. 26:22 So it gives you an idea of whether it's something that's gonna be useful for you. 26:26 But the reality is most of us know what model is good just from using it. 26:29 So I'm, I'm, I'm a vibe-er all the way, so, yeah. 26:34 Well, but Chris, when you say you create your own benchmark, 26:37 isn't that like a biased approach? 26:39 I mean, I can craft it in a way that's gonna show, 26:42 uh, you know, my model as the best in whatever I'm doing, so 26:46 there might be some bias there. 26:48 As opposed to having an external party create maybe a set of benchmarks. 26:53 When I say create my own benchmark, I mean for my specific task, right? 26:56 So, so if I'm, so if let's say I'm creating a chatbot for tourism that 27:03 answers FAQ questions on flights, right? 27:07 You may have the best coding model in the world, but if it 27:11 is giving me flights for another company, then it's not really good. 27:14 Or if it's not reading the FAQ, then it's not really any good, right? 27:17 So, 27:18 like any normal software application, I'm gonna want to create test 27:21 cases to say, is this thing coming back with the answers that I'm 27:25 expecting for my business use case? 27:27 That's the kind of eval. 27:28 So it's not about testing for generality, it's actually, is the model any good 27:33 at the task that I want it to perform? 27:35 And, and rather than relying on somebody else to tell you whether it's any good, 27:39 go, you know, for your application you're building, go, go create some 27:42 evals and, and test it out for yourself. 27:44 And, and most of the time, 27:47 back to the point, a smaller model that is fine-tuned or is, uh, you know, 27:53 designed for a specific application will do better than a general model. 27:57 Anyway. 27:59 Volkmar, I'm wondering if you have any predictions on where the kind 28:01 of meta of this moves over time?
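The "create your own evals" approach Chris lays out above is essentially a test suite over your own business cases. A minimal sketch for his hypothetical flight-FAQ chatbot; the `model_answer` stub and the eval cases are invented stand-ins, and in a real project the stub would be a call to whatever model is being evaluated.

```python
# Task-specific eval harness in the spirit of Chris's advice:
# test the model on YOUR cases, not on general benchmarks.
# The cases and the model stub below are illustrative, not from the episode.

def model_answer(question: str) -> str:
    """Stand-in for a real model call (e.g., an API request)."""
    canned = {
        "baggage": "Each passenger may check one bag up to 23 kg.",
        "refund": "Tickets are refundable within 24 hours of booking.",
    }
    for topic, answer in canned.items():
        if topic in question.lower():
            return answer
    return "Sorry, I don't know."

# Each eval case: a question plus substrings the answer must contain.
EVAL_CASES = [
    {"q": "What is the baggage allowance?", "must_contain": ["23 kg"]},
    {"q": "Can I get a refund?", "must_contain": ["24 hours"]},
]

def run_evals(answer_fn, cases):
    """Return (passed, failed) lists; a case passes when every
    required substring appears in the model's answer."""
    passed, failed = [], []
    for case in cases:
        reply = answer_fn(case["q"])
        ok = all(s.lower() in reply.lower() for s in case["must_contain"])
        (passed if ok else failed).append((case["q"], reply))
    return passed, failed

if __name__ == "__main__":
    passed, failed = run_evals(model_answer, EVAL_CASES)
    print(f"{len(passed)} passed, {len(failed)} failed")
```

Swapping a different model into `answer_fn` and rerunning the same cases is exactly the "is this model any good at my task" comparison Chris describes.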
28:03 'Cause I, I guess the world that Chris is describing, which I really agree 28:05 with, is everybody sees these charts of state-of-the-art performance and, 28:09 yeah, I think the main thing that I get from them now is, okay, 28:12 you did your homework, you're at least as good as everyone else. 28:14 Right. 28:15 It doesn't, like, doesn't necessarily make me stand up and be like, wow, 28:18 this is the most incredible model ever. 28:19 But it just says, like, you're doing as good as everyone else. 28:22 If that's the case, then, like, we're living in a world where, like, these 28:25 kind of marketing statements don't really have that much value anymore. 28:28 And I'm kind of curious how in the future 28:30 an OpenAI or a Mistral or whoever is gonna kind of demonstrate that, like, oh, this 28:34 is the model you really should be using in the future, if, if not these benchmarks. 28:37 So I, I'm very much aligned with Chris, what he said. 28:41 Um, I think there is, there's the outside marketing. 28:44 I wanna bring my product to market. 28:46 I need to communicate what it does. 28:48 And I think what we will see is, and this is what you get with the medical 28:51 benchmark, you know, first it was, 28:53 how good are you on math tests and physics tests and, you know, logic 28:57 tests, and now what's happening, 28:59 as, as we are, you know, widening the use cases, every industry will come and 29:04 say, hey, you know, I'm the medical guy and I'm, you know, the rocket ship guy. 29:08 I wanna know how good that model works. In the end, when you are 29:12 putting things, things in production, 29:15 you build your own regression tests, because what happens is 29:18 that you have something that fails. 29:20 It goes into the regression test, and then, what I'm seeing with projects 29:24 we're running internally here, 29:26 every six months you upgrade your model.
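The upgrade discipline Volkmar describes, where production failures accumulate into a regression suite that gates the switch to a new model version, might be sketched roughly as follows. All prompts, names, and the toy candidate model are hypothetical.

```python
# Sketch of a regression-test gate on model upgrades: failures observed
# in production become test cases, and a candidate model only ships
# if it passes every accumulated case. Names here are illustrative.
from typing import Callable, Dict, List

# Accumulated regression cases: prompt -> substring the reply must contain.
REGRESSION_SUITE: Dict[str, str] = {
    "Book a flight to Paris": "Paris",
    "What is your cancellation policy?": "cancel",
}

def regression_failures(model: Callable[[str], str],
                        suite: Dict[str, str]) -> List[str]:
    """Run the suite against a candidate model; return the prompts it fails."""
    return [prompt for prompt, expected in suite.items()
            if expected.lower() not in model(prompt).lower()]

def safe_to_upgrade(new_model: Callable[[str], str]) -> bool:
    """Gate the upgrade: only swap models in if nothing regresses."""
    return not regression_failures(new_model, REGRESSION_SUITE)

# Toy stand-in for "the next version of the open-source model you are using".
def candidate(prompt: str) -> str:
    if "Paris" in prompt:
        return "Booked: flight to Paris."
    return "You can cancel up to 24 hours before departure."

if __name__ == "__main__":
    print("upgrade ok" if safe_to_upgrade(candidate) else "hold the upgrade")
```

Whether you call this a benchmark, a regression test, or a unit test, the mechanism is the same: the suite grows from real failures, and the gate is what makes a six-monthly model swap safe.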
29:28 So if you don't build your regression test, you actually don't know what's going 29:32 to fail when you're switching from the old model to a new model, and then you have 29:35 a problem with the customers behind it. 29:37 And so over time, you are building effectively a benchmark. 29:41 Call it benchmark, call it regression test, call it unit test. 29:43 It doesn't matter, but you're building out your way of validating that your 29:47 fine-tuned model or, you know, the next version of the open-source model 29:52 you are using actually works for your use case. 29:55 And if it doesn't work, you cannot upgrade. 29:57 It's, it's a quality assurance thing. 30:00 And that quality assurance is extremely domain specific. 30:03 That's right. 30:03 Yeah. 30:03 It almost kind of suggests also a world where, yeah, there'll be a lot more 30:06 eval work that needs to happen in-house. 30:09 And then maybe finally, people have been talking about this for years, 30:11 but it maybe actually finally creates a market for, like, specialized 30:13 businesses that just do evals. 30:15 Um, because you live in a world where basically, like, your big foundation 30:19 model company's not gonna run every single eval in the whole world, but 30:22 if you have a specific use case, you need someone with eval expertise. 30:25 You know, it almost kind of feels like that suddenly starts to become viable in 30:28 this world as the, the market matures. 30:30 But I, I wanna bring it back to Kaoutar's point, and I think it's 30:32 really, really relevant, which is, we focus on the model world. 30:38 But as Kaoutar was saying, right, it's like, in a world of agents, it's gonna be 30:42 less about whether this model is capable of, uh, performing this task. 30:47 It's gonna be more of, is the agent capable of doing this task 30:51 and how good is it performing? 30:53 So, so I, I sort of agree with that point, that, that the, the benchmarks 30:58 kind of need to move on and sort of really look at it from a kind of
31:03 perspective as opposed to just always looking at the models, 31:06 which I think will also give a better quality, because agents 31:10 usually have a much more narrow band. 31:13 This is the problem with the, with the generic model, right? 31:15 So I have OpenAI, it can do anything. 31:18 And so how do you test that 31:19 it does what you want it to do specifically? 31:21 But once you go to an agent, you really narrow the, the 31:25 applicability of the model, 31:27 the application of the model, uh, because now you're saying, you, agent, 31:31 you do flight booking and nothing else. 31:33 And now I can go deep, 31:34 right? 31:35 And not just broad. 31:36 Right now we just go so broad. 31:38 It's like a little bit of math and a little bit of medical, right? 31:41 But now if you have an agent, you can actually say, do, 31:43 do your task, or you don't. 31:49 I'm gonna move us on to our final story, uh, in the last 31:52 few minutes of the episode. 31:53 This is kind of a fun one. 31:54 You know, on MoE we cover, you know, infrastructure and chips and 31:58 health and, you know, model evals. 32:01 We don't really talk about showbiz all that much. 32:04 Um, but there was this kind of interesting story. Um, Amazon was 32:06 doing its kind of upfront event, where it kind of talks to advertisers 32:10 for Prime Video's kind of upcoming season, and there were a bunch of 32:13 announcements, new shows they're doing, and all the usual kind of show business stuff. 32:17 But there was one interesting AI hook that was mentioned in this news story, 32:21 where Amazon announced that they were gonna start using generative AI to create 32:25 contextual advertising on Prime Video. 32:28 So they didn't really provide too many details, but the idea would be that 32:31 you'd be watching a show on Prime Video, 32:32 an ad would come on, but it would be sort of an ad generated on the fly using AI.
32:38 Um, and in a way that's presumably contextually related to both 32:41 what you're watching and what it knows about you, um, which is 32:44 a really kind of weird, interesting change. 32:47 Uh, we've had targeted advertising for, for many years, of course. 32:50 Um, but this seems to be a little bit of a qualitative shift, where, 32:53 like, the actual ad itself will be kind of generated, uh, on the fly. 32:58 Um, Chris, are you a fan of this? 33:00 Are you excited about this? 33:01 I am not excited about this 33:03 whatsoever. 33:04 I mean, come on. 33:05 It's like, it's not that loaded a question, but yeah, 33:07 Chris, go ahead. 33:08 I, no, seriously. 33:10 I mean, how many times on Amazon, like, I look at, 33:14 I don't know, maybe I buy a comb off of Amazon, and then for the next three weeks 33:20 I am seeing combs appear everywhere. 33:23 Every website I click on, I buy a comb, everything 33:27 I look at on Amazon: 33:28 do you want a comb? 33:29 Here's a comb, here's a comb. 33:30 I'm like, I just 33:32 bought a comb. 33:33 Don't, don't, stop trying to sell me combs. 33:37 The last thing I want to do is stick on the TV, 33:40 there I am about to watch some New York Giants be terrible, 33:44 and then guess what? 33:46 More combs arrive. 33:47 You're like, oh, oh great. 33:50 Oh no. 33:51 So no, I, I, once you can stop the combs following me around the internet, then 33:57 I will be excited about gen AI adverts. 34:00 Okay. 34:01 Quick hit, Kaoutar. 34:02 Are you excited by this? 34:04 Yeah. I'm also worried about this. 34:05 You know, I think, you know, this contextualized advertising 34:08 might be a little too much. 34:10 Um, this also hyper-personalized advertising, 34:14 uh, I'm not sure. You know, I think we'll have to see how the viewers react to this. 34:18 You know, I think we've seen clearly Chris's reaction. 34:22 Um, so, and also it's machine-generated content inside the show. 34:27 So, yeah, I'm also a bit worried about this, you know, of this 34:31 experience and how it's gonna play.
34:32 So I feel it's just like they're more getting into our minds and what we see, 34:38 and, you know, so it's a bit creepy. 34:41 And last but not least, to close out the episode, Volkmar, your take 34:45 on this incredible new feature that Amazon's about to launch into our lives. 34:48 So, 34:49 um, I, two startups before, I had an ed tech company, so I, I 34:55 probably know too much about it. 34:58 So there's a big challenge of creating really good creatives. 35:03 And I think that is already now at a point where AI is really helping, um, you 35:07 know, not an intern clicking something together, but, uh, actually creating, 35:12 you know, really good advertising. 35:14 Um, it's going to be really interesting to see. 35:17 Like, it's a fact 35:18 they can now do native formats. 35:20 So native formats in print is, like, when you have, it kind of 35:23 looks like it's part of the article, 35:25 and so you are reading it despite that it's an advertising. 35:28 Uh, and so what you now could do is you could go full native, so you have the 35:32 movie and you could even plug something in. 35:34 You could change the plot of the movie if you take it to an extreme, right? 35:37 Um, so it's going to be interesting to see. 35:39 I wanna see all the, all the issues they have there. 35:44 The model accidentally renders something you shouldn't render. 35:47 So how do you do quality control? 35:49 But I think this is the path we are going on. 35:52 So, you know, image rendering is obvious, but video rendering is the next step. 35:57 Um, I think it's, uh, hopefully a long, a long path until this becomes reality. 36:03 Hopefully. 36:03 Yeah, we'll see. 36:04 Yeah, I just envisioned that you're, like, watching Star Wars 36:06 in 2030 and, like, Luke Skywalker's like, you should really buy a comb. 36:11 Exactly. 36:11 It was, like, not a good outcome for sure. 36:13 The comb scene. 36:14 Yeah. 36:15 Everybody remembers that one. 36:16 Yeah. 36:16 So, um, well, that's all the time that we have for today.
36:20 Uh, Kaoutar, Volkmar, Chris, pleasure as always to have you on the show. 36:23 This is one of my favorite 36:23 kind of panels on MoE. 36:25 And, uh, thanks to all you listeners for joining us. 36:27 Uh, if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, 36:31 and podcast platforms everywhere. 36:32 And we'll see you next week on Mixture of Experts.