
AI's Dual Role in Cybersecurity

Key Points

  • The latest IBM “Cost of a Data Breach” report puts the average breach cost at about $4.88 million, up 10 percent over last year, while AI-driven security and automation save an average of $2.22 million per breach, nearly half the total.
  • Panelists disagreed on the outlook for breach costs in five years, with one predicting they’ll rise and another believing AI will drive them down.
  • While generative AI tools are delivering substantial cost reductions and efficiency gains for security teams, they also introduce new threat vectors that must be managed.
  • The discussion highlighted a cautious optimism: AI’s promise for cheaper, faster breach response is strong, yet the industry must balance innovation with emerging AI‑related risks.

Sections

  • [00:00:00](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=0s) **AI's Dual Role in Security** - A panel of AI experts debates whether AI will increase or decrease the cost of future data breaches, highlighting emerging tools and new risks.
  • [00:03:07](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=187s) **Securing AI: Trends & Challenges** - The speakers discuss enthusiasm for AI advancements while emphasizing the need for adversarial protection, auto-verification, and the growing market demand for AI-enabled security solutions.
  • [00:06:11](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=371s) **AI-Augmented Incident Recap Automation** - The speaker outlines how AI ingests multi-level security data to automatically produce real-time summaries and action items during lengthy SWAT incident calls, streamlining human coordination and response rather than replacing human defenders.
  • [00:09:18](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=558s) **Model Unlearning and Data Privacy** - The speaker outlines how synthetic data can protect privacy, introduces "unlearning" as a method to erase specific knowledge from large models, and emphasizes that risk management must span the entire model lifecycle, including rigorous data-filtering defenses like those employed at IBM.
  • [00:12:24](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=744s) **Rumor of OpenAI's Strawberry Model** - A host outlines the online buzz surrounding a mysterious, unreleased OpenAI model called "Strawberry," driven by an anonymous Twitter persona that promises a dramatic leap in reasoning ability but provides no concrete information.
  • [00:15:29](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=929s) **Beyond Hype: Enterprise LLM Priorities** - The speaker explains that enterprises are shifting from chasing new model releases to managing the surrounding security, licensing, data-integration, and workflow challenges of LLM deployments.
  • [00:18:42](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=1122s) **Evaluating LLM Progress Beyond Benchmarks** - The speakers debate whether improvements in large language models reflect genuine intelligence or just benchmark tuning, and outline a comprehensive client-centric evaluation framework that consistently shows quality gains with newer models such as GPT-4.
  • [00:21:47](https://www.youtube.com/watch?v=L1_cLO4d_zE&t=1307s) **Limits of Plug-and-Play Model Swaps** - The speakers discuss how simply replacing a language model isn't enough for better performance, requiring adaptation of surrounding components, and emphasize the need for better metrics beyond MMLU to evaluate large models across diverse use cases.

Full Transcript

Source: https://www.youtube.com/watch?v=L1_cLO4d_zE
Duration: 00:22:56

0:00 Is AI going to save computer security?

0:03 I think there's a balance. While the new tools are helping a lot, on the other side we are also seeing new risks that arise with AI.

0:12 There is no evidence that Strawberry is anything at all.

0:15 OpenAI does need something that is significantly better than where they are right now. So I do believe that they have to release something mega pretty soon.

0:30 I'm Tim Hwang, and I'm joined today, as I am every Friday, by a tremendous panel of researchers, engineers, and others to hash out the week's news in AI. Today: Nathalie Baracaldo, a senior research scientist and master inventor; Kate Soule, a program director in generative AI research; and Shobhit Varshney, a senior partner consulting on AI for the US, Canada, and Latin America.

0:56 Before we get into this segment, I want to do our usual around-the-horn question. I think it's a really simple one, but it tees up this topic really well. The question is simply: data breaches are very expensive today. In about five years, do we think the cost of an average data breach will be going up or down? Will it be greater or lesser than the kind of damage we see nowadays? Shobhit?

1:20 More.

1:22 Kate, how about you?

1:24 I think down.

1:26 All right, great. Going down. Okay, great. Well, we've got some disagreement, so let's get into this segment. We have a couple of news stories we really want to focus on today. The first comes right out of IBM. A few weeks back, IBM released the latest edition of its annual Cost of a Data Breach report, which estimates the costs of data breaches, and it has some fascinating implications for AI and cybersecurity. It estimates that the average cost of a data breach is rising, a 10 percent increase over last year, to about $4.88 million. But one of the most interesting findings is an estimated average $2.22 million in cost savings from the use of security AI and automation.

2:15 So that's a huge difference. Nathalie, to bring you in first: that's like a 50 percent difference, right? I'm curious how you think about the use of AI in the security space, how these two worlds intersect, and the implications for AI in the security space.

2:36 Thank you, Tim. Actually, I read the report, and I'm very happy to see that gen AI, and AI in general, really reduce the cost of incidents and help a lot. Teams are really evolving their security. I think there's a balance: while the new tools are helping a lot, on the other side we are also seeing new risks that arise with AI. The amount of benefit we get from these new tools is fantastic, so I'm very excited that we're heading in the right direction, but we cannot forget that we need to protect those tools against adversarial attacks, throughout the entire pipeline of the system. Overall, I'm very excited to see the whole community heading in the right direction.

3:25 And that definitely includes AI for auto-verification and for helping humans. It's really helping out. So yeah, those are my thoughts.

3:37 Yeah, for sure, that's really helpful. Shobhit, you work with clients on a wide range of AI implementations, and the security space is something we really haven't covered much on this show before. In the market, do you see more and more enterprises wanting this and thinking about this intersection? And are there particular use cases that come to mind where you think, wow, that's really making a difference in reducing the impact of data breaches, or preventing them in the first place? Just curious what you're seeing out there.

4:07 Yeah, absolutely. It's a very hot topic for all of our clients, and it's a two-way street. There is AI that helps you drive better security: pattern recognition and things of that nature to secure things. But there's also the reverse, where the security teams are doing a better job of protecting AI. So it's both directions, and we're learning quite a bit; we've gotten much closer to our security services within consulting as well.

4:31 There are a few things you do in security. There is prevention; there is making sure you detect fast enough; there is investigating what happened; and there is being able to respond. The whole life cycle of it. So across the whole platform, from a tooling perspective, you're asking: what's the attack surface, and how do you manage it? How do you do red teaming around it? How do you do posture management, and things of that nature? So there are quite a few areas where gen AI, or AI generally, has been able to make a meaningful difference.

5:00 The report we're talking about is a massive study. Just to give you the scale at which we did this: about 600-plus organizations that had data breaches in the last year, across 17 industries. The team interviewed close to 4,000 senior security officials who dealt with those breaches. And we looked at the entire spectrum of where AI is being applied. The number one factor was human error, and the human training needed to prevent these breaches from happening. Small things like social engineering: I can use a generative AI model to create a very plausible email that you'll be very tempted to click. That clickbait quality of generated content is being applied to social engineering attacks.

5:53 Right, using it for red teaming is kind of what you're talking about?

5:57 Right, red teaming is a great use case. The second one: I'm working with a large Latin American bank on cybersecurity pattern detection. We're saying, here's a set of things that happened; can you create an early alert based on the pattern you're seeing? Then the same information needs to be assimilated at different levels and sent out as alerts. So we're able to automate parts of what a human would otherwise have done in managing the whole life cycle, from education to detection to managing the incident.

6:28 And on these SWAT calls: you join a SWAT call that's been running for the last six hours, and executives jump in and say, hey, can somebody recap? That's a very easy one for us. So now we've started to generate recaps of what has happened so far and the actions people have committed to taking. Those show up on the right side, so anybody who joins the SWAT call knows exactly where we are.

6:50 That's really cool. I never really thought about that. I think that's the funny thing: when you think about AI and security, you imagine some hyper-intelligent system that will just defend against hackers. But what's really interesting, Shobhit, is that a lot of what you're describing is optimizing the human team that's doing this work, which I think is really important.

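The recap workflow described above is essentially map-reduce summarization over the call log. Below is a minimal sketch in Python, assuming a hypothetical `call_llm` helper in place of a real chat-completion client; it illustrates the pattern only, not the production system discussed on the show.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Hypothetical chat-completion helper; swap in your provider's SDK."""
    raise NotImplementedError

@dataclass
class Recap:
    summary: str
    action_items: list[str]

def recap_incident_call(transcript_chunks: list[str]) -> Recap:
    # Map step: condense each chunk so a six-hour call fits in one context window.
    partials = [
        call_llm(f"Summarize this incident-call excerpt in three bullets:\n{chunk}")
        for chunk in transcript_chunks
    ]
    combined = "\n".join(partials)
    # Reduce step: one recap for whoever just joined, plus committed action items.
    summary = call_llm(
        "Write a short recap of this incident so far for an executive "
        f"who just joined the call:\n{combined}"
    )
    actions = call_llm(
        f"List the action items people have committed to, one per line:\n{combined}"
    )
    return Recap(summary, [a.strip() for a in actions.splitlines() if a.strip()])
```
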
7:11 Maybe a final question for you, and I'd love to get the researchers' view on some of this too. Shobhit talked about a big piece of this being defending AI systems against subversion, manipulation, or attack, which is a huge issue. I was joking with a friend recently that there's probably a whole product you could build just around manipulating the chatbots people have on their websites. Can you give our listeners a sense of the state of affairs there? Because certain things seem very hard to defend: within a few minutes of any model coming out, people have already extracted the system prompt. That's just hard to control. So on the technical side, from the perspective of defending AI systems, I'm curious whether you have any thoughts or hot takes on where we are, and whether the state of the art is getting to the point where we can actually handle some of these attacks when we release these systems into the wild.

8:10 Yeah, well, I want to make sure we give Nathalie a chance to jump in, because Nathalie, I know you're doing some really exciting work specifically in that space, so it'd be great to get your perspective as well. Where I've seen some really interesting research that we haven't quite touched on yet is actually on the data itself. Not the life cycle necessarily, but imbuing the data itself with protections, so that if it is leaked, maybe it's not as big a deal. For example, we've done some interesting work with different financial institutions looking at whether we can create versions of the data that are privacy-protected, where we actually create a synthetic version of, say, customer bank transaction records. We extract and remove all PII; we try to make it so you could never identify the individual; and we use that data set to go out into the business and drive decisions, with a much broader reach across the organization. That way, if the information is leaked, sure, maybe some business knowledge leaks, but not actual customer information, at least not to the same degree. So there's a whole area of research around synthetic data and making data private that I think is going to be a really powerful tool.

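As a concrete illustration of that pattern, here is a deliberately simple sketch that builds a synthetic copy of transaction records by sampling from observed distributions, never copying identifier columns. Real deployments use far stronger generators and formal guarantees such as differential privacy; the column names here are assumptions.

```python
import random

def synthesize_transactions(real_rows: list[dict], n: int) -> list[dict]:
    """Produce n synthetic rows that mimic the shape of the real data.
    Only non-identifying columns are ever read; PII fields such as
    'name' or 'account_id' (assumed schema) are never copied over."""
    amounts = [float(r["amount"]) for r in real_rows]
    mean = sum(amounts) / len(amounts)
    std = (sum((a - mean) ** 2 for a in amounts) / len(amounts)) ** 0.5
    categories = [r["merchant_category"] for r in real_rows]
    return [
        {
            "merchant_category": random.choice(categories),  # empirical draw
            "amount": round(random.gauss(mean, std), 2),      # fitted normal
        }
        for _ in range(n)
    ]

# Usage sketch: synthetic = synthesize_transactions(real_rows, n=10_000)
```
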
9:27 But Nathalie, what are your thoughts? You're so ingrained in this space; I'm really eager to get your perspective.

9:32 Yeah. I really like this question, because it touches on the entire life cycle of the model. In my perspective, risk runs throughout the system. Right now I'm working on something really interesting: the concept of unlearning. A lot of people find it curious, because it's not learning; we're actually removing knowledge from a model.

9:58 It's like, we're all about machine learning, and you're basically doing the opposite.

10:02 Yeah. And if you watch Star Trek, there's this Yoda saying that you always need to unlearn, or something like that. It's because sometimes models pick up certain things that later on we really want to get rid of. The reality is that the way we arrive at these very large models is by feeding in lots and lots of data. So one approach, as Kate was mentioning, is to try to mitigate what data goes into the model in the first place. However, because the data is so huge, it is really difficult to make sure you filter everything. So at some point, even after we apply defenses like we do here at IBM, where we filter and then try to align the model, we may realize that the model is spilling out data that it shouldn't. And this is going to happen; just like in any other area of security, we're going to see problems surface long after the fact.

11:04 Now, what do we do? We have two options. Option number one is cry. No, I'm kidding. Option number one is actually to retrain the model, which is not going to fix the problem, because think about how long it takes to train these models and how costly it is. So the idea of unlearning is: rather than retraining, can we find a way to manipulate the model so that it forgets that information retrospectively? That's one of the things I'm really excited to work on, because it's a new angle on security. And it's not only security; it's also life cycle management of the model. I think it's going to be the future.

11:51 And Tim, you asked at the start how I see the future. I see us having not only guardrails and not only filtering, but also this way of going back to the model, modifying it, and making it better for everybody. And we don't need to foresee every single thing that could go wrong if we can do this. So that's one of the things I think is very trendy right now. Nobody knows how to fully solve it, but we're getting there, and it's getting me really excited.

12:24 That's so cool. You heard it here first, listeners: unlearning is the new hotness in machine learning.

12:29 I call it the new black.

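For readers curious what "manipulating the model to forget" can look like mechanically, one common recipe in the unlearning literature is gradient ascent on the forget set, balanced by ordinary training on retained data. A generic PyTorch sketch follows; it illustrates that idea under stated assumptions, and is not the method Nathalie's team uses.

```python
import torch

def unlearn_step(model, forget_batch, retain_batch, loss_fn, optimizer, alpha=1.0):
    """One update that pushes the model away from the forget data while
    anchoring its behavior on the data we want to keep."""
    fx, fy = forget_batch          # examples to forget
    rx, ry = retain_batch          # examples to preserve
    optimizer.zero_grad()
    # Negative loss on the forget set gives gradient ascent (unlearning);
    # positive loss on the retain set keeps overall quality intact.
    loss = -alpha * loss_fn(model(fx), fy) + loss_fn(model(rx), ry)
    loss.backward()
    optimizer.step()
    return loss.item()
```
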
12:37 So this week, and late last week, rumors have been swirling around a thing called Strawberry. If you are as terminally online as I am, there's a large amount of discourse about this potential model that OpenAI is going to release, which promises a substantial increase in capabilities and reasoning ability. Everybody's saying it might be the model that finally brings the company to level two in its internal technology tiering, meaning models with much more powerful reasoning capabilities. This is a really bizarre story in some ways, because OpenAI has not disclosed anything publicly. In fact, most of the discussion online is being led by a completely weird anonymous account that showed up a few weeks ago under the handle "I rule the world Moe," an account the Twitter algorithm just appears to love; it's promoted into everybody's feeds all the time. And it promises that today, the day of recording, is going to be the day we see this godlike model emerge. This account has promised a lot, and a lot of people have called it out for not providing any real detail and just adding to the AI hype. So there are two questions I want to cover here, but let's do the first one, which is: this is just hype, right? We have no reason to believe OpenAI is going to release anything at all. Maybe I'll start with Shobhit. This is just hype, right? We have no reason to believe anything is about to happen today.

14:09 Yeah. So he earlier said it was coming out Tuesday at 10 Pacific, right? He's been moving it around as well. There are all kinds of conspiracy theories, including whether this particular Twitter account is just a shadow account for Sam Altman to build some excitement.

14:26 There's just so much fan fiction in this space. I can't deal with it. I'm just trying to do machine learning here.

14:31 So I think, overall, the arc of reasoning capabilities is improving. It's not anywhere close to human, but the models are starting to get better. I'm very encouraged by how enterprise-friendly features are being added: things like function calling, structured outputs, observability, and so forth. So I think we're all moving in the right direction. OpenAI does need something that is significantly better than where they are right now. They have enough competitors nibbling at them on all the benchmarks.

15:05 So I do believe they have to release something mega pretty soon. As for Strawberry, all the rumors I've heard so far are very encouraging, but we've never seen any benchmarks for it. The models that were showing up on LMSYS and elsewhere in shadow mode turned out to be the new GPT-4o model and so forth. So you've still not seen any actual validation that these models are going to be any better. Saying that Apple is going to come out with the next best iPhone, of course that's going to happen. It's just a very obvious thing.

15:36 I like that. The prediction is: OpenAI is going to release something big at some point. Yeah, I guess that makes sense.

15:43 And Tim, our clients, at least from an enterprise perspective, are no longer jumping up and down at the latest model releases. You're at a point where, from an enterprise value perspective, there's so much to be done before and after the LLM call, so many other things that are non-functional in nature. If my data is on a particular cloud: the security, the IP, what licensing agreement I'm on, whether I can actually commercially use this model, how I've adapted that model to my own data, and so on. There are millions of things that happen before and after. My team's focus has been on creating the end-to-end workflows with the right evaluations for the business value unlock, and we keep swapping the model itself out on a fairly regular basis. So our clients are not at a point of, oh my god, this beat the benchmark by 0.1.

16:31 They're not texting you like, what's up with Strawberry? Can I get Strawberry?

16:37 That's very interesting on the business side, because there's so much hype on social media, but on the day-to-day, getting-the-business-done angle, clients are not asking about it. Kate, Nathalie, I'd love to bring you in on the research side as well. Having worked with a lot of researchers in my time, what's interesting is that a lot of this Twitter hype doesn't really impact the day to day. A lot of people say, oh yeah, I know about it, but I'm not really paying attention to it. Is that your sense? That there's this weird universe of discourse about AI that doesn't include the people actually doing the research? I'm curious whether you're a Strawberry believer, and how you view this whole weird news cycle we're in this week.

17:17 I mean, I haven't been paying too much attention to it. It's a waste of time. We've got more interesting problems to solve than figuring out the meaning behind Strawberry. But I don't know, Nathalie, what are your thoughts?

17:30 Yeah. The first thing: I was very curious about Project Q*, which seems to be the same as Project Strawberry. But working day to day with these models, the first thing I thought was: okay, now they're saying we're moving to the next level of AI when we cannot yet fully measure the performance of the current chat-based models, the level where we are now. So I meet it with skepticism, in the sense that it may answer certain questions in certain scenarios very well, but when you dig deeper and change the context a little, it may well stop working. The reason is that right now we are not very good at measuring the performance of these models. There are tons of benchmarks out there, but when you throw a model into the wild, you see results that are slightly different. So I meet it with skepticism; really, though, I'm pretty sure it's going to be great. The other thing I was thinking is: how do you know what's behind it? The fact that it's behind closed doors makes me wonder, what is it? Is it really intelligence, or are there rules layered on top of a model? Maybe it's really tailored to this solution and the benchmarks they're trying to beat. So we'll see. But that's my take.

18:56 Right. And it's a very interesting outcome: OpenAI drops the new big model, but because our evals are so crude at evaluating model capability, it's actually unclear how much of an improvement it is. I think that's a potentially funny and interesting outcome.

19:13 I'll push back a bit on that, Tim.

19:15 Okay. You think it'll be obvious when they take action?

19:17 Yeah, and it's very transparent. We do this every day with our clients. We'll go in and say, hey, everybody has some sort of knowledge-search use case, RAG patterns, and so forth. We have our own entire benchmarks: we create golden records, grounding truth, and we compare against those. We'll do a human evaluation; we'll do LLM-as-a-judge; we'll run through this whole rubric for clients. We see a meaningful difference when you apply an OpenAI GPT-4o model versus a smaller model. We do see a better response; it's crisper. We've seen quality improvements over the last 18 months to two years. So generally, I'm very impressed with how well the models work, as long as you do the before and after ridiculously well. If you form the question in the right way, and you're getting the right data in, the answers get better with these model upgrades. I still don't think the smallest models can come close to what the OpenAI models are doing.

20:17 There are some bespoke use cases, like COBOL to Java. Of course IBM's model has to outperform a general model there, because we have all of this first-party data and a ridiculously good set of talent around it; research and IBM tech can create that model and fine-tune it really well. In those use cases it's not even a competition. But look at knowledge-article use cases: can I understand the nuances of what happened on this IT ticket, where 15 people have touched it and each one had different updates? What's the root cause of what happened? The bigger, nicer models have better reasoning capabilities and do an exceptionally good job of picking out the needle in the haystack, which smaller models can't get to.

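The rubric described here (golden records plus an LLM-as-a-judge pass) reduces to a small harness like the sketch below. The `call_llm` stub is again a hypothetical stand-in, and the 1-to-5 rubric is an assumption; real evaluations, as noted on the show, add human review alongside the automated judge.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical judge-model client; replace with your provider's SDK."""
    raise NotImplementedError

def judge(question: str, golden: str, candidate: str) -> int:
    """Ask a judge model to score a candidate answer against a golden record."""
    verdict = call_llm(
        "Score the CANDIDATE answer against the GOLDEN answer for accuracy "
        "and completeness. Reply with a single digit from 1 to 5.\n"
        f"QUESTION: {question}\nGOLDEN: {golden}\nCANDIDATE: {candidate}"
    )
    return int(verdict.strip()[0])

def compare_models(golden_records, model_a, model_b):
    """golden_records: (question, golden_answer) pairs; models are callables
    mapping a question to an answer. Returns each model's mean judge score."""
    scores_a, scores_b = [], []
    for question, golden in golden_records:
        scores_a.append(judge(question, golden, model_a(question)))
        scores_b.append(judge(question, golden, model_b(question)))
    return {"model_a": sum(scores_a) / len(scores_a),
            "model_b": sum(scores_b) / len(scores_b)}
```
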
20:57 But Shobhit, do you think we're at the point where I can translate a 0.01 increase in MMLU, given how small these incremental model changes are getting, into "this will improve my accuracy and reduce my cost by X"?

21:12 So I do see different weight classes; if you're still in the Olympics frame of mind, different weight classes. If you're in the top league of frontier models, you will not see that much of a difference, because there are other techniques you're using that have a higher impact than just swapping out the model itself. But on the same use cases, if I go from Gemini to OpenAI to Claude, I do see meaningful changes in the way they interpret the data and how they respond to it. And then, once you pick a model, the way you ask the question, the way you've created embeddings, and things of that nature have to be tied to the model a little bit. You can't just swap that model out for a new one and expect it to behave better. It's just not very plug-and-play right now. But if you pick a model and adapt the rest of the before-and-after to it, you see a fairly decent quality bump. Again, though, different weight classes will give you different results.

22:02 Yeah. So hearing Shobhit, one thing I thought is that I totally agree with you that large language models have improved substantially over the performance of smaller models. My comment was really about how we measure those big models, those large language models, and I think we still have some more research to do to measure their performance properly. And I agree with Kate: a higher MMLU definitely does not guarantee that the model is going to perform well in certain use cases. So yeah, lots of interesting challenges to address there.

22:41 We are, unfortunately, at time. Nathalie, Kate, Shobhit, thank you for joining us as always. And for all you listeners, if you enjoyed what you heard, you can find us on Apple Podcasts, Spotify, and better podcast platforms everywhere. We'll see you next week.