
O1 Preview Sparks Chain‑of‑Thought Upgrade

Key Points

  • Agents‑as‑a‑service and multi‑agent teams are expected to become ubiquitous, driving a major shift toward collaborative AI workflows.
  • The panel debated the O1 preview’s hype, with Chris eager for new models, Aaron noting the scientific intrigue of chain‑of‑thought learning, and Nathalie highlighting tangible security‑metric improvements.
  • The newly released model embeds chain‑of‑thought reasoning and reinforcement‑learning techniques directly into its architecture, boosting its reasoning performance.
  • An unusual production schedule meant the episode was recorded before the model’s public launch, illustrating the fast‑paced timing of AI releases.
  • The discussion framed large language models as the key enabler for personalized experiences, coining the idea that “an active league is a happy league.”


**Source:** [https://www.youtube.com/watch?v=KPsl7IK2_eo](https://www.youtube.com/watch?v=KPsl7IK2_eo)
**Duration:** 00:47:56

## Sections

- [00:00:00](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=0s) **Panel Debates o1 Preview Hype** - A mixed‑expert panel examines whether the new o1 model delivers on its promises, while also discussing agents‑as‑a‑service, multi‑agent collaboration, and AI‑driven personalization.
- [00:03:05](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=185s) **Chain‑of‑Thought Self‑Education in LLMs** - The speaker explains how chain‑of‑thought prompting combined with reinforcement learning lets large language models introspect, iteratively learn from varied problems, and achieve aligned answers without explicit ground‑truth supervision.
- [00:06:09](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=369s) **Prioritizing Reasoning Over Raw Answers** - The speaker stresses that accurate reasoning—such as step‑by‑step validation for calculations, puzzles, or chain‑of‑thought tasks—is more important than merely predicting the next token, and suggests using reinforcement learning rewards and inference‑time tree searches to train models toward proper logical processes.
- [00:09:18](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=558s) **Comprehensive Safety Evaluation of LLM** - The speaker explains how a single model’s safety is assessed through diverse metrics—including jailbreaking, hallucinations, and fairness—while leveraging advanced benchmarks and introspection methods to capture a holistic view of model behavior.
- [00:12:21](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=741s) **Cascading Errors in Model Reasoning** - The speaker discusses model risk levels, how chain‑of‑thought mistakes can propagate and become harder to detect than hallucinations, and references ongoing work on consistency and security.
- [00:15:25](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=925s) **Balancing Model Speed and Superintelligence** - The speaker highlights the growing division between cheap, low‑latency models and larger, costly ones while questioning when AI benchmarks that surpass PhD‑level performance will translate into self‑reinforcing, superintelligent systems.
- [00:18:36](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=1116s) **From SaaS Pioneers to AI Threat** - The speaker highlights Salesforce’s early role in popularizing SaaS and the resulting industry‑wide SaaS disruption, then questions whether AI‑driven agents will similarly upend every product category.
- [00:21:45](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=1305s) **Emerging AI Agent Marketplace** - The speaker foresees a worldwide platform where users purchase AI-driven tasks from agents—similar to Fiverr—driven by firms that possess superior, faster, and cheaper multimodal data, prompting big tech players like Salesforce to enter the space.
- [00:24:58](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=1498s) **Narrow-Agent Design with RAG & Unlearning** - The speaker suggests replacing broad, human‑centric language models with specialized, domain‑focused agents that leverage retrieval‑augmented generation and machine‑unlearning to selectively add or erase knowledge, enabling fine‑tuned, objective‑driven functionality, and highlights an IBM‑Salesforce partnership to advance this strategy.
- [00:28:06](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=1686s) **Balancing Deterministic and Exploratory AI** - The speaker explains how to configure pipelines that decide when to use stochastic exploration versus deterministic retrieval (e.g., RAG) to maintain trustworthiness, combine human oversight with technical safeguards, and apply this approach in business settings before shifting to a discussion on fantasy football.
- [00:31:08](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=1868s) **Massive Fantasy Sports Platform Metrics** - The speaker outlines their eight‑year‑old consumer‑facing fantasy sports service, highlighting 12 million registered users, billions of page views and insights, 5,000 requests per second, and its predictive injury‑detection and trade‑analysis capabilities.
- [00:34:13](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=2053s) **Generative AI Unlocks Scalable Personalization** - The speaker explains how generative AI can break the bottleneck of producing countless personalized variants by leveraging comprehensive customer data platforms to automate role‑play‑style content customization.
- [00:37:19](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=2239s) **Personalizing Infinite Scoring Sentences** - The speaker explains how they merge edge‑generated fill‑in‑the‑blank templates with percentile‑based adjectives to tailor AI‑generated messages for countless scoring rules, creating a “theory‑of‑mind” personalization layer that delights users.
- [00:40:24](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=2424s) **Model Unlearning for Poisoning & Hallucinations** - The speaker explains how unlearning methods can patch AI models to excise poisoned data, copyrighted material, and persistent hallucinations without the need for costly full retraining.
- [00:43:33](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=2613s) **Future of Surgical LLM Fine‑Tuning** - The speaker envisions LLM development shifting toward precise, “surgical” interventions—targeted activation control, unlearning, and advanced visualization—to selectively edit and fine‑tune model behavior.
- [00:46:36](https://www.youtube.com/watch?v=KPsl7IK2_eo&t=2796s) **Balancing Forgetting in Image Models** - The speakers explain how loss functions and mixture‑of‑experts architectures can be optimized to control what image‑to‑image generation models remember or discard, concluding with a podcast sign‑off.

## Full Transcript
0:00Can models officially reason now? 0:02They have risk levels for models. 0:04I think, uh, we're still good. 0:05So no terminators inside. 0:07Are agents as a service, the 0:08new software as a service? 0:10Agents are going to be everywhere. 0:12And multi agents, uh, you know, 0:14operating in teams and crews, multi 0:16agent collaboration is going to be huge. 0:18Are LLMs the true unlock for personalization? 0:21We came up with the notion that an 0:22active league is a happy league. 0:30I'm Bryan Casey and I'm joined this week 0:31by a world class panel of experts across 0:34engineering, research, and product, and we're 0:36excited to get into this week's news in AI. 0:38This week we have Nathalie Baracaldo, 0:41Senior Research Scientist, Master Inventor. 0:43Aaron Baughman, IBM Fellow, Master 0:46Inventor, and Chris Hay, Distinguished 0:48Engineer, CTO, Customer Transformation. 0:56Alright, so, as every week, we start 0:58out with a quick, hot take question. 1:01And this week's question is, 1:02Was o1 Preview worth the hype? 1:04We'll start with you, Chris. 1:06I live for the hype, and 1:08I wait for the next model. 1:09You know, where's my new model? 1:11It's been a week already, so it makes sense. 1:13Um, Aaron, what about you? 1:14Um, yeah, I think scientifically this 1:16whole chain of thought, allowing systems 1:17to teach itself, is very interesting. 1:20Um, but I need to wait and see how 1:21it works out in the implementation 1:23details in the application space. 1:25And Nathalie, 1:26I think the new model, it's really 1:28interesting from the security perspective, 1:30some of the metrics that they show 1:32really demonstrate improvement. 1:34So I'm very excited about it. 1:36All right, well, let's jump right into it. 1:38And actually, that's going to 1:39be our first topic this week. 1:40So it was funny. 
1:41So this model released last Thursday, 1:45actually, and there's a little inside 1:46baseball for our listeners on the 1:48show: we usually record this show 1:50on Thursdays every week, um, and then we 1:53go into production and we release the show 1:55Friday morning. This one week we didn't do 1:58that, and we actually recorded the show on 1:59Wednesday, and then of course the model came 2:01out on Thursday, so that's just the way of the 2:03world. But, um, this was an announcement that 2:06has been hyped for a long time for anybody 2:09who's on, uh, you know, Twitter or X. You've 2:12seen the memes around Strawberry for what 2:16feels like an eternity, um, at, at this point. 2:19Um, and then it's finally here. 2:20It happened. 2:21The model arrived, um, and it wasn't, 2:23you know, just released as a blog post. 2:25It was actually rolled out, um, to, to the 2:28broad user base within their, uh, within Chat 2:30GPT and I think some of the API as, as well. 2:34Um, and the interesting thing about 2:35this model is that it introduced chain 2:38of thought and reinforcement learning 2:40techniques, um, into the model itself. 2:42So not as a way to interact with the model, 2:44but as a way that is an embedded capability 2:46inside of the model, which is, um, definitely 2:49kind of a new approach, um, there, and 2:51we've seen pretty important and noteworthy 2:55improvements in reasoning capability 2:57as a result from that. And so maybe just 2:59because, Aaron, because you, you, um, touched 3:02on this specifically in your first answer.
3:05Um, I actually just want to start with the, the 3:08interesting kind of science around including 3:11chain of thought and reinforcement learning 3:13techniques within the model itself, and just like 3:15get your reaction to that and why it's important, 3:18the things that are exciting to you about it. And 3:20maybe, you know, you raised a couple questions, 3:22um, even in your initial answer, but like some 3:24of the things that you're still waiting to see. 3:26Yeah, I think it's really fascinating 3:28how, you know, chain of thought, it was 3:30really introduced in 2022, and it's just 3:32accelerated and expanded to become the self 3:35education for large language models by 2023. 3:38And here we are today with Strawberry, you know, 3:40so it's great how fast technology is moving. 3:43And what I really like about this is that 3:47these chains of thought, it helps us to 3:49introspect the mind of a generative 3:52AI system. 3:54And what happens is, is that, you 3:56know, you might seed a system with 3:58problems and answers so that you know 4:00whether the chain of thought helped 4:02to induce the right answer. 4:04And then later, um, you can keep iterating 4:07over and over and over with these chains 4:09of thought, with newer variations of the 4:11problem, so it learns new skills over time. 4:14And you create variations of these problems, and 4:17you have almost like a panel of these generative 4:20AIs, um, answering with different strategies. 4:24And if all of the answers align towards, 4:27um, you know, that's the same answer, 4:29even though you don't have a ground 4:30truth, then more than likely the chain of 4:32thought is working because it's converging 4:34towards, uh, a less variant type answer. 4:38And through all of this looping, we have 4:40these gradient updates so that it can 4:41learn, you know, uh, more and more, you 4:44know, so we have gradient updating and 4:45we have in-context learning put together.
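The convergence check Aaron describes, sampling a "panel" of chains of thought and trusting the answer they agree on even without ground truth, is essentially self-consistency decoding. A minimal sketch; the `noisy_solver` stand-in and its numbers are invented for illustration, not how any production model samples:

```python
import random
from collections import Counter

def self_consistent_answer(problem, sample_fn, n_samples=8):
    """Sample several independent chains of thought and return the
    answer they converge on, plus the agreement rate.

    sample_fn(problem) is any callable returning (reasoning, answer);
    in a real system it would be a temperature > 0 LLM call.
    """
    answers = [sample_fn(problem)[1] for _ in range(n_samples)]
    # No ground truth needed: convergence is the signal. The majority
    # answer is the one the "panel" of sampled chains agrees on.
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples

# Toy stand-in for an LLM sampler: most chains reach 42, some go astray.
_rng = random.Random(0)
def noisy_solver(problem):
    return ("...reasoning steps...", 42 if _rng.random() < 0.8 else 41)

answer, agreement = self_consistent_answer("toy problem", noisy_solver)
```

A low agreement rate is itself useful: it flags the problems where the chain of thought is not converging and more sampling (or a bigger model) is warranted.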
4:49But, um, the last thing I'll mention 4:50is that what I really liked is how 4:52they broke out reinforcement learning 4:54as what they call train-time compute. 4:56Um, and then they have this 4:57thinking time as test-time compute. 5:00You know, so the thinking time is kind 5:01of when it iterates, and it's passing 5:03these many chains of thought, you know, 5:05along, you know, and then, um, the train 5:09time is where it's doing this in-context 5:11learning and perhaps doing, you know, some, 5:13uh, you know, fine-tuning, um, in there. 5:16That's great. 5:16And Chris, I know you in particular have 5:18been, you know, I've been following you 5:21following this, um, on Twitter as well. 5:23So, you know, maybe give you some space to 5:25have like a more open-ended kind of reaction, 5:28um, to, to just the release of this and, you 5:31know, the extent to which like it, it was, it 5:33wasn't what you were expecting it to be, but 5:35would love to just get your, um, your thoughts. 5:38I thought it was super interesting. 5:40I mean, just to sort of go on sort 5:42of some of Aaron's points there. 5:44The reinforcement learning 5:46part is really interesting. 5:47So if you think of the chain of thoughts for 5:49a second, then if you're solving something 5:52like a puzzle, so maybe you want to get it 5:54to solve a Sudoku grid, or maybe you want 5:56it to, uh, calculate, you know, how, which 6:00book, if Phileas Fogg was listening to the 6:03Harry Potter books, and, uh, what book would 6:07he be on by the time he got to India, right? 6:09If you think of those sort 6:10of questions, then, yeah. 6:12The answer isn't actually the important 6:14part; of course you want an accurate answer. 6:16The big thing you really want to do 6:18is, is reason in the correct way.
6:20And the sort of things that you want the model 6:21to be able to do is, sort of, calculate the 6:24distance to, uh, India, for example, calculate 6:28the sleep time of Phileas Fogg, how long the 6:30Harry Potter books are, and then be able to 6:33sort of validate those steps all the way across. 6:36And, and that would be similar for something 6:37like validating a Sudoku puzzle, right? 6:39Okay. 6:39You want to check the horizontals, the 6:41verticals, the sub mini grids, etc. 6:44Check each individual number and 6:46then see if there's any duplications. 6:48That logic is more important than just 6:51trying to predict what your next token is. 6:53So if you think of reinforcement learning there, 6:56the reward model that you could have there on 6:58training time can be a lot more accurate, right? 7:00You can give higher rewards for 7:02each step that it calculates and 7:04then, uh, how it gets it right. 7:06And that means that you can actually 7:08train the model towards doing the right 7:11type of chain of thought over time. 7:13So I actually think that is 7:16the proper innovation there. 7:17The other thing that I see there is 7:20this, this shift to inference time. 7:23And I really like that. 7:24It takes, sort of, 7:2432 seconds, whatever. 7:26I suspect that there is some sort 7:29of tree search going on there. 7:31I suspect they're generating 7:32multiple chains of thought. 7:33I suspect that as you go down each node of 7:36the tree, you're then probably iterating 7:39that further to get to some of these answers. 7:41Hence why the thinking time will increase. 7:43And then, as Aaron says, that's going to 7:44feed back into the training model 7:46later, but I, I think it's super exciting 7:48because it's that push towards, uh, 7:51inference and being able to scale out there. 7:53Now you could argue that we have that 7:56already with agents, but in order for that 7:58to work well with agents, you need to back 8:00that up with the reinforcement learning.
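Chris's Sudoku example works as a reward signal precisely because every step is mechanically checkable: rows, columns, and the 3x3 sub-grids. A minimal checker of the kind a reward function could call; this is a sketch of the idea, not OpenAI's actual reward model:

```python
def sudoku_valid(grid):
    """Validate a 9x9 grid (0 = empty cell) the way Chris describes:
    check rows, columns, and 3x3 sub-grids for duplicate digits."""
    def no_duplicates(cells):
        filled = [c for c in cells if c != 0]
        return len(filled) == len(set(filled))

    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[r][c] for r in range(br, br + 3)
                         for c in range(bc, bc + 3)]
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(no_duplicates(unit) for unit in rows + cols + boxes)

# Two toy grids: an empty board, and one with a clashing column.
empty = [[0] * 9 for _ in range(9)]
clashing = [row[:] for row in empty]
clashing[0][0] = clashing[4][0] = 5  # duplicate 5 in column 0
```

Because the check is exact and cheap, a training loop can grant partial reward for each intermediate step that keeps the grid valid, rather than only scoring the final answer, which is exactly the "reward the right chain of thought" idea above.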
8:02You need to feed that training data back into 8:05the model because if the model can't generate 8:07good chain of thought, then you're, you're not 8:10going to do very well in your agentic approach. 8:11So for me, 8:13I find it highly satisfying. 8:15All right, and Nathalie, just to bring 8:16you in here a little bit, like you 8:18mentioned some of the safety aspects. 8:19That was definitely, um, a 8:21thing they highlighted as well. 8:23Like, maybe as a way of, I have a couple 8:25questions on that, but maybe as a 8:26starting point, I'd love to hear your 8:28take on like the extent to which you 8:30think that capabilities, alignment and 8:33safety are like becoming the same problem 8:36space. 8:37It's like, you know, can we just make this 8:38model do what we want it to do? 8:40And you solve all sorts of questions there. 8:42Or can you like look at them 8:43as kind of more distinct domains? 8:46That's a great question. 8:48I think one aspect that really improves with 8:51this model is that before in AI, we had the 8:54issue of having models that are black box, 8:58black box meaning you can't see inside them. And we tried in the 9:00community to really inspect them, introspect 9:03them, try to check the activations, see how they 9:06react to different inputs and stuff like that. 9:08But this model allows us to have 9:10something slightly different, and it's 9:12that ability to introspect how it came 9:15back with a decision, with an answer. 9:18So, uh, from that perspective, and to 9:21answer your question, I think, uh, there 9:25may be some questions and answers, a lot 9:28of things that get touched upon, and I, 9:31we only have one single model, so probably 9:33what's happening, uh, is that we're going 9:35to be mixing all these things together. 9:37The training data contains a lot 9:39of different aspects of safety.
9:42I think, uh, it's important to cover all of 9:45them as much as possible, but, uh, overall 9:48the main thing that makes this model so unique from 9:51my perspective is that ability to introspect without 9:54having to look into activations, which we 9:57humans are not super great at interpreting. 10:01So that is a part of the, I think, uh, 10:04exciting aspects from a security perspective. 10:07Uh, the other interesting perspective 10:10that I think, that I thought, was really 10:12interesting is that when we measure 10:14safety, we oftentimes have different 10:17perspectives, as you were alluding to. 10:19So we measure things like 10:20jailbreaking attacks, hallucinations. 10:24We, uh, verify that, uh, the model 10:27is not going to insult anyone, 10:29uh, fairness, all sorts of things. 10:32So if we see some of these aspects, for 10:35example, jailbreaking attacks, that was 10:37one of the metrics that got improved. 10:40Uh, maybe I'm talking too much, but, uh, 10:42I would say that I was really, really 10:44impressed with the fact that the community 10:48is trying to really incorporate more 10:50and more, um, cutting-edge benchmarks to 10:54try to understand how the model behaves. 10:58Because one thing that happens with 11:00benchmarks in this space is that they arrive 11:03today, people kind of overfit to them. 11:06And, uh, the jailbreaking attacks, uh, will 11:08happen the same, and things, uh, and all other 11:11benchmarks are also kind of, uh, having that 11:14issue that people really overfit to them. 11:16So I was, uh, really impressed and continue 11:18to be impressed with the security community, 11:20the AI security community, try to push the 11:22boundaries, have more red teaming, have 11:25more interesting stuff, uh, uh, things 11:27to throw at these models to test it out. 11:30So overall, I'm very hopeful from the 11:33security perspective with this model and it 11:36opens lots and lots of opportunities for us. 11:39I have a question for Nathalie.
11:41So when I read the paper on this, they said 11:46that in the testing of the model, they were 11:48playing a capture-the-flag scenario with 11:51the model, and the goal the model had was to 11:54capture the flag, like a security thing, but 11:56then the container was down, so the model 11:58broke out of the host and then restarted the 12:01container so it could capture the flag, right? 12:04Very goal oriented. 12:05So I'm just like, your take on that, Nathalie, 12:08from a kind of security perspective. 12:11Yeah, this starts looking like a 12:13Terminator sort of thing is going on. 12:18Um, I think it's impressive, 12:20first of all, 12:21the way the model is trying to find 12:23its way around to solve the issues. 12:26Uh, sometimes it's going to do stuff 12:28like that, that, uh, it's not necessarily 12:32the simplest solution. 12:34Um, but overall, I think, uh, the risk 12:38level, they have risk levels for models. 12:40I think, uh, we're still good. 12:41So no terminators inside. 12:43We're good from the security perspective. 12:46Um, does that answer? 12:48And also curious to know what 12:51Aaron has to say about that, because 12:53it's such an interesting question. 12:55I mean, yeah, these are really great 12:56questions and open-ended discussions. 12:58And, um, one of the areas that I found, 13:02um, interesting would be these error 13:04avalanches, because during this chain of 13:06thought reasoning, you know, you're always 13:08pushing forward the chain of thought. 13:10And, if at step zero you have an error, then 13:14that error could propagate all the way to 13:15step n or n plus one, and it just creates 13:19this cascade of problems that could happen, 13:22um, that might be harder to uncover than a 13:26hallucination, because, you know, you have these 13:28large outputs of chains of thought, if you 13:30can even see them, because I know Strawberry, 13:33you know, in this case has hidden them from us.
13:35You know, they, they made 13:36that deliberate choice. 13:39But, you know, but, but I know 13:41that there are some, some work in 13:43academia, potentially industry now 13:45about looking at consistency, right? 13:47Can these models consistently 13:49get answers, um, correct? 13:51And, you know, that's one way, um, 13:53but I'm looking forward to seeing 13:54how Strawberry really handles that. 13:56And as more and more people use it. 13:58How big of a problem, you know, is this, 14:00you know, this cascading error or this 14:03avalanche of, of errors that, that could 14:05happen, um, along their, uh, reasoning. 14:07And, and Aaron, I think that's a really relevant 14:10point if you have a single chain of thought 14:13that you're iterating down, but I'm not. 14:15I'm not convinced that that is the case. 14:18I, I, I could be totally wrong. 14:20We're just guessing. 14:21We're just sort of trying to 14:22look at it from the outside. 14:23But I, I really feel there's multiple chains 14:26of thought that's being generated there. 14:28And they're doing some sort of search 14:29on that to be able to, to aggregate it. 14:32So, so if they are doing that, and I 14:34think they are, could be wrong, then there 14:37might be less chance of that over time. 14:40Because at least it's got other 14:41options to take, uh, down that 14:43path and aggregate it a little bit. 14:45But, but even then with the reinforcement 14:47learning to the point, then hopefully during 14:50training, a lot of that would be taken away 14:53because, you know, the reward model will be sort 14:55of pushing it in the right direction over time. 14:57But yeah, but it's a really great take. 15:00Yeah. One other point I wanted to make too is 15:01that, um, just the thinking time, the 15:04inference, I noticed that it took 10 hours. 15:06That they gave it 10 hours to solve 15:08like six algorithmic problems and you 15:10know, that's a lot of time, right? 
15:12And so, so I think I'm, I'm also curious to, to 15:15learn, as, you know, we get our hands on it, how much 15:17time, you know, what's the trade-off, you know, 15:19time versus speed of response, you 15:22know, and, and, and learn more about that as well. 15:25So I'm just really excited 15:27about, you know, Strawberry. 15:29I think that point on the length of time it 15:31takes for some of these answers to me draws this 15:35like interesting scenario where kind of like 15:36the LLM router patterns, um, are going to become 15:39even more pronounced, where it's like you're 15:41going to want small fast models that are cheap 15:44and low latency to do certain types of tasks, 15:46and then, when you have to, offloading them 15:48to these bigger, longer, more expensive, 15:52um, sort of, um, sort of scenarios. 15:54The last question I have on this, cause 15:56I, I, I don't want to, I want to have 15:57one more question that's kind of like a 16:00more existential question, um, that I would 16:02love to get the panel's, uh, take on, uh, 16:04on this topic, which is, on a lot, many of 16:06these benchmarks, it's now exceeding PhD-level 16:11intelligence, um, and not to out myself here, I 16:15consider myself a reasonably, like, economically 16:18productive person in society, but I do not have 16:21PhD-level intelligence on all of these tasks. 16:24And one of the really like interesting reactions 16:27from some of the folks I saw on Twitter who 16:29are actually like, like Rune, who's at OpenAI, 16:33works in the lab, like came out afterwards 16:35and was talking about how he didn't even 16:38think product was that important and that the 16:40only game was getting to self-reinforcing and 16:44self-improving artificial superintelligence. 16:47And the question I have is like, are people 16:50just like, like, when do we expect, like, how 16:52capable do these models have to be before we 16:54actually see the transformative economic impact?
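The router pattern raised here, cheap low-latency models by default with the slow reasoning model reserved for hard tasks, can be sketched as a simple dispatch. All model names, prices, and heuristics below are made up for illustration; a production router would also escalate to the bigger model when the cheap path fails:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_call: float  # illustrative units, not real pricing
    latency_s: float

FAST = Model("small-fast", cost_per_call=0.001, latency_s=0.3)
REASONER = Model("big-reasoning", cost_per_call=0.15, latency_s=30.0)

def route(task_type: str, needs_multistep_reasoning: bool) -> Model:
    """Send easy, latency-sensitive work to the cheap model; reserve
    the slow chain-of-thought model for genuinely hard tasks."""
    if task_type in {"classification", "extraction", "chitchat"}:
        return FAST
    if needs_multistep_reasoning or task_type in {"math", "planning"}:
        return REASONER
    return FAST  # default to the cheap path
```

The interesting design question is the classifier itself: whether a request "needs multistep reasoning" can be decided by a rule, or needs a small model of its own.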
16:57Yeah, so I think one of the aspects 16:59is, uh, what's the application and how 17:03much you can trust the model to make 17:05sure that, you know, it doesn't have 17:06hallucinations in aspects that are important. 17:08So, um, I think to have real economic impact, we 17:12need use cases first, uh, where we can fail, 17:16basically, we can fail and we're safe failing, 17:19and still increase the productivity of people. 17:21So that's, uh, that's my take on this. 17:24And so, for example, it's going to 17:27be one of those use cases where, for 17:30example, you have multiple smaller 17:32models and you can try to orchestrate. 17:35Perhaps this very big model will help us, 17:38uh, would help us really orchestrating, or 17:41try to devise plans when they are difficult. 17:45But I think, uh, overall, the question 17:48of, uh, just get a big, big model that 17:52can do everything just by itself, uh, 17:55that's, uh, uh, probably not going to solve 17:58all the problems in, uh, the industry. 18:01I think we need a bunch of smaller 18:02models and the agentic approach, 18:04and perhaps have another top layer, 18:07um, in there to, to really understand the big 18:10context, but yeah, that's, uh, how I see things. 18:13So I really think, uh, industry-18:16wise, things are going agentic. 18:17So smaller models working together. 18:25We talk about agents every week on this 18:27show, like the theme of the show is just 18:28like, we should just call it the agent 18:30show, uh, at this point. Um, but let's 18:33talk a little bit about Salesforce, 18:36um, and Agentforce. 18:38The thing that is almost most notable about 18:40this company is that, I don't want to say maybe 18:42they invented, but like, they popularized SaaS. 18:45Um, right? 18:46Like, they were the OGs of SaaS.
18:49And what happened, like, over the last, 18:52you know, 15, 20 years is that basically 18:55all of traditional software, somebody came 18:57along and disrupted each and every single 19:00one of those categories with like a SaaS 19:02version of whatever that product was, right? 19:04And that happened in basically 19:06every category across the industry. 19:09Um, there was this piece written by 19:11a16z that was talking about, um, 19:15the death of the sales force, not 19:17the death of Salesforce, of the sales 19:19force, uh, and talking about more agentic 19:22approaches, um, to, to this particular 19:24space, and they did not believe actually 19:28that, like, the incumbents had an advantage. 19:31They thought the entire space 19:33would be so radically transformed 19:35by these capabilities that 19:37it would be disrupted by new, new 19:40entrants into the market, um, essentially. 19:42And I think that, that sort of dynamic is what's 19:46driving and propelling, um, Salesforce to do 19:49a lot of the things that it's been doing, 19:51um, and the things that are announced this week, 19:53even. Um, but maybe as a question, as a 19:56starting point, and maybe I'll flip it over to 19:58you, is: do you think what played out in SaaS 20:03is going to play out with, like, AI and 20:05agents too? Like, is every category going to 20:09get threatened or disrupted by, like, an AI 20:12native version of that particular space? 20:15Like, is that where we're going? 20:16And is, like, a Salesforce trying to 20:18again be the first one to do this? 20:22Absolutely, next question. 20:24I'm joking. 20:26We got about 10 minutes, man. 20:28Like, you know. 20:30So I think, I love what 20:31Salesforce is doing there, right? 20:33So I think the Agentforce thing 20:34is absolutely spot on, right? 20:36Because they're effectively, um, 20:39speeding up the productivity, right?
20:41So it's no different from kind of 20:42deterministic automation that you 20:45would do in these platforms today. 20:46And now you're getting the 20:47agents to perform that. 20:48So anybody can compose an agent that 20:51performs a task and do it really quickly. 20:53Um, agents are going to be everywhere. 20:55And multi-agents, uh, you know, 20:57operating in teams and crews, multi 20:59agent collaboration is going to be huge. 21:01The, and I did a video on this about a year 21:04and a half ago, and I think this is true. 21:06I think we're heading towards 21:07a world of agent marketplaces. 21:10So you're going to go home and you're 21:12going to have an agent that's good at 21:14translation into a native language. 21:16You're going to have an agent that 21:17is going to be good at performing a 21:19particular task, an agent that, you 21:21know, can do benefit calculations. 21:23Every single task that you can imagine, 21:25there will be an agent at some point 21:27that can perform that task. 21:29So therefore, if you think about what Salesforce 21:31has done there in their world, what they've 21:33created is an agent marketplace within 21:36their SaaS platform, that is cool, right? 21:38Anybody can go and compose those agents 21:40and bring that together and sort of, you 21:43know, solve these tasks really quickly. 21:45But that's not going to be limited to 21:46Salesforce, and it's not going to be 21:48limited to individual organizations; that's 21:50going to come out into the real world. 21:52In the same way as we have platforms 21:54like Fiverr, we're going to have 21:56agent marketplaces, and then, you know, people will be able 22:00to go and buy those tasks from those agents and 22:02it's going to be, it's, it's going to be a rush. 22:04Right?
And who are the companies that are going 22:07to be truly at the forefront? It's going to be 22:11people who have the better data, who have 22:12the faster agents, the cheaper agents, and 22:15they're going to sort of dominate that. 22:16So I actually think that the, the big tech 22:20companies are going to enter into that 22:22space, uh, Salesforce being one of them. 22:24But I, I see this as a world marketplace. 22:26I don't see this being a company thing. 22:29And maybe as like building on that a little bit, 22:31which, the data point is particularly interesting, 22:35where some of the discussion I actually saw, 22:37that was in the original post from a16z, was 22:40actually talking about how they thought even 22:43some of the incumbents only had actually a 22:46slice of the data that was going to be relevant 22:49in the future, because people are treating 22:50it like they have all that data. But their belief is 22:52that in like some of these customer service and 22:54experience, uh, domains, that like multimodal 22:57data, um, that is not true, and even unstructured 23:00data, things that are not necessarily the core 23:02of how these things are powered today, will 23:05become the core of how they're powered in the future. 23:08So the data advantage is there, but it's not 23:12as pronounced as people thought it was. 23:15Um, this is one hypothesis, um, at least, 23:17but like Aaron or Nathalie, I'm curious 23:20the extent to which you think that, um, you 23:23know, some of these other data sources are 23:25going to represent both opportunities 23:28for like new entrants in these categories 23:30and, or things that are just like, you 23:32know, some of the existing providers are 23:34gonna have no problem just like adding a 23:36new data set into their existing platforms. 23:39Yeah, I mean, I mean, um, I always take a 23:41step back and ask myself, what is an agent?
To me, an agent is a process that can perform a task that could otherwise be done by a human, or even by another agent. And then this gets to meta-agents, where an agent can create yet another agent, right? There are a couple of paradigms; the two I'll stick with give you a continuum in between. The first one is environment-centric agents. These are agents that reason and think and plan after each action: they think, act, and observe. The other one would be human-centric agents, which reason without observations: they plan up front and don't really need output from tools in order to take action. And then there's anything in between, right? Those are the poles. And it seems like the data aspect depends on the use case, on the environment within which the agent is going to operate. Is it more of a reactive kind of agent, based on a signal that comes from a device, where you don't really need a lot of external data to create a reaction? Or is it more of a rich textual piece, where the agent needs to generate new information and provide it back to a human, or even to another agent? In which case, instead of human-centric, maybe we say it's agent-slash-human-centric, right? But it's really neat where all this is going. And I do think there are different approaches to RAG that we all know about, where you can augment the context with data such that it influences and informs what the model outputs.
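The think-act-observe loop described for environment-centric agents can be sketched as a minimal control loop. Everything here is a toy stand-in invented for illustration: the `lookup_benefit` tool, the canned benefits table, and the one-step planner are assumptions, not any real framework's API.

```python
# Minimal sketch of an "environment-centric" agent: think, act, observe,
# repeating until the planner decides the goal is satisfied.

def lookup_benefit(name: str) -> str:
    # Hypothetical tool: a canned benefits table.
    table = {"dental": "covered at 80%", "vision": "not covered"}
    return table.get(name, "unknown benefit")

TOOLS = {"lookup_benefit": lookup_benefit}

def plan_next_action(observations, goal):
    # Toy planner: pick the next tool call from what has been seen so far.
    if not observations:
        return ("lookup_benefit", "dental")
    return None  # stop after one observation in this toy example

def run_agent(goal):
    observations = []
    while True:
        action = plan_next_action(observations, goal)        # think
        if action is None:
            return observations
        tool_name, arg = action
        result = TOOLS[tool_name](arg)                       # act
        observations.append(f"{tool_name}({arg}) -> {result}")  # observe

print(run_agent("Is dental covered?"))
```

A human-centric agent, by contrast, would compute its whole plan before the loop and skip the observe step entirely.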
But this might foreshadow another topic we're getting to: I really like machine unlearning, where you can unlearn or erase from memory, whether it's this hippocampus-type memory or it's embedded in the weights of these generative AI pieces. It kind of helps to focus an agent on what it's meant and supposed to do around its objectives, so agents become SMEs within a particular domain. Rather than being so broad, with this huge large language model that's been trained with inherent data baked into it, you have these very narrow, smaller SME agents that might be fine-tuned in different ways. Maybe you're removing data that's already there, shared by these open-source models. Or are you adding data through RAG, or fine-tuning with your own data? There are lots of permutations there. And for Salesforce, I'm really excited that they're partnering with us, IBM, to advance their products, to make them more open and trusted, and to help think about these kinds of new architectures and agents, how we're going to use them with data, and plug-and-play pieces like Chris mentioned before.

Nathalie, maybe a question for you too. I think it was Chris who made this comment about agents versus more deterministic workflows, and kind of that evolution over time.
One of the things that I've seen a little bit, at least with some of the things we've been doing, is that we've started by productionizing a lot of more internal use cases, the stuff that brings big improvements in productivity, things like that. The interesting thing with Salesforce is that some of the scenarios they're talking about are customer-facing, which changes the calculus from a risk perspective, from a security perspective. I'm curious, Nathalie, how are people going to approach the balance of more deterministic workflows versus these more agentic ones?

That is a great question. My perspective is the following. First, we need humans to know that they are still important in this whole pipeline, because a lot of times when there are mistakes, an expert will very much realize that there's something weird, something that doesn't quite feel right. So first, understanding the human and educating the human: hey, this is a tool, but just know that you are potentially smarter than the tool. That's our first step. The second step is actually understanding when we want to explore solutions and explore the space, versus when we want something deterministic, for example, to retrieve a document that is really relevant for certain types of questions. We can have a sort of pipeline that is not as stochastic, and we know we want it that way. So it's about setting up paths within our spectrum of solutions, so that when there's something really critical, RAG and other types of technologies can be applied so that we don't hallucinate wildly.
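The split Nathalie describes, sending critical question types down a deterministic retrieval path and leaving open-ended ones to a stochastic generative path, can be sketched with a trivial router. The keyword rules below are purely illustrative assumptions; a production router would classify intent with a model, not a word list.

```python
# Toy router: critical topics get a deterministic retrieval pipeline,
# everything else is allowed to use exploratory generation.

CRITICAL_KEYWORDS = {"refund", "policy", "dosage", "compliance"}

def route(question: str) -> str:
    words = set(question.lower().split())
    if words & CRITICAL_KEYWORDS:
        return "deterministic-retrieval"  # fetch the authoritative document
    return "generative"                   # allow stochastic exploration

print(route("What is the refund policy?"))
print(route("Brainstorm campaign ideas"))
```

The point is not the keyword match but that the branch is explicit: the critical path can then be tested and audited like any other deterministic workflow.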
I think that's a part of it. So how do we set it up to make it trustworthy? I think that's the main aspect of it all. And it's going to be a combination of human plus a lot of techniques. I think RAG, just the way it is done right now, may have some gaps, but I think we as a community are moving towards solutions where we can specify a little bit better where we are going for each of the questions, for each of the suggestions. But overall, just to mention, I think it's really, really important, and really a tool for people to use and leverage in their business cases, especially Salesforce and so forth.

Keeping on the theme of putting this stuff in production and in front of actual customers, I will move us on to our third segment today, which is talking about fantasy football a little bit and some of the work that IBM is doing in this space. This type of work is reaching huge consumer audiences, and so it's actually some of the more exciting work that we're doing. Aaron, I want to say the partnership has been going on for eight years at this point. Maybe talk a little bit about the work we do partnering with ESPN on fantasy football. And then I know we introduced some new capabilities this year that are driven by LLMs, so maybe talk about the partnership and some of the new things we've brought for the season.

Sure. Yeah.
Our project has been around for eight years, and we actually went down to the labs in Austin with ESPN, I'd say ten years ago, trying to figure out what we could do to help fantasy football managers that hadn't been done before. We came up with the notion that an active league is a happy league, right? What we wanted to do is create this immersive and understandable experience for ESPN fantasy football team managers. And we've grown: now, in our eighth year, we have 12 million registered users. We're actually live right now, two and a half weeks into this very long season, and so far we've had 919 million page views and delivered 4.6 billion insights. So it's really, really heavy, and we are consumer-facing: we sustain about 5,000 requests per second. One stat I looked at just this morning, out of curiosity: the most time spent on a singular player has been 100 days, in just two and a half weeks, and that player was Justin Fields. That gives you a sense of the volume and the number of users we have. The program starts in August and runs into January of the following year. What we do is provide boom/bust ratings, score spreads, and different stats about players to help folks make decisions. And the idea at the beginning, which was novel, is that we wanted to create these different predictions and player states, like whether a player has a hidden injury, just from text and from videos and sound, not from stats, right?
That was a hypothesis, and we went through an empirical, metrics-driven approach to measure how well we would do, and it came out that we did very well, and we're eight years into it. We also give trade analysis grades. So if you and I were going to trade, I look at your situation, your roster, your rules, and I give you a grade. We also look at waiver-wire players and give them a grade, and we do opposing team rosters to say, how will this player help my team? Because there's always some sort of opportunity cost. The system uses a combination of generative AI, classical machine learning, simulated quantum machine learning, and different analytics built up over the years. So it's fascinating, and it's very rewarding to see people use this and to see all of our insights, generative insights as well, on ESPN, on broadcast TV, and on the radio.

One of the things that struck me the most, and correct me if I'm wrong, is that a trade gets a grade, and then we actually use the IBM Granite models to produce custom analysis associated with that grade. So that text becomes personalized, really, not just to the person but to the specific situation. I do a lot of work in media content, the web, and personalization, and I think for every company that works on and thinks about its customer experience, personalization is the holy grail that everyone wants to get to.
And one of the things that is so interesting is that, from a content perspective in particular, personalization is just insanely expensive. It's one of the gating aspects: I'm struggling to make one good version of this thing, and now you want me to create a hundred good versions of it? It's never going to happen. So one of the first things I thought about when generative AI arrived was, I wonder if this is the unlock for personalization. Is this the thing that is really going to do that? So maybe to throw it over to you, Chris: how big of an impact do you think Gen AI is going to have on personalization over time, and what barriers do you think exist to us doing even more of this?

So we're already doing that. Personalizing to the consumer using generative AI is something we already do, and we do it with customer data platforms. If you think of marketing, you have the customer data platform, where you have the 360-degree view of the customer. You have all your clicks, your preferences, all of that kind of marketing data, in one profile. Well, if you think about what generative AI is really good at, it's really good at role-playing, right? You'll have seen that before: talk like a pirate, talk like Snoop Dogg, et cetera. Well, actually, everything that you need to personalize is sitting in your customer data platform. So just taking all of that data you've got today and starting to put it in works really well.
And we've already been using that to build marketing segments, to have even finer-grained marketing segments than you have today, and then to have that personalized content. And of course, the segments are getting smaller and smaller. That's what's happening today at a practical level, but it's going to come down to segments of one. And if you think about it, it's not only about generation of content; it's also about verification of content. Let's say you're going to do a marketing message and then you go and do an A/B test, right? That's quite an expensive test: you're running it against real people to find out, hey, does this work? But remember what I said: generative AI is really good at role-playing. So you can start to ask the question, how likely are you to respond to this content? Does this content fit your particular persona? You can ask questions of the persona you've got there, understand if it's a good fit, and then maybe deal with that a little bit in advance. So generation is absolutely where people want to go for personalization, but verification is a really interesting use case as well. And as I said, we're already doing this.

Yeah, I wanted to build off of that. Some of our challenges within sports entertainment live events are around scale, because, as I mentioned, we have 12 million users who could hit us in a single day, 5,000 requests per second. So we shield our origin servers from all that traffic.
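As a rough illustration of the persona "verification" idea Chris describes, here is a sketch of turning a customer-data-platform profile into a role-play prompt that a model could answer before any real A/B test runs. The profile fields and the prompt wording are invented for the example; they are not any real CDP schema or product API.

```python
# Sketch: build a role-play prompt from a hypothetical CDP profile so a
# model can "pre-test" marketing copy against a persona before an A/B test.

def build_persona_prompt(profile: dict, message: str) -> str:
    persona = (
        f"You are a customer who prefers {profile['channel']} contact, "
        f"recently clicked on {', '.join(profile['recent_clicks'])}, "
        f"and belongs to the '{profile['segment']}' segment."
    )
    question = (
        "On a scale of 1-10, how likely are you to respond to this "
        f"message, and why?\n\nMessage: {message}"
    )
    return persona + "\n\n" + question

profile = {
    "channel": "email",
    "recent_clicks": ["running shoes", "trail socks"],
    "segment": "outdoor enthusiasts",
}
prompt = build_persona_prompt(profile, "New trail runners, 20% off this week!")
print(prompt)
```

The prompt would then be sent to whatever model the team uses; the cheap part is that a weak fit can be caught here instead of in a live test against real customers.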
What we do is we invented a way to run batch jobs that generate all these different, almost fill-in-the-blank sentences. Then, on the edge, we look up what league you're in, because there's an infinite number of scoring rules, right? That makes these personalized sentences different, and that could, again, be infinite. So we meet in the middle: we pull those fill-in-the-blank sentences onto the edge, and then we personalize them through fill-in-the-blank adjectives, based on the percentiles of the values your players have, and the language you would expect. It's almost like theory of mind, where we want our algorithms to understand you, your data, your situation, and then personalize the already-generated content and serve it to you, right? That's how we typically handle these massively large-scale systems. And it's quite fun to see the reaction of users when they see the data meeting their expectations and showing them something they didn't really know. They're like, wow, okay, now I get it. It shows the power of what we do here for lots of our customers.

Nathalie, I know you've been doing a lot of work in the space of machine unlearning, so maybe to throw it over to you: talk a little bit about what this is and why it's important.

Yeah, thank you.
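The "meet in the middle" pattern Aaron describes, pre-generated sentence templates completed at the edge with adjectives chosen from a player's value percentile, can be sketched in a few lines. The thresholds, adjectives, and template wording below are assumptions for illustration, not the actual ESPN system.

```python
# Sketch of fill-in-the-blank edge personalization: a batch-generated
# template is completed per user with an adjective keyed to the percentile
# of a player's projected value under that league's scoring rules.

def adjective_for_percentile(p: float) -> str:
    # Illustrative buckets; the real system's thresholds are unknown.
    if p >= 0.9:
        return "elite"
    if p >= 0.7:
        return "strong"
    if p >= 0.4:
        return "serviceable"
    return "risky"

def personalize(template: str, player: str, percentile: float) -> str:
    return template.format(player=player,
                           quality=adjective_for_percentile(percentile))

template = "{player} projects as a {quality} start in your league this week."
print(personalize(template, "Justin Fields", 0.75))
```

Because only the cheap fill-in step runs per request, the expensive generative work stays in batch jobs and the origin servers never see the per-user traffic.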
So this is a topic that I think is very important, very relevant, especially right now that we have huge models. Let's revisit the pipeline for a second. We have lots of training data, a lot of it from the internet, so most of it is untrusted. Then we train a model that's huge; it takes months and a lot of know-how. Then we get the model. And after that, we start red-teaming and using the model, and we go, oh, we messed up. Perhaps we should have removed, or not used, certain types of data during those four months of training, for example. So the idea of unlearning is, rather than retraining and trying to solve the issues by retraining or fine-tuning, we take the model and kind of perform surgery on it, so that the effects of the data we don't want are no longer there. There are different reasons why we may want to do that. One of them is, for example, it turns out that all of a sudden we have this subpopulation of people, and a lot of the replies we are getting for that subpopulation are not great, very toxic behavior from the model. Can we actually remove that a posteriori, after training, all the things we don't like, like toxicity? Use case number two is poisoning. What if somebody took that untrusted set, manipulated the training data in a way that was not great, and we are only now starting to understand that it happened? The model is there; what do we do? What we do is try to modify the model to remove that poisoning information.
There is also a use case around removing copyrighted material. Even if we filter, and at IBM we really make a huge effort to filter copyrighted material when we train, licenses are sometimes not static. Something that seems okay to use today may later have changed. What do we do? Do we go back to retraining? Probably not; it would take forever. But if we use techniques like unlearning that modify the model to remove that copyrighted information, that's going to give us a big, big advantage. Basically anything: hallucinations are another aspect of it. What if we determine that the model always hallucinates in a certain way? Can we inspect it and modify it so that we no longer have this hallucination? So the way I like to think about it is that you have a model; it's a patient. You see that there's something like a virus going on in there, and there's this new way to basically give it antibiotics, patch it, and then you have a new model. So we are really operating on, modifying, the model, and that adds this extra layer of security to the whole pipeline. It also helps us manage the life cycle of the model itself, so that, in retrospect, every time we find something that's odd or that we don't want, we can go ahead and change that model accordingly.

It's a fascinating thing, because so much of the discussion around how we work with models has been about how we add more data into the model, whether we're talking about RAG or fine-tuning. It's like, okay, we have a generic model, but we need to get an enterprise data set into the thing so it can operate on our data, on our tasks.
Machine unlearning is like the opposite of that: how do we actually get things out of the model? So, Chris or Aaron, I'd be curious whether you think this domain is going to belong entirely to the world of model providers, and maybe some of the open-source world, or whether there's a world where, for enterprises, these practices and techniques for removing data from models become as commonplace as adding data to models through things like RAG and fine-tuning.

Yeah, I think it's going to be pretty commonplace, if I'm honest about it. Because if I think about fine-tuning today, it's really quite an imprecise art at the moment, if I'm truly honest. We do things like freeze layers; we make the model smaller. But if you look at the kind of space you've got in the models, it's almost like you're just lopping off stuff and putting new stuff on top, right? It's imprecise. So as you train models, I think you're going to want to be more surgical, right? And I keep thinking of the episode we did on the Golden Gate Bridge work that Anthropic did, where they were saying that this activation here would happen when you did this, and then we could turn it up, or we could turn it down: you could make it talk more about the Golden Gate Bridge, or less. I think it's going to go in this direction. So you'll have things like unlearning, being able to remove things from the model.
But then I think you're going to want to fine-tune in a more precise way. I think we're all going to become LLM surgeons; it's going to become a more precise art than it is today. So yes, and that means the tools are going to get better: how we visualize the models, look at them, and do scans and say, okay, this is the point where it's talking about Harry Potter, this is where it's talking about copyrighted information. I think we're just going to have a deeper and richer view of the models in time, and we just don't have that today. It's an imprecise art.

It's funny that you mention mechanistic interpretability, because when we were having the conversation earlier about chain of thought, I was also thinking about that as a different way to understand how the model is thinking. There's this whole thing of, we have no idea how these things work, but between the interpretability space, everything around chain of thought, and machine unlearning, you have all these different techniques that are all trying to get at the same problem: how is this thing doing what it's doing, and, now that we know, can we make it do something else?

Yeah, I mean, this is almost like watching the movie The Matrix, right, where the scene is: do you want to take the red pill and really learn and understand something that might make you uncomfortable, or take the blue pill and just maintain the status quo? And to me, machine unlearning is almost taking the red pill, where you're getting these models to focus on the data that matters.
At a particular point in time, maybe it's uncomfortable doing that, trying to figure out exactly what data does matter and what data doesn't, which is almost a governance kind of problem. And what I really find interesting, I guess getting a little nerdy, is how it works. It's just really, really neat: with a large language model, you're basically teaching a generic model to predict the next token as if it didn't have that data. You construct a new training set, and then you use that to feed back into the model to relearn as though it never had the data in it. It somewhat erases the weights along those gradients within the activation functions, right? And that's neat. And then when you get to multimedia, like image-to-image, it's great, because you can actually have these models forget how to put certain objects into images, which could be copyrighted, or maybe you don't want certain types of objects, or maybe you do want certain objects. So you can balance forgetting and remembering: you can have loss functions that span forgetting and remembering, and you optimize both at the same time, with two separate models, teaching one of them what data is the most important. So, going back to The Matrix, with all of these LLMs and us being surgeons, I think we're going to be taking more of these red pills.

I guess Mixture of Experts is now the red pill pod at this point, so I think that's a good way to end today.
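The joint forget-and-remember objective Aaron mentions can be illustrated with toy numbers: push the model to do badly on a "forget" set while staying good on a "retain" set, with a weight balancing the two. Real unlearning applies this kind of objective to model weights via gradient updates; the sketch below just shows the loss arithmetic, and the `alpha` weighting is an assumption for illustration.

```python
# Toy sketch of a combined unlearning objective: the forget term is
# negated (we want HIGH loss on forgotten examples), the retain term is
# kept positive (we want LOW loss on retained examples).

def combined_unlearning_loss(forget_losses, retain_losses, alpha=0.5):
    forget_term = -sum(forget_losses) / len(forget_losses)
    retain_term = sum(retain_losses) / len(retain_losses)
    return alpha * forget_term + (1 - alpha) * retain_term

# Lower combined loss means: worse at the forgotten data, still good at
# the retained data, which is exactly the balance described above.
print(combined_unlearning_loss([2.0, 4.0], [0.5, 1.5], alpha=0.5))
```

In a real pipeline the optimizer would minimize this quantity over model parameters, so forgetting and remembering are traded off explicitly through `alpha` rather than hoped for implicitly.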
So Aaron, Chris, Nathalie, thank you for joining us today. Another exciting week in AI. We will be back next week talking about all the news going on. But for all of you out there in radio land, you can find us on podcast networks everywhere. Thank you for joining in today, and we will see you back next week. Thanks very much, everyone.