
AI's Promise Against Infectious Diseases

Key Points

  • The panel debated whether AI can truly eradicate infectious diseases, noting that while AI has accelerated drug discovery, viruses evolve faster than current algorithms, making a complete solution unlikely.
  • Dario Amodei’s “Machines of Loving Grace” essay sparked optimism by forecasting AI‑driven scientific breakthroughs, massive GDP growth in developing nations, and even world peace, but many experts cautioned that such visions overlook practical and ethical constraints.
  • Speakers highlighted that the impact of AI will depend heavily on how humanity chooses to deploy the technology, with competing interests and regulatory frameworks potentially limiting its benefits.
  • Scaling AI capabilities remains a technical hurdle, and achieving the promised health outcomes will require parallel social and policy changes to address misinformation, equity, and responsible use.

Full Transcript

# AI's Promise Against Infectious Diseases

**Source:** [https://www.youtube.com/watch?v=BWKFzWUOBOg](https://www.youtube.com/watch?v=BWKFzWUOBOg)
**Duration:** 00:41:06

## Sections

- [00:00:00](https://www.youtube.com/watch?v=BWKFzWUOBOg&t=0s) **AI's Fight Against Infectious Diseases** - A panel of AI experts debates whether AI has eradicated natural infectious diseases by 2034, highlighting both optimism and the challenges of viral evolution, technological limits, and competing human interests.

## Full Transcript
**Tim Hwang:** We're jumping ahead. It's October 17th, 2034. Has AI helped us solve nearly all natural infectious diseases? Maya Murad is a product manager for AI Incubation. Maya, welcome to the show. What do you think?

**Maya Murad:** Thank you for having me. Of course, the optimist in me would love to say yes, but I don't know if history has always proven us right, and I think it really depends on how we choose to use this technology.

**Tim Hwang:** Kaoutar El Maghraoui is a principal research scientist in AI engineering at the AI Hardware Center. Kaoutar, welcome back to the show. Tell us what you think.

**Kaoutar El Maghraoui:** Thank you, Tim. It's great to be back. AI is making strides in tackling infectious diseases, but it's not a magic bullet. Viruses evolve faster than algorithms, and the battle between pathogens and progress is far from over, so there is a lot more work to be done.

**Tim Hwang:** All right, so some skeptics on the call. And finally, last but not least, joining us for the first time on the show is Ruben Boonen, CNE capability lead for Adversary Services. Ruben, welcome, and let us know what you think.

**Ruben Boonen:** Thanks, glad to be here. I think we can get there, provided scaling continues, but I think it's mostly going to be an issue of competing human interests if we do.

**Tim Hwang:** All right, great. Well, all that and more on today's Mixture of Experts. I'm Tim Hwang, and it's Friday, which means it's time again to take a whirlwind tour of the biggest stories moving artificial intelligence. We'll talk about a hot new sampler that's getting a lot of attention, and Apple raining on AI's parade. But first I want to talk about Machines of Loving Grace, an essay by Dario Amodei, the CEO of Anthropic, in which he makes some very wild predictions. He says that AI might solve all infectious diseases and could 10x the rate of scientific discovery, and he promises that one wild but not implausible outcome is 20% GDP growth in the developing world, and potentially even world peace. I want to bring this topic up because the essay has been getting a lot of play and a lot of people have been talking about it. Maya, I'll start with you: how believable are these visions? What is more or less believable in what Dario is predicting here?

**Maya Murad:** Dario definitely paints a picture that we would all love to believe in, but of course people are going to be skeptical, because a technology, which is a tool, can be used in different ways, and the way we're currently seeing AI used is not necessarily materializing into all this optimism. It's a mixed bag. There have definitely been advances in drug discovery, but at the same time we're seeing articles about the rise of misinformation. So I think the essay overemphasizes the positive, and it doesn't set out the prerequisites for getting to this positive picture. It's going to have to come hand in hand with a lot of social change, not just technological change.

**Tim Hwang:** So you think the end result of AI is likely to be neutral, if anything? Is that right?

**Maya Murad:** I don't think technology is neutral. How you put it in motion matters: there's an agenda, a social context, and an economic context behind it, and that unleashes it in different directions.

**Tim Hwang:** Kaoutar, I want to bring you into this discussion, because when you responded to that first question you seemed a little more skeptical. Do you agree with Maya that this is ultimately achievable, or are you thinking this is, I don't know,
marketing, or over-optimism about the technology?

**Kaoutar El Maghraoui:** I think there are certainly lots of things we can achieve with AI, but of course there is also hype. In Dario's essay he explored the potential and some limitations of AI, and how it might shape society as it advances. One thing I found particularly interesting is how he emphasized the need to rethink AI as powerful AI and to tap into that potential. But there are also lots of challenges that I see, and those require continuous work, continuous progress, continuous algorithms. For example, look at biology and health, which he wrote a lot about. We've seen what AI can do there, a lot of strides in how it can significantly enhance research in biology and medicine, but progress is often constrained by the speed of experiments, the availability of quality data, and regulatory frameworks such as clinical trials. And despite all of these revolutionary tools, like AlphaFold, there is still a need for things like virtual scientists driving not just data analysis but the entire scientific process, and there's a lot of work to be done there. If we look at it from a pragmatic versus long-term perspective, in the short term AI might be limited by existing infrastructure and societal barriers. Over time, I hope these things can be resolved and that the intelligence can create new pathways, for example reforming the way experiments are conducted and reducing bureaucratic inefficiencies through better system design. So it has to be a collaboration between the intelligence, society, and humans, and things need to be regulated, because, as Maya mentioned, there's all this fake news and generated data. That is a danger we have to be careful about, and I think we're going to talk about those threats later. So how do we balance all of this and push it in a direction that is productive and helps us, rather than one that impedes our progress or creates issues?

**Tim Hwang:** For sure. Kaoutar, one question before I move on to Ruben, who I think will have some interesting angles on this because he works on the many ways these systems can break or be used for not-so-great purposes. You work a lot on hardware, and part of Dario's dream is the idea that eventually these systems will be able to control physical robotics out in the real world, and that this will be a huge boost to the effect the technology has. Do you buy that? Are we close to a world where it's easy to instrument these models to control real-world systems, or are we still pretty far away?

**Kaoutar El Maghraoui:** I think we're making progress toward that. There's a lot of work right now on making the hardware and infrastructure more efficient, and sustainability is a big part of that, because right now we're hitting the limits of physics. A lot of work is needed to create chips capable of acting in resource-constrained devices, especially given the huge compute needs that AI keeps driving forward with things like large language models. The computational needs are just growing, and now you also want to do things like reasoning, so it's going to be kind
of an arms race: more is required algorithmically, but on the compute side a lot of innovation is needed at the semiconductor level, in the physics and materials science, to create chips capable of handling this huge demand in a sustainable and cost-effective way.

**Maya Murad:** To support that point: this subject was not addressed in the essay at all. I think it was overly optimistic to say AI will solve climate change when, in developing AI, we're actually missing a lot of the sustainability targets that companies have set. If I want to use AI to solve climate change, I don't want data centers that are also emitting tons of carbon and consuming tons of energy to solve that problem.

**Tim Hwang:** Ruben, maybe I'll bring you in, because my friends who are security people look at this kind of essay and say it's ridiculous: either the technology is largely going to be used for bad purposes, or these systems will be so vulnerable that they'll never achieve their full potential. How do you size up these claims as a security expert? Do you buy into the optimistic vision, or are you more skeptical?

**Ruben Boonen:** I am an optimist personally. But like I mentioned in my introduction, the technological achievements are one thing; how people with competing interests manage the outcomes of those achievements is something else. For example, in the essay he talks about authoritarian regimes and how AI systems clearly have applications to restrict what people can do and how they can think, and to manage all of that. I think we can already see some of those dynamics at play: the West and the East have diverged on AI development paths, and those things are going to continue as we get closer to more powerful systems. Also, on medical advancements, I don't want to make any proclamations about whether what he says is possible; I'm not a subject-matter expert in that area. But it will also depend on whether companies are willing to make those advancements available to people who may not currently be able to afford them, and on how that distribution is made across the population. And finally, we've talked a little about misinformation already and will talk about it more later, but one thing he didn't mention in the essay is education, which I'm personally very hopeful about. More free access to information and high-quality AI-assisted education is going to be a big uplift for a lot of people, and I think it will also help make our society more democratic and more accepting of these technologies. A lot of the time, when there is conflict, it's because people don't have the same basis for understanding the facts, as with anti-vaccination campaigns. So I think it's a complex picture.

**Tim Hwang:** I'm going to move us on to our next topic. One of the things I've been watching most carefully in the X/Twitter chatter on AI is a bunch of hype around a repo called Entropix. Effectively, the
story behind it is that an AI researcher has introduced a sampler that attempts to replicate some of the cool chain-of-thought features we saw in the OpenAI o1 release just a few weeks back. Maya, I'll turn to you, because you're going to have to help me out a little here: what is a sampler anyway, and why should we care?

**Maya Murad:** I love this question. I've spent quite some time focusing on LLM inference. When we talk about AI, we mostly mean large language models. What a large language model does is, given the start of a sentence, a few words, predict the next word. If I say "on the table there is a...", a few probable words automatically pop up in your head: there might be a book, there might be a glass of water, and so on. The model does something similar: it has a statistical representation of all the possible words that could come next, with a probability attributed to each word, to "book", to "glass", and so on. All of these probabilities are based on the data the model has seen in the past. These models are fed a lot of data, and based on what they've seen, they judge which word most logically comes next. What a sampler does is determine, given the words the model has seen so far, what the model should output next. The sampling technique most widely used today is called greedy, and by greedy we mean just outputting the token, the word, with the highest statistical probability. So I hope that answered your question about what sampling is. I think this work is really interesting: it takes advantage of additional information that we can get out of large language models and the associated metadata, so I'm happy to hear other people's thoughts on it.
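To make the mechanics concrete, here is a minimal sketch of greedy decoding over a toy next-token distribution. The vocabulary and scores are invented for illustration; a real model produces logits over tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Turn raw model scores (logits) into a probability distribution."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_sample(logits, vocab):
    """Greedy decoding: always emit the single most probable next token."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best]

# Invented next-token scores for the prompt "On the table there is a ..."
vocab = ["book", "glass", "cat", "spaceship"]
logits = [2.1, 1.9, 0.4, -3.0]
print(greedy_sample(logits, vocab))  # prints "book", the highest-scoring token
```

Greedy decoding is deterministic; samplers differ precisely in how they depart from this rule, whether via temperature, top-k, nucleus sampling, or the entropy-aware strategies discussed later in the episode.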
**Tim Hwang:** For sure. Kaoutar, I'll throw it to you. One of the most interesting bits is that it introduces a new sampler, and one reason people are so excited is that it really seems to boost the performance of these models across different types of tasks. The other interesting thing is that it seems to replicate, in part, what OpenAI touted as the special sauce for its great new model. So I'm sitting here thinking: OpenAI seems like the Goliath in the space because they can do all these crazy cool algorithmic changes and improvements to their model. Do you think the existence of something like Entropix means open source will get almost as good, almost as fast, as these proprietary models and companies? It almost seems like maybe there is no special sauce, if some random researcher can launch a repo that does something close to what these big companies can do.

**Kaoutar El Maghraoui:** I totally agree with that, and I actually love what Entropix is doing. They're taking an innovative approach that reflects the fast-moving evolution of the open-source AI community, where new methods like these adaptive sampling techniques are explored without requiring massive computational resources, which is key, while also demonstrating the collaborative and experimental nature of the field. In open source we can explore more, and mimic or even exceed the secret sauce of the big companies. Entropix aims to replicate some of the unique features associated with OpenAI's o1 models, particularly the reasoning capabilities, and they have interesting ways of experimenting with entropy-based and what they call varentropy sampling techniques, which try to reflect the uncertainty in the model's next step, examine the surrounding token landscape, and help the model decide whether it should branch or resample based on future token possibilities. A really interesting approach. At the end of the day, I think open source is going to catch up with what's happening; there's a lot of innovation there, and we see it not just in these algorithmic ideas but also in efforts like Triton on the GPU and accelerator side, where a lot of open-source work aims to go CUDA-free. You see a lot of this in projects like vLLM, where what's happening in open source is on par with some of the secret sauce of the proprietary companies, across the whole AI stack.

**Maya Murad:** What I think is also interesting is that open source is giving away all the ingredients for free, with approaches that are more accessible to everyone in the field. To explain my point: what OpenAI did with o1 was take a big frontier model and do a lot of reinforcement learning to train it to do chain-of-thought reasoning at scale. What this open-source repo did was take an open-source model, Llama 3.1, bypass all that reinforcement learning, and take advantage of additional information that you get at the inference level. As Kaoutar said, the model has ways of telling us it's uncertain about the next token to predict. In some situations you can see one word coming with high probability, but there may be forks in the road where lots of different options are equally probable. Taking advantage of that information, you can do a lot. In this repo they propose doing chain of thought or starting from scratch, but I'm actually quite interested in uncertainty quantification as a means of giving people information and tools to use these models in different ways. If the model could tell you its answer is uncertain, you could use that to build systems that output something different. So the choice could be different from what this repo does, but I do think it's an interesting research direction.

**Tim Hwang:** And I think that's the interesting subtlety here: it's not just replicating the end results; this engineer seems to have found a way to do it a lot cheaper. You just edit the sampler, rather than running a completely complex reinforcement-learning process.

**Kaoutar El Maghraoui:** It's also encouraging deeper reasoning through token control at inference time. It's paving the way, as Maya also mentioned, to figuring out how we do this sampling and selection at a much deeper level, incorporating other information about the model's uncertainty and about the future predictions you can make about it, in order to take the right next steps.
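The entropy and varentropy signals described above can be sketched in a few lines. This is a toy illustration of the general idea, not the actual Entropix code; the thresholds and the three-way decision rule are invented for the example.

```python
import math

def entropy_stats(probs):
    """Entropy and 'varentropy' (variance of token surprisal) of a
    next-token distribution. Low entropy means the model is confident;
    high varentropy means confidence is uneven across candidate tokens."""
    terms = [(p, -math.log(p)) for p in probs if p > 0]
    h = sum(p * s for p, s in terms)               # Shannon entropy
    v = sum(p * (s - h) ** 2 for p, s in terms)    # varentropy
    return h, v

def choose_strategy(probs, h_max=1.0, v_max=1.0):
    """Invented decision rule in the spirit of entropy-based sampling:
    take the top token when confident, branch when uncertainty is high
    in several directions, otherwise resample."""
    h, v = entropy_stats(probs)
    if h < h_max and v < v_max:
        return "greedy"
    if h >= h_max and v >= v_max:
        return "branch"
    return "resample"

print(choose_strategy([0.97, 0.01, 0.01, 0.01]))  # confident -> "greedy"
print(choose_strategy([0.4, 0.3, 0.2, 0.1]))      # flat -> "resample"
```

The point of the two statistics is that they disagree in useful ways: a flat distribution has high entropy but low varentropy, while a distribution with one strong candidate and a long tail of weak ones can show the opposite pattern.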
**Tim Hwang:** So, this emerged as a joke in the last episode, but I'm thinking about turning it into a recurring bit for Mixture of Experts: we've got to talk about agents in every single episode; it's just part of what we do. Maya, you raised a question when we were talking about this episode before recording, about the relationship between these kinds of uncertainty systems and getting more agentic behaviors out of these models. Do you want to talk a little more about that? I think that relationship is really interesting, and maybe not entirely clear to folks who aren't as deep in it as you are.

**Maya Murad:** First of all, any model of a certain size that responds well to chain-of-thought, step-by-step "thinking" (thinking in quotation marks), can be turned agentic. How well it performs is up to the inherent model and its performance on various benchmarks, and we'll be talking about benchmarks in an upcoming session. What's interesting about this new innovation, taking advantage of information about uncertainty, is that it could be really valuable in the context of agentic systems, because you can basically stop an agent in its tracks if it's uncertain about the next step. In the agent world we're facing a lot of problems with reliability, and users are over-trusting agents' performance because agents look like they're performing in a human-relatable way: they "thought" step by step, there's a plan, and the plan at a high level seems reasonable. Actually catching hallucinations in an agentic approach is harder than with just text in, text out. So I think uncertainty quantification is a tool that will be really important for bringing agentic systems to the next level, and I see it being used in multiple ways: stopping an agent in its tracks, or, based on the repo we've seen, starting again or starting a new chain-of-thought workflow. We're at the very beginning of this, but it's something my team has been discussing as well, as a really interesting research direction to integrate into our work.

**Kaoutar El Maghraoui:** I think it goes hand in hand with what the agentic approaches are doing, because Entropix is introducing this entropy-based sampling, and with the varentropy technique they're assessing future token distributions. Agentic behavior likewise requires foresight and planning, mimicking human-like flexibility and dynamic, adaptive decision-making. So I think they go hand in hand, and there's a lot that can be learned in each direction: agentic systems could incorporate those techniques to gain that human-like flexibility and foresight, or vice versa.

**Ruben Boonen:** I think it's exciting. As the other two panelists mentioned, there is this real push in open source. I don't know how well we can quantify whether it's catching up to the frontier models or to the efforts those companies are making, but I think it's great that this is happening in public.

**Tim Hwang:** For sure. And basically, to what Maya said earlier, I think we will see more of this kind of pattern.
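The stop-the-agent-on-uncertainty idea can be sketched as a control loop. This is an invented toy, not a real agent framework: each planned step carries a made-up next-token distribution standing in for the model's confidence at that step.

```python
import math

def step_entropy(probs):
    """Entropy of the model's next-token distribution at one step."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def run_agent(plan, entropy_threshold=1.0):
    """Execute planned steps, but halt and escalate as soon as the
    model's own uncertainty at a step exceeds the threshold."""
    executed = []
    for name, next_token_probs in plan:
        if step_entropy(next_token_probs) > entropy_threshold:
            return executed, f"halted before '{name}': model too uncertain"
        executed.append(name)
    return executed, "completed"

plan = [
    ("look up order status", [0.90, 0.05, 0.05]),  # confident step
    ("issue refund", [0.30, 0.25, 0.25, 0.20]),    # near-uniform: uncertain
]
done, status = run_agent(plan)
print(done, status)  # halts before the risky second step
```

The design choice here is that the halting signal comes from the model itself rather than from an external judge, which is exactly what makes inference-time uncertainty attractive for agent reliability.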
It's possible that open source may be very clever about solving the problem in a much more resource-constrained way, which may actually keep it ahead of the proprietary models and their much more expensive approaches to some of these problems. So that's definitely another dynamic we'll be returning to in future episodes.

I'm going to turn us to our next topic. Apple released a paper recently that was of some controversy; I was joking a little earlier in the intro that they're kind of raining on the AI parade. Effectively, what they did is take a benchmark called GSM8K, which contains a variety of mathematical reasoning questions, and say: we're going to make some quick variations to this benchmark and create a new benchmark, which we call GSM-Symbolic. These changes are very small and subtle and don't really change the substantive nature of the mathematical problem. You could imagine a grade-school question about John having 10 apples and needing to subtract three apples and add four apples. What they do is say: rather than John we'll talk about Sally, rather than apples we'll talk about pears, and maybe rather than 10 apples the person will have 12. And what they find is that these really small changes can create some pretty significant drops in the models' performance on these benchmarks. On one level we already know this, right? There's a bunch of overfitting on benchmarks, people are always gaming them, and models look better against them than they are. But this is also kind of worrisome.
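The kind of perturbation described above can be sketched as template instantiation. This is a toy reconstruction of the idea, not the paper's actual templates; the names, objects, and number ranges are invented.

```python
import random

TEMPLATE = ("{name} has {n} {fruit}. {name} gives away {k} {fruit} "
            "and then buys {m} more. How many {fruit} does {name} have now?")

def make_variant(rng):
    """Swap surface details while keeping the arithmetic (n - k + m)
    identical, so a genuine reasoner should be unaffected by the swap."""
    name = rng.choice(["John", "Sally", "Ravi", "Mei"])
    fruit = rng.choice(["apples", "pears", "plums"])
    n, k, m = rng.randint(8, 15), rng.randint(1, 5), rng.randint(1, 5)
    return TEMPLATE.format(name=name, n=n, fruit=fruit, k=k, m=m), n - k + m

q, answer = make_variant(random.Random(0))
print(q)
print("ground truth:", answer)
```

Scoring a model on many such variants, rather than on the one canonical phrasing, is what separates memorized patterns from the underlying arithmetic.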
Maybe, Ruben, I'll toss it back to you, because it sort of suggests that these models' reasoning is actually nowhere near as strong as it appears to be. I don't know if you buy that conclusion.

**Ruben Boonen:** First of all, it makes sense that people want to benchmark models that get released, so there is an incentive for companies to do well on those benchmarks, because otherwise people will say this model isn't appreciably better than it was before. And obviously public data will end up in the training data for these models, so I think that makes sense. But when I looked at the figures in the paper, they have different tests they ran the models through. One, like you mentioned, changed the names and maybe the figures or the objects, and there was a drop of between roughly 0.3% and 9%, something like that. But looking at the more frontier models, the drop was not really that large in my opinion; I think for GPT-4o it was only around 3%. Then they had some harder benchmarks where they added and removed conditions in the problem statements, or even added multiple conditions, where there were much larger drops, I think up to 40% for 4o-mini. I would have to look at the paper for the exact figure; I think it was up to 65.7% in one of the worst cases. So even for what we consider the frontier models, you have a lot of drop there. But when we've been talking about reasoning and chain of thought, I think you saw that the o1 benchmarks dropped by substantially less. It was still a lot, something like 17.7%,
or something so I'm not 25:35really sure how to feel about the 25:38results of this paper or what they mean 25:40or if this is a problem that will get 25:42resolved over time as reasoning gets 25:45better uh in these types of models or 25:47not yeah for sure and Kar you just 25:50chimed in there I don't know if you've 25:51got views on this paper and whether or 25:53not it's you know I guess you made it 25:55sound I don't want to put words in your 25:56mouth it's kind of like me big whoop 25:59right like we kind of know that these 26:00models have lower performance when you 26:01change the benchmarks and even then the 26:04effect doesn't seem that big and so 26:05maybe not too much to worry about um I 26:08don't know if C you feel the same way I 26:10think some of the results were 26:11surprising to me and this this work from 26:14these Apple researchers it kind of 26:16provided a very critical evaluation of 26:18the reasoning capabilities of flowers 26:20language models from what I saw they're 26:23kind of exposing that llms Reliance 26:26especially on P pattern matching 26:28is is Big right now rather than really 26:31true reasoning so because I I don't 26:35think that the LMS are really engaged in 26:37formal reasoning but instead they use 26:39sophisticated pattern recognition uh and 26:42this approach of course is very brittle 26:44and prone to failure with these minor 26:46changes that they have exposed um so for 26:50example if you look at the the GSM 26:51symbolic test performance so they 26:53created you know the variations like uh 26:56Ruben uh mentioned but with the you know 27:00and what they're seeing you know these 27:01drops uh sometimes can be very big if 27:04they just include irrelevant things to 27:07the problem the reasoning should stay 27:08the same but if you just say oh you know 27:11these uh apples for example some of them 27:13are smaller than others which is not 27:15doing anything you know to the reason in 27:18itself it's just additional 
irrelevant information, but the LLM was taking that and actually using the smaller apples in the calculation. Another thing that they exposed is the variation, the inconsistent results across runs. They showed very high variance between different runs with the same models, which also highlights the inconsistency. Even slight changes in the problem structure resulted in accuracy drops of up to 65% in certain cases. So I think the key highlight here is that LLMs try to mimic reasoning while mostly relying on data patterns, but their capability to perform consistent logical reasoning is still limited. The findings also suggest that current benchmarks may overestimate the reasoning capabilities of LLMs, and I think we need improved evaluation methods to really gauge the capabilities of LLMs, especially with respect to reasoning.

I love this new benchmark that Apple put up. I know we've been on previous podcast sessions where we talked about all the issues with benchmarks, so I think this is a great step in the right direction, in order to force more generalizable insights based on benchmarks. I also think, for me, it was really predictable that this was going to happen. Whenever I talk about reasoning, I like to say "reasoning" in quotation marks, because it's us anthropomorphizing what we're seeing coming out of LLMs, and like Kaoutar said, they're doing pattern matching. It's pattern matching at scale. They showed the model patterns it hasn't seen before; you could update the model's training with some new patterns, and it can infer, can maybe unlock new use cases based on that. That's great. So it's a technology, it's an imperfect technology, but it can do
useful things. I don't think we're in a world where this current technology can do logical reasoning. It's just pattern matching at scale, and I think we have to accept it for what it is. And when we're thinking about making these systems useful, I think we're always going to be in a scenario where there's going to be a human in the loop, or on the loop. We need to have ways of surfacing whether there's high confidence or low confidence in the LLM's trajectory. So I think we have to use these tools, and use this knowledge that it is an imperfect technology, to make it more robust. There are a lot of papers that say pairing this sort of technology with humans can increase the overall robustness of the system, if we factor in a human as part of the system. And I think we should accept that, as opposed to thinking that, with the current technology we have, we're on a pathway to what is called AGI.

Yeah, for sure. And I'd love to make that very concrete with maybe a last question to Ruben. Right now, and we've talked about this in previous episodes, there's a lot of excitement about, say, using AI to harden computer networks, right, as a complement to cybersecurity, as a form of cybersecurity defense. And on the framework that Maya just laid out, it's a kind of interesting question: is cybersecurity a pattern matching question, or is it a reasoning question? Because it would suggest here that if a lot of what we're doing in cybersecurity defense is just pattern matching, well, okay, maybe the technology really has some very strong legs here. But if something more is needed, there are actually some really interesting questions about whether or not it's kind of fit for purpose. Just a final
thought, and I'm curious about whether or not you agree with that framing.

Yeah, I mean, security is a vast and complex domain, and in some cases reasoning is very important, but in other cases it's all about data collection, correlating those data sources, and summarizing. And I think for many years already there has been use of sort of traditional machine learning in endpoint detection and response solutions, to great effect, by the way, I just want to say that. And now, with generative AI, there's a lot of push to integrate that also into the back end, where those events are correlated and maybe synthesized in a way that people had to do manually previously, to sort of speed up those processes. But humans are definitely involved there; they have to be, to evaluate those events. But yeah, I think it's going to be big for our industry.

So we're going to end today's episode on more of a, maybe, stress-inducing segment. As you know, there's a big election coming up in the US, and big elections coming up around the world. And OpenAI did a disclosure recently where they observed that they're seeing state actors increasingly try to leverage AI for election interference, and this involves using models for generating fake articles, fake social media content, and other sorts of persuasive tactics. Which I think is a really interesting development: that finally the technology is becoming mature enough that your sort of enterprising election interferer really wants to leverage this technology in the field. And I guess, Ruben, I'll start with you, because you think a lot about security and vulnerability in
these types of systems. What do you think about this? I mean, is it kind of an issue that we're going to just be able to solve at some point? Is it going to get worse or better over time? I guess one of the really interesting things I'm trying to think about is, what's the trajectory of these types of trends? Do we just live in this world now, or is this a temporary thing?

No, I think we just live in this world now, that's my hot take. But, yeah, obviously AI has a lot of implications. I think what I would categorize this as is social engineering, and there are many varieties of that. There might be persuasive messaging; it might be persuasive generated images or videos. That's one category where I think the risks are more immediately evident. There is another category where malicious actors are using AI to sort of speed up their malicious attacks, and I think that is much less mature at this point. But when I was going through OpenAI's report on this, and I think it's great that they're being proactive and working with industry partners to combat these threats as they appear, it must be very new to them as well, my conclusion was that they found there was limited effect from what they saw. And I think the most effective post was sort of a hoax post about an account on X, where it replied with a message that said, oh, you haven't paid your OpenAI bill, but they said in the report that this wasn't actually generated by the API. So I think the impacts might still be limited, but we may also be biased in that assessment, because we're obviously looking only at threats, in quotation marks,
that we detected and stopped. So it wouldn't surprise me if there are actually much more successful influence campaigns on social media that we don't detect, because they are not behaving in a way that's out of the ordinary, or they're using self-hosted open-source models to generate that content, so we don't have as much telemetry on what they're doing, and things like that.

Yeah, that's a little paranoia-inducing, thank you.

I think that's where we are, yeah.

Maya, any thoughts on this? I mean, I guess the obvious question is, is there anything we can do to fix this, or is this pretty much just, you know, we're doomed to live in a world of fake AI influence operations all the time now?

Yeah, I think it's just the state of the world, unfortunately. There are bad actors. When social media came about, everyone was really excited, because it brought us all closer together; it felt that we were all part of one big global community. But for bad actors, this means bigger scale, better skill, bigger reach, and I think that's the same thing with AI. I think the world is moving very fast, and I do wonder about the ability of our society, and the people who are putting their brainpower towards solving these issues, to catch up with what's going on. I think we're already in a state where the school and educational system hasn't caught up to a post-AI world, and I wonder, in the field of keeping information factual, and how our society is organized, whether we'll be able to get there. I do think it should be a concerted effort, and I think more global focus and public spending should be directed at these issues, because we need more resources to catch up
to where the technology is taking us.

I want to quickly jump in as well and say, again, I'm coming back to competing incentives here. A lot of times it's not clear to me that social media platforms have the correct incentives to say, okay, actually, we could deploy our own AI systems to do sentiment analysis and see which posts are promoting misinformation, or giving harmful information to people, or are clearly part of some network that is generating similar messages. Because if those messages generate a lot of interaction, that might be good for those platforms. So there is a problem with misaligned incentives sometimes, I think, which is getting in our way as well.

Yeah, and I think that is actually a really important question: it's not just what the technology can detect, but whether it's actually being implemented and used, and what reasons people have to actually do that. Kaoutar, do you want to round us out on this one with a final comment? I'm curious about how you think about these issues, and, yeah, whether you think we're doomed.

Yeah, this is actually, for me, a scary state. Of course, as the technology gets better and more sophisticated, especially generative AI, these threats are also going to get more sophisticated and more clever in how they can reach massive masses and then try to do harm. So of course, to mitigate the misuse of AI models, like those reported by OpenAI, there is a lot that needs to be done. Things like robust AI detection tools: how do we develop and deploy tools that detect AI-generated content and fake materials? Regulation and oversight: governments and companies need to work together, need to collaborate, to set
clear guidelines and policies for AI use and transparency. And also, I think user education is very important: public awareness about AI-generated misinformation, to help people critically evaluate online content. Not everything you see on a website or the internet is something that you have to believe, so you have to look at the content critically and maybe check other sources: is this really true or not? And also, I think, partnership across industry, cooperation to share insights and prevent misuse. I think increasing awareness about this is really important. I mean, OpenAI did some clever things to at least identify some 20 operations, they said, that used AI for content creation, which they halted and stopped, and that were focused on election-related misinformation. So I think we need more of those, but again, like Maya and Ruben said, this is the world we live in, and it's going to be an arms race: as the technology gets better, the threats are going to get more sophisticated.

And again, I want to say, when I read OpenAI's reports, I find that the cases they highlight I would label as sort of low sophistication, across the different use cases, or some properties of those campaigns that they detected. So I wonder, with really good engineering efforts, whether there could be campaigns that it's not easy, or even possible, to detect are happening. So I think this problem is just going to get worse, especially if they use proprietary models that are outside the scope of OpenAI and other frontier models.

Yeah, that's a really intriguing outcome that I haven't really considered: what's the evil OpenAI, right? Like, is there an evil
Sam Altman that's running a criminal foundation model? Like, presumably, yes, right? I think that's definitely something that exists. So you always know what you're going to get when you tune into Mixture of Experts: we've gone from solving all infectious diseases and 20% GDP growth to sinister, invisible influence operations controlling you as we speak. So, from the very good to the very bad of AI, you'll always get it on Mixture of Experts. Kaoutar, thanks for joining us. Maya, thanks for coming back on the show. And Ruben, we'll hope to have you on again sometime. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we will catch you next week here on Mixture of Experts.