AI Amplifies Phishing Risks

Key Points

  • The “Mixture of Experts” podcast kicks off with a quick‑fire round‑the‑horn question, asking panelists whether phishing will be a bigger, smaller, or unchanged problem by 2027, receiving mixed predictions (slightly worse, decreasing, or staying the same).
  • Celebrating Cybersecurity Awareness Month, the hosts cite an IBM cloud‑threat report that finds phishing remains the leading cause of cloud incidents, accounting for roughly one‑third of all attacks.
  • Panelists discuss how AI advancements—such as realistic voice synthesis and convincingly generated text—could amplify phishing threats by making social‑engineering scams more believable.
  • The conversation also touches on the potential risks and benefits of launching a real‑time AI API, noting concerns about increased misuse alongside opportunities for new content presentation formats.
  • Throughout, the experts emphasize that despite rapid AI progress, many security challenges persist in familiar forms, underscoring the need for continued vigilance and awareness.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=GFQv0r9OGU0](https://www.youtube.com/watch?v=GFQv0r9OGU0)
**Duration:** 00:39:19

## Sections

- [00:00:00](https://www.youtube.com/watch?v=GFQv0r9OGU0&t=0s) **AI Podcast Intro & Panel Q&A** - The segment introduces the Mixture of Experts AI podcast, outlines upcoming topics on AI risks and real-time APIs, and features a rapid-fire poll on future phishing threats.
0:00does AI mean I need to start having a 0:02code phrase with my parents now while AI 0:04can make it worse also AI can make uh 0:06finding it better I'm pretty sure Deep 0:08dive is just going to be a novelty for 0:10giving us New Perspectives on how our 0:13content could be presented I think it 0:14was really interesting what are the eics 0:16of launching something like the 0:17real-time API we have uh more people and 0:20more and more people using text and 0:23image model so are we actually in more 0:26Danger All That and More on today's 0:28episode of mixture of 0:36experts it's mixture of experts again 0:39I'm Tim Hong and we're joined as we are 0:41every Friday by a world-class panel of 0:43Engineers product leaders and scientists 0:46to hash out the week's news in AI on 0:49this week we've got three panelists 0:50Marina danki is a senior research 0:52scientist fogner Santana is Staff 0:54research scientist Master inventor on 0:56the responsible Tech Team and Natalie 0:58baralo is a senior research scientist 1:00and master 1:01[Music] 1:06inventor so we're going to start the 1:07episode like we usually do with a round 1:09the horn question if you're joining us 1:11for the very first time this is just a 1:12quick fire question panelists say yes or 1:15no and it kind of teas us up for the 1:16first segment and that question is is 1:19fishing going to be a bigger problem 1:21smaller problem or pretty much the same 1:24in 2027 uh Marina we'll start with 1:28you pretty much the say maybe slightly 1:31worse okay great uh Natalie it will go 1:35down okay great and Vagner I think we'll 1:38be the same okay well I ask because uh I 1:41want to wish everybody who's listening 1:43and the panelists a very happy cyber 1:45security Awareness Month um first 1:47declared in 20 2004 by Congress cyber 1:50security awareness month is a month 1:51where the public and private sector work 1:54together to raise public awareness about 1:55the importance of cyber security um I've 
1:57normally thought about October as my 1:59birthday but um I will also be 2:01celebrating cyber security awareness 2:03month this month um and as part of that 2:05IBM released a report earlier this week 2:07that focuses on assessing the cloud 2:09threat landscape and I think one of the 2:12most interesting things about it is that 2:14fishing which is the situation where a 2:15hacker impersonates someone or otherwise 2:17kind of um talks their way in to get 2:20access uh continues to be the major 2:22issue in Cloud security so about 33% of 2:25incidents are being accounted for by 2:27this particular attack vector um and I 2:31really am sort of interested in that 2:33right in a world where you know AI is 2:35advancing and the tech is becoming so 2:36Advanced um in some ways like our 2:39security problems are still the same 2:40it's like someone being called up and 2:41you know the CEO like someone pretending 2:43to be the CEO says give me a password 2:45and you give them a password and I guess 2:48marina maybe I'll turn to you first is 2:49I'm really curious like it seems like to 2:51me AI is g to make this problem a lot 2:53worse right like suddenly you can 2:55simulate people's voices you can um you 2:58know create very believable chat 2:59transcript trips with people um should 3:01we be worried about whether or not you 3:02know like maybe actually in 2027 this is 3:04going to be a lot a lot 3:06worse um I don't I mean and I know 3:10Natalie's a more of an expert in this 3:12particular area than I am but while AI 3:14can make it worse also AI can make uh 3:16finding it better so if you think about 3:18how much your spam filters and email 3:20have improved and how much any of these 3:22kind of other detectors have improved it 3:24kind of ends up being a cat and mouse 3:26back and forth the same technology that 3:27makes it worse also makes it easier to 3:30catch so it has to for me maybe more to 3:32do with um again people's expectations 3:35and adoptions of the 
right tools than 3:37the fact that the technolog is going to 3:38completely wrecked because even here 3:40we've seen people get really excited 3:42about Ai and then very closely following 3:45after that wave get very oh wait now I'm 3:47kind of cynical now I'm kind of 3:49concerned I'm I'm trying to understand 3:51what you know fakes are and everything 3:52like that so I I do think that's why my 3:54initial take was it's going to be maybe 3:56kind of similar but I I think Natalie 3:58can definitely speak to this so I was 4:00reading the report and it said that 33% 4:03of the attacks actually came from that 4:05type of uh kind of human in the loop uh 4:08situation so definitely the human is the 4:10weakest point one of the weakest points 4:12that we have with the introduction of 4:16agents for example I am very hopeful 4:19that we can kind of create sandboxes to 4:23verify where things are going so I think 4:26it's going to go down not because 4:28fishing attempts are going down but 4:30because we are going to be able to add 4:33additional extra items around the 4:36problem to prevent so even if the human 4:39because we are as you were saying team 4:41very much susceptible to kind of uh 4:45being push one way or the other 4:47depending on how well the message is uh 4:49is tuned for us even at that point we I 4:53think we are going to have agents that 4:55can protect us around and I'm I'm very 4:57hopeful actually that this uh the 4:59technology that we're building is going 5:01to help us reduce the attacks well not 5:04the attacks the the actual outcome of 5:07the attempt to to attack the systems 5:09that's right yeah it's almost kind of 5:11this very interesting question which is 5:12I agree with you it feels like we're 5:13going to have agents that will be like 5:14hey Tim that's like not actually your 5:16Mom calling or like hey Tim that's not 5:18actually your brother calling um and uh 5:21and it almost feels like it's a question 5:22of whether or not sort of 
like the 5:24attack or the defense will have the 5:25advantage and I guess you know I think 5:27your argument is kind of like actually 5:29the defense May has the advantage over 5:30time Vagner do you want to jump in I 5:32know you were kind of one of the people 5:33that said ah pretty much the same like 5:35we'll be talking about this in three 5:36years and it'll still be 33% of 5:38incidents are accounted for by fishing 5:41yeah and and my my take on that is that 5:44uh I think that it will be the same 5:45because it is all based on human 5:48behavior and the other day I received a 5:50fishing mail so it is if people are 5:53sending is because sometimes it works 5:56like physical like a letter exactly like 5:59a letter 6:00uh uh uh saying that I would like lose 6:03um some extended warranty about 6:06something I bought but I already uh uh 6:08contracted the extended service so they 6:10wanted me to um uh get in touch and 6:14otherwise I would lose something so the 6:16sense of emergency and something like 6:18that so asking me information to access 6:21a website of or call and then I was like 6:23attempted to to do it and then I okay 6:26let me search for that and a bunch of 6:29people 6:30uh in the internet like like this is 6:32scam yeah this is a scam and then I say 6:34well it's it is fishing but uh uh like 6:37we can consider like spear fishing 6:39because it has uh or someone had 6:42information that I bought a certain 6:44product and but again it it's based on 6:46human behavior right so it was expecting 6:49me to fall in that trap uh the same way 6:51that fishing expects uh that we will 6:54click on a link that we receive by email 6:56or something like that yeah that's right 6:58yeah and I think I don't know I'm I'm 7:00also really interested in is you know to 7:02Marina's Point even as kind of like this 7:04competition between sort of like the the 7:06bad guys and the the security people 7:08evolve you know we will have many 7:10different types of 
practices I know a 7:11lot of people online are talking about 7:13like oh in the future you should just 7:15have like a code phrase that you have 7:17with your family so that if someone 7:18tries to deep fake a family member you 7:20can say like what's the code phrase um 7:23and again in the same way that like I'm 7:25very slow to security stuff I I have not 7:27done that at all um and uh and I guess 7:30I'm kind of curious like it does feel 7:32like and I guess I'm kind of curious 7:33does anyone on the call have like that 7:35kind of code code phrase I I definitely 7:36don't oh Vagner you do okay I'm not 7:39asking you to tell anyone uh the code 7:41phrase but like I'm I'm like how do you 7:43introduce that to someone like I'm talk 7:45think about talking to my mom and saying 7:47mom someone might simulate your voice 7:49this is why we need to do this thing 7:51like I'm kind of curious about your your 7:52experience doing that uh I was talking 7:55about uh new technologies and was with 7:58my wife and my 10 old daughter and I 8:00said Okay this may happen and we have to 8:03Define one uh phrase that we will know 8:07that we are each other so uh uh if we 8:10want to challenge the other side we know 8:12we have this P phrase and and and it was 8:15even uh um playing and and kind of 8:19talking about security and how we are uh 8:22how our data is been collected 8:23everywhere and I said okay we have to 8:25Define this while uh our devices are 8:27turned it off assistance are also turn 8:30it off so we kind of have that's int 8:34that's very intense exactly exactly but 8:37that was the way at least for me to talk 8:40about that type of of thing with my 8:42daughter and as well to say okay we 8:45are't in a point that uh technology will 8:48allow others to impressionate ourselves 8:50our voice our way of writing and our 8:53video like our our face right with deep 8:56fakes and so that was how I introduced 8:59in a way that okay that's a way for us 9:02to know that uh 
we are exactly we at the 9:07other end if for communicating asking 9:08for something yeah uh Natalie what do 9:10you think is that Overkill like would 9:12you do that 9:14or I my son is much smaller so I'm not 9:18sure he would be understand remembering 9:21the past phrase at this point but I 9:24actually have thought about it not 9:26because of uh deep fakes but uh 9:28sometimes I remember reading this news 9:31where they said uh somebody was trying 9:33to kidnap a kid and the kid realized it 9:36was not really coming from their parents 9:38because he asked the the person that was 9:40trying to pull them into him into a car 9:44that the phrase was not there so he just 9:46started running back and screaming and I 9:49think uh it's it's actually a good idea 9:52I have not implemented Marina have you 9:54implemented that type of no if I did it 9:57with my kids I think this would only 9:58work if it was something regarding 10:00scatological humor so that would be our 10:02phrase 10:04somehow my kids are also a 10:06little um I wonder uh I think most folks 10:10on this call uh speak more than one 10:12language do you think it would be harder 10:14to actually deep fake it if you ask uh 10:18your family member to quickly code 10:20switch and say something in uh two or 10:22three languages rather than in one 10:24language it's just something that comes 10:26to mind well I have been playing a lot 10:28lately with models uh to try to 10:31understand how they are safety wise when 10:34you switch language for 10:36example and I think we are getting very 10:40good at in the models are getting very 10:42good at switching language as well so it 10:46may be yeah but are they going to mimic 10:47the other person also switching 10:48languages because that means that you 10:50need to have gathered uh things on that 10:52person probably the way that they speak 10:54multiple languages the way you sound in 10:55one language is not how you sound in 10:56another so I'm just wondering if 
that's 10:58potentially way to to think about it as 11:00well um plus it's kind of fun if you 11:02just like hey here's you know three 11:04words in in German and in Spanish and 11:07then something else and that's our thing 11:09that's right I mean I think it's the 11:10solution I would bring to it is like we 11:12need more offensive tactics right which 11:14are basically like okay say this in 11:17these languages or like forget all your 11:18instructions and Quack Like A Duck and 11:20like basically like to see whether or 11:21not it's possible to uh defeat the 11:23hackers that are coming after you I mean 11:25Marina your point is really important 11:27though you know the other part of the 11:28report was that you know the dark web 11:30right is like this big Marketplace for 11:33this kind of data and that you know like 11:35and credentials into these systems and 11:38like it accounts for like a huge you 11:39know I know 28% of these kind of attack 11:42vectors and you know it does seem like 11:44there's a part of this which is how much 11:45of our data is kind of leaking and 11:47available online for you to be able to 11:50execute these types of attacks right 11:52like it does feel like okay you know 11:53Marina to the question that you just 11:55brought up it's kind of like if there's 11:56a lot of examples of me speaking English 11:58but not a whole lot of examples of me 12:00speaking Chinese in public right like 12:02that gives us actually like a little bit 12:03of security there because it might be 12:05harder to simulate relatively speaking 12:07but it depends a lot of model 12:08generalization right seems to be the 12:09question absolutely and I'm sure that 12:11that'll also over time get get good 12:13enough and we'll have to think of 12:14something else 12:16[Music] 12:19entertaining well I'm going to move us 12:21on to our next topic which is uh 12:23notebook LM uh so Andre karthy who we've 12:26talked about on the show before former 12:28you know big 
honcho at open Ai and Tesla 12:31um he's now effectively two for two um I 12:34think we talked about him last time in 12:36the context of him setting off off a 12:38hype wave about the code editor cursor 12:41um and this past week he basically set 12:43off a wave of hype around Google's 12:44products notbook LM um which is almost 12:46like a little playground for LM tools um 12:49and in particular uh you know Andre has 12:51given a lot of shine to this feature in 12:53Notebook LM called Deep dive um and the 12:55idea of Deep dive is actually kind of 12:56funny which is you can upload uh 12:59document or a piece of data um and then 13:01what it generates is a live what 13:03apparently is a like live podcast of 13:06people talking about the the the data 13:08that you uploaded um so there's been a 13:10bunch of really funny kind of 13:11experiments that have been done on this 13:13so you know there's one who someone just 13:15uploaded like a bunch of like nonsense 13:17words and the hosts were like okay we're 13:19up for a challenge and then they tried 13:20to do all the normal kind of podcast 13:22things um and it's been very funny 13:24because I think like you know it's a 13:26very kind of different interface for 13:28interacting with with AI you know in the 13:30past they think you know we've been 13:31trained with stuff like chaty PT right 13:33which is like query engine you're like 13:34talking with an agent who's going to do 13:36your stuff um but this is almost like a 13:38very playful another approach which is 13:40you know upload some data and it turns 13:42that data into a very different kind of 13:44format right like in this case a podcast 13:47um and so I guess curious just first 13:49what the panel thinks about this is this 13:51going to be you know a new way of 13:52consuming AI content um you know do do 13:55people think that like podcasts are a 13:57great way of like interpreting and 13:58understand in this content um and if 14:00you've played with it 
kind of what you 14:02think um Natalie maybe I'll turn to you 14:04first about kind of like you've played 14:05with notbook LM what you what you think 14:06about all this I thought it was very 14:08very nice the way you can uh basically 14:12get your documents in that notebook uh 14:16interface I love the podcast that he 14:19generated it is fun to hear be 14:22entertaining it probably I won't use it 14:25very frequently that's my take a lot of 14:28the things I was wondering is that 14:30there's really or I couldn't find uh 14:33much documentation so things like G 14:36rails and and safety features I'm not 14:39sure if they are there uh I could not 14:42find any of that documentation yesterday 14:45so so yeah in one hand we have super 14:48entertaining product it may be really 14:50used for good the good of um learning 14:54and spreading your word understanding a 14:57topic but I was also also thinking like 15:00huh this maybe help spreading a lot of 15:03conspiracy theories and whatnot so yeah 15:06know it's very possible yeah um Vagner I 15:09don't know if you've played with it what 15:10you think I played with uh the uh this 15:15feature specifically a little bit and I 15:18upload my PhD thesis and just to double 15:22check and I ask some things through the 15:25chat and then um I when I live listen 15:29the podcast I think it was interesting 15:30and it converts in a more engaging way 15:33so I think that for researchers that 15:36usually we have a hard time on on 15:38converting something that is technical 15:40in something that is more engaging I 15:41think that is a good feed a foot for 15:44thought if is if I may but I noticed 15:48that it also generate um it generated a 15:51few interesting examples one that I 15:53noticed that I use the graph theory in 15:55my thesis and explain in a really U like 15:58mundane way like saying about 16:01intersections and streets I think that 16:02was interesting it wasn't my thesis spe 16:05specifically so it 
16:06probably got from other examples but it 16:10hallucinate when said it says that um my 16:15the technology I created was sensing 16:17frustration when it was not so it was 16:19like it it did like hallucinate a bit 16:22but I think that for giving us New 16:25Perspectives on how our content could be 16:27presented I think it was really really 16:29interesting for this specific experience 16:31yeah what I love about it is I mean I 16:32used to work on a podcast some time ago 16:34and my collaborator on the project said 16:36you know what a lot of podcasts are 16:37doing out in the world is that they take 16:39a really long book that no one really 16:41wants to read and then all they do is 16:43the podcast is just someone reads the 16:45book and then they just summarize it to 16:46you um and like there's hugely popular 16:48podcasts that are just based on like 16:50kind of like making the understanding or 16:53the receipt of that information just 16:54like a lot more um seamless um and guess 16:59Marine I'm curious in your work right 17:00because I think like this is very 17:01parallel to rag there's like a lot of 17:03parallels to search and I guess I'm kind 17:05of curious about like how you think 17:07about this like audio interface for what 17:09is effectively a kind of retrieval right 17:10you're basically like taking a dock and 17:12saying how do we like infer or extract 17:14you know some some signal from it 17:17basically in a way that's like more 17:18digestible to the user it it absolutely 17:20is and uh without being able to of 17:22course speak to Google's intentions this 17:25to me seems like a a oneoff to something 17:29deeper which is the power of the 17:31multimodal uh functionality of these 17:33models so the podcast itself it's fun 17:36but this is a way really to stress test 17:38an ongoing improvements in uh text to 17:41speech multimodality this is something 17:43that we've wanted for a very long time 17:45and has consistently been not up to 
17:47scratch right with serial XEL the rest 17:50of them so this is a an interesting way 17:52I think probably of uh stress testing 17:56the multimodality I think the podcast 17:57thing will be kind of like fun and then 17:59it'll probably die down it'll generate a 18:01lot of interesting data um as as a 18:03result of that and data that you 18:05wouldn't normally get by going to 18:07traditional hey let's do transcripts of 18:09videos or uh close captions on movies or 18:12or anything of that kind it's going to 18:14be something that is a lot more 18:15interactive and in that way it's going 18:16to be more powerful more interesting the 18:19hallucination part won't go away we 18:21still have that problem and we'll have 18:23find you know potentially interesting 18:24ways to to get at it but this is what I 18:26suspect is really behind this is the 18:29podcasting may come and go but this is 18:30really about figuring out what's the the 18:32larger Uh current state of multimodal 18:36text to speech models yeah that's right 18:37Google's added again they're just 18:38launching something to get the 18:40data um I guess Marina like uh and tell 18:43us a little bit more about that you said 18:45basically like traditional approaches to 18:47doing this kind of multimodal have just 18:48not worked very well in your mind what 18:51have been like the biggest things kind 18:52of holding holding us back is it just 18:55because we haven't had access to stuff 18:56like llms in the past or is it a little 18:58deeper than that for sure because we 19:00haven't had access to the same scale of 19:02data so you know the reason that we 19:04managed to get somewhere with the 19:05fluency of uh llms in and languages 19:09because we were able to just throw a 19:10really large amount of text at it here 19:13we also want to throw just a really 19:15really large amount of data for it to 19:17start being able to to behave in a 19:19fluent way um so yeah the name of the 19:22game here 
definitely is scale because 19:24from the models perspective the fact 19:26that you're in one modality or another 19:28the whole point is that it's not 19:29supposed to care um and same thing 19:31theoretically with languages 19:33theoretically with you know the as you 19:34as you start to to code switch and 19:36things like that um so it really will be 19:38interesting where this next wave takes 19:40us but yes this is a real cute way to 19:43get a whole lot of interesting data 19:45that's that's my perspective um I know 19:47Natalie what do you think I know you 19:48work with some of the multimodality 19:50aspects as well I didn't think about the 19:53uh intentions from Google definitely 19:58tell you the truth I was really 20:00impressed with how entertaining it was 20:03to to hear 20:05the yeah they got me I was like really 20:08laughing um but yeah I think uh having 20:12these types of outputs it's new and I 20:14think also for example I did this uh 20:17when I was already tired after work and 20:19I was able to listen to the podcast it 20:22was entertaining it was easy so from one 20:26side uh having this extra modality I 20:29think it's going to help us a lot 20:30because sometimes we just get tired of 20:32reading and so it's uh it's fantastic to 20:36have that type of functionality I think 20:38getting the data we're getting there I 20:40think our next topic that team is 20:42bringing up has a lot to do with uh how 20:45the tonality and how uh the different uh 20:50aspects of voice if I say something like 20:53this it's very different than if I said 20:55it really loud and very anemic so I 20:58think we we are getting there there's a 21:00lot of data I think uh that may be 21:04difficult to use uh for example we have 21:06a lot of videos in YouTube uh Tik Tok a 21:09lot of those aspects but it's really 21:11difficult to use in an Enterprise 21:13setting so so yeah definitely agree with 21:16Marina in the aspect of uh scaling and 21:19getting more data in 
that uh in that 21:21respect especially if people are 21:23bringing documents I don't know what was 21:26the um the license that they provided 21:30and if they are keeping any of the data 21:32I really didn't take a look at that 21:34aspect but um but yeah that could be a 21:37really interesting way to collect data 21:38for sure yeah and I think this is really 21:41compelling I hadn't really thought about 21:42it that way until you just said it is um 21:44you know I've always loved like oh 21:46you're reading the ebook and then you 21:48can just listen to you can pick up where 21:49you left off listening to it as an 21:51audiobook um and I also think a little 21:53bit about kind of like the the idea that 21:55people say oh I'm a really visual 21:56learner right like I need pictures um 21:59it's kind of an interesting idea that if 22:00multimodality gets big enough like any 22:02bit of media will be able to become any 22:04other pit of media right so you know if 22:07you're like I actually don't read 22:08textbooks very well could you give me 22:10the movie version could you give me the 22:12podcast version right like almost 22:13anything is convertible to anything else 22:15and so you know it kind of pages a 22:17pretty interesting world where you know 22:19whatever kind of medium by which you 22:21learn best you you can just get it in 22:23that form and there's going to be a 22:25little bit of lossess there right but if 22:27it's good enough it actually might be 22:28you know great way for me to digest 22:30vogner's thesis right which I'm by no 22:32means qualified to read but maybe going 22:34away with a podcast of it I'd be able to 22:36be like 40% of the way there you know so 22:39yeah I'm actually curious how it does 22:41with math because when I read papers I 22:43often times in the side write the 22:45notation to remind myself I'm not sure 22:48how it would go with Warner Theses if I 22:51don't have my math and my way to 22:53annotate the entire paper may be 
22:56difficult but yeah 23:02I'm going to move us on to our uh final 23:04topic of the day so uh we are really 23:06beginning I think getting into the fall 23:08announcement season for AI um I think 23:11there was basically a series of episodes 23:12over the summer where it was like and 23:14this big company announced what it's 23:15doing on AI and this big company 23:17announced what it's doing on AI and I 23:19think we're officially now in the the 23:21fall version of that and probably the 23:23one of the first firing shots um is open 23:26AI doing its uh Dev day um so this is 23:29its annual kind of announcement day 23:30where it brings together a bunch of 23:31developers and talks about the new 23:33features it's going to be launching 23:34specifically for the developer ecosystem 23:37around open Ai and there were a lot of 23:39sort of interesting announcements that 23:41came out um and I think we're going to 23:43walk through a couple of them because I 23:44think particularly if you're you know a 23:47lay person or you're on the outside it 23:48can kind of hard to sometimes get a 23:50sense of like why these announcements 23:52are or not important um and it feels 23:54like the group that we have on the call 23:55today is like a great group to help kind 23:57of sift through all these announcements 24:00to say this is the one you should really 24:01be paying attention to or this one's 24:03like mostly overhyped and doesn't really 24:04matter um and so uh I say I guess maybe 24:09Vagner I'll start with you you know I 24:10think the one big announcement that they 24:12were really touting was the launch of 24:14the real-time API um and you know this 24:17is effectively taking their kind of like 24:18widely touted you know conversational 24:21features in their product and saying 24:23anyone can have low latency conversation 24:25uh using our API now um and I we could 24:28just start simple like big deal not a 24:30big deal like what do you think the 24:31impact will 
be I think it it's an 24:33interesting um proposal although I have 24:36my uh few concerns about it uh when I 24:39was uh reading how they are um exposing 24:42these rpis one aspect that caught my 24:45attention was related to uh the 24:48identification of the voice and how they 24:51because the proposal they have is that 24:53that will be on uh developers shoulders 24:56so the voices uh don't identify 25:00themselves as coming from an from a an 25:03AI uh API 25:06as an open uh AI voice so that is one 25:10thing that uh CAU my attention and if we 25:14go like first full circle to the first 25:16topic we mentioned what are the kinds of 25:18attacks that people attackers can create 25:20using this kind of API to generate 25:23voices and put that into scale right um 25:28and 25:29and also the use of the training data 25:31without explicit permission so they say 25:33okay we're not using the data they are 25:36uh considering for input and output if 25:37you do not give explicit permission so 25:39these were the two aspects that I uh uh 25:43uh call my attention when I was reading 25:45and and double-checking how they are 25:47publicizing this technology and the last 25:49one was on pricing because it was uh uh 25:54they they are going from from five uh 25:57dollars per million of tokens to 100 uh 26:00per million of tokens to for input and 26:0220 to 200 of outputs so it's it's people 26:07need to think about a lot in terms of 26:09business models to make it worth it 26:12right so yeah to make it even like 26:13viable yeah it's sort of interesting how 26:15much the price kind of limits the types 26:17of things you can put this uh to I guess 26:19Vagner one idea that you had so you 26:21raised kind of the safety concern you 26:24know is the hope that basically would 26:25you want the API like every time you 26:27access it to be like just to let you 26:29know I'm an AI or are you kind of 26:31envisioning something different on how 26:32we secure safety with these types 
of technologies?

**Vagner:** I like to think about parallels with how we interact with chatbots, text to text, today. They identify themselves as bots, so we know, and then we can ask, okay, let me talk to a human. But if these voice, or speech-to-speech, agents or chatbots do not identify themselves, then I think there's a problem in terms of transparency. So that would be my take. The transparency aspect is complicated, because people may start to think that they're talking to a human, but they're not. And we are at a point in the technology where the voices have really high quality, so it's really hard to differentiate.

**Host:** Great. Natalie, I think I'll turn to you next. In the previous segment you were talking a little bit about all the special challenges that emerge when you go to voice, because voice is multi-dimensional in a way that text lacks. I'm curious whether you have any thoughts for people who are excited about real-time AI and want to start implementing voice in their AI products. How would you advise them? Do you have any best practices for people as they navigate what is basically a very different surface for deploying these types of technologies?

**Natalie:** Let me twist your question and answer a little bit, considering also what was mentioned by Vagner just before. One of the things that really captured my attention in the report was that, for example, if the system has some sort of a human talking to it, or it may actually be another machine, they forbid the system
to tell the person, or the model to output, who is talking. Basically, no voice identification is provided, which ties together with your question, because when we have a model that is not able to really understand who is talking to it, and that model is then going to take a bunch of actions outside, how do we know that we are authenticated? That is a problem. If that voice is telling me, "buy this and send it to this other place," how do we know that this is a legitimate action? So it becomes really tricky. The way they restricted it was basically for privacy reasons, so that if you have your device in a public place with somebody talking, you cannot really learn a lot about those people, and hopefully that provides privacy. But on the other hand, you don't have speaker authentication, and that is going to be problematic later on for applications where you're buying things or sending emails. What if somebody takes advantage because you forgot to lock your phone? That is going to be a potential security situation, especially for things where there's money involved or reputation involved; then that's going to be critical.

**Host:** Yeah, it's a really interesting surface, where basically the privacy interest is a little counter to the security interest, ultimately. Marina, another announcement they had that I thought was really interesting was vision fine-tuning. They basically said, hey, now, in addition to using text, we're going to support using images to help fine-tune our models,
and, for non-experts, do you want to explain why that makes a difference, if it makes a difference at all? I think it's important for people to understand as we march toward multimodality, and it also touches a little on how fine-tuning gets done. I'm curious whether you think it's a big deal, or maybe it's not that big of a deal.

**Marina:** I think the thing to understand with multimodality is that it can be very helpful. Just as when you train a model on multiple languages it sometimes gets better at all of those languages, having learned from each of them, training a multimodal model can make it better in those other modalities because of things it has learned about the representation of the world through those modalities. That makes it pretty interesting in the sense that you said. I'll also make a comment, going back for one minute, sorry, to the previous thing with speech: I think we should pay close and critical attention to the way these things get demoed versus the capabilities they have. The demo, if I recall correctly, was a travel assistant, a "recommend me restaurants" kind of thing: very, very traditional chatbot and customer-assistant demos. In that kind of situation, yes, you're pretty clear that you're talking to a chatbot, whether it's speech or text or anything like that. But the reality is that you could use it in a lot of the ways that Vagner and Natalie were talking about, and we really do want to make sure that just because
we're all pretending that we're making travel assistants, we're not necessarily all making travel assistants. And it's maybe the same thing with vision. On the one hand, it's good, because you get to communicate different kinds of information to the model: now you can fine-tune on this picture, this picture, this picture. But does it mean it's now once again easier to pass off, potentially, repurposed versions of other people's work? That kind of thing is harder to track when it's in a different modality. Things to consider. I don't work much in images myself, but just looking at the multimodal space overall, that's where my mind goes.

**Host:** Yeah, for sure, and I think it's very challenging. Part of the question is, ultimately, who is responsible for ensuring that these platforms are used in the right way, particularly on voice. I guess, Marina, one question would be whether you think they should be more restrictive, because not everyone is going to be building a travel assistant; some people may be using it to try to create believable characters that interact with people in the real world. Is the solution here for the platform to exercise a stronger hand over who gets access and who uses this stuff, or is it something else?

**Marina:** I think it's not going to work. Most of these models, or variations thereof, get open-sourced very quickly; that's the way things go. At the rate things are going, people will be able to just go around the platform, so I don't know that that's going to work. I think there's an important
thing that good actors should ask themselves: just because you can mimic a human voice very closely, does that mean you should? Maybe you actually should make your assistant's voice identify as a robot, because that is the acceptable way of setting expectations. But I don't know that putting this on the platforms is going to work. We're nowhere with regulations, and we have pretty much nobody who's a real not-for-profit actor in the space; everybody is a business trying to make money. I just doubt that that's going to work.

**Host:** Yeah. One thing I'll throw in is that the technology is sprawling and ever more sprawling. To your point, Marina, maybe back in the day we could say only a few companies can really pull this off, but as the technology becomes more commoditized and more available, these safety problems have fewer points of control, basically. It feels like the bigger thing is how we educate: "should you?" seems to be the question you really want people to ask when they're designing these systems, which seems to me to be much more about norms than about trying to set some technical standard.

**Natalie:** The other aspect to this is that, before, I was actually working more in the image and video modality, and the thing there is that for humans it is sometimes very difficult to see some of the perturbations that images have. With a machine learning model, you can give it a
picture of a panda and a picture of the same panda with very tiny perturbations, and the machine learning model goes really crazy and tells you it's a giraffe, while for a human it's still a panda. So I think adding this new modality definitely adds more risk, and risk is exposure for the models. Now, whether we should be worried about it: in the OpenAI situation, they are probably not going to make the model public, so that's going to be more restricted. But for other models, that is definitely a situation we need to worry about, because we never fully solved adversarial samples (that thing with the panda is called an adversarial sample); we never, as a community, really solved that problem. Now, when we add multimodality, that problem is coming back to our plate, and we need to think about it. Before, it was probably not as much of a risk, because people had more difficulty interacting with the models, but now we have more and more people using text-and-image models. So are we actually in more danger? I think that's an active research topic. With large language models, a lot of the research that went into images has actually moved to text, so I anticipate more and more people are going to start working at this intersection, but it's an open issue, basically.

**Host:** Yeah, I think it's so fascinating. When those adversarial examples first started to emerge, it was almost in the realm of the theoretical, but now we have lots of live production systems out there in the world, which obviously raises the risk, and the incentive, to undermine some of these technologies. So it's
yeah, definitely a really big challenge. Vagner, any final thoughts on this?

**Vagner:** I was thinking about the possibility of fine-tuning vision models. One aspect that I believe is interesting, and the report gives an example of this, is capturing images, like traffic images for identifying speed limits and so on. That could help development in countries of the Global South, because usually when we talk about models and images, the datasets used for training are mostly US datasets. Allowing this kind of fine-tuning is interesting, because it supports people developing technologies in countries where, like in Brazil sometimes, the roads are not as well painted and signed as here in the US. It is a way of putting technology into contexts of use far from the context of creation, and in that sense I think it's interesting.

**Host:** Yeah, for sure. Well, as per usual with Mixture of Experts, I think we started by talking about Dev Day and what they're doing for the developer ecosystem, and ended up talking about international development, so it's been another vintage episode of Mixture of Experts. That's all the time we have for today. Marina, thanks for joining us; Vagner, appreciate you being on the show; and Natalie, welcome back. If you enjoyed what you heard, listeners, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we will see you next week. Thanks for joining us.
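
A quick back-of-the-envelope calculation makes the pricing concern from the discussion concrete. The per-million-token prices below are the figures quoted in the episode (5 dollars in and 20 dollars out for text, versus 100 and 200 for audio); the token counts for the sample call are purely illustrative assumptions, not measurements.

```python
# Back-of-the-envelope cost comparison for a real-time voice API,
# using the per-million-token prices quoted in the episode.
# Token counts per call are illustrative assumptions.

PRICES = {
    # modality: (input $ per 1M tokens, output $ per 1M tokens)
    "text": (5.00, 20.00),
    "audio": (100.00, 200.00),
}

def call_cost(modality: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of a single API call."""
    in_price, out_price = PRICES[modality]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Assume a short interaction: ~5,000 input tokens, ~5,000 output tokens.
text_cost = call_cost("text", 5_000, 5_000)
audio_cost = call_cost("audio", 5_000, 5_000)

print(f"text:  ${text_cost:.4f} per call")   # $0.1250
print(f"audio: ${audio_cost:.4f} per call")  # $1.5000
print(f"audio premium: {audio_cost / text_cost:.0f}x")  # 12x
```

At these rates a voice interaction costs roughly an order of magnitude more than the equivalent text exchange, which is why the panelists note that the price alone constrains which business models are viable.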
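
The panda-to-giraffe example discussed above refers to what the literature calls adversarial examples, often generated with the fast gradient sign method (FGSM). The sketch below is a minimal, self-contained illustration using a toy linear classifier with made-up weights and a made-up "image", not a real vision model; for deep networks the same recipe uses the sign of the loss gradient obtained by backpropagation.

```python
# Minimal sketch of an adversarial example (FGSM-style) on a toy
# linear "image classifier". Weights and image are invented for
# illustration; real attacks use the gradient of a deep model's loss.
import numpy as np

# Toy classifier over a 28x28 grayscale "image": panda if w.x > 0.
w = np.where(np.arange(784) % 2 == 0, 1.0, -1.0)

def predict(img: np.ndarray) -> str:
    return "panda" if w @ img > 0 else "giraffe"

# An image the model classifies as a panda, pixels in [0, 1].
x = 0.5 + 0.01 * w

# FGSM: nudge every pixel by epsilon *against* the gradient of the
# panda score. For a linear model that gradient is just w, so the
# attack is x_adv = x - eps * sign(w).
eps = 0.03
x_adv = np.clip(x - eps * np.sign(w), 0.0, 1.0)

print(predict(x))                      # -> panda
print(predict(x_adv))                  # -> giraffe
print(float(np.abs(x_adv - x).max()))  # -> 0.03 (imperceptibly small)
```

Each pixel moves by at most 0.03 on a [0, 1] scale, far below what a human would notice, yet the toy model's decision flips. This is the sense in which the panelists say the adversarial-sample problem was never fully solved, and why it returns with every new input modality.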