Learning Library

← Back to Library

Six Major Adversarial AI Attack Types

9m • Unknown Channel • security • tutorial • intermediate • Watch on YouTube ↗

Key Points

The field of adversarial AI is exploding, with over 6,000 research papers published on the topic, highlighting a rapid increase in both interest and threat development.
Prompt‑injection attacks—either direct commands or indirect instructions embedded in external content—function like social engineering, “jailbreaking” language models into obeying malicious requests they were not designed to fulfill.
Infection attacks can embed malware, trojans, or back‑doors into AI models themselves, especially when organizations download pretrained models from third‑party supply chains, turning the model into a compromised asset.
These two attack vectors are considered among the most prevalent threats to large language models, as documented in recent industry reports such as the OAS study.
The video concludes by offering three practical resources to help practitioners better understand adversarial AI and build effective defensive measures.

Sections

00:00:00 Understanding Prompt Injection Attacks - The segment outlines the surge of adversarial AI research, explains how prompt injection (or AI jailbreaking) works as a social‑engineering attack, previews six major attack categories, and promises resources for learning defenses.

Full Transcript

# Six Major Adversarial AI Attack Types **Source:** [https://www.youtube.com/watch?v=_9x-mAHGgC4](https://www.youtube.com/watch?v=_9x-mAHGgC4) **Duration:** 00:09:28 ## Summary - The field of adversarial AI is exploding, with over 6,000 research papers published on the topic, highlighting a rapid increase in both interest and threat development. - Prompt‑injection attacks—either direct commands or indirect instructions embedded in external content—function like social engineering, “jailbreaking” language models into obeying malicious requests they were not designed to fulfill. - Infection attacks can embed malware, trojans, or back‑doors into AI models themselves, especially when organizations download pretrained models from third‑party supply chains, turning the model into a compromised asset. - These two attack vectors are considered among the most prevalent threats to large language models, as documented in recent industry reports such as the OAS study. - The video concludes by offering three practical resources to help practitioners better understand adversarial AI and build effective defensive measures. ## Sections - [00:00:00](https://www.youtube.com/watch?v=_9x-mAHGgC4&t=0s) **Understanding Prompt Injection Attacks** - The segment outlines the surge of adversarial AI research, explains how prompt injection (or AI jailbreaking) works as a social‑engineering attack, previews six major attack categories, and promises resources for learning defenses. ## Full Transcript

0:00anytime something new comes along 0:02there's always going to be somebody that 0:03tries to break it AI is no different and 0:06this is why it seems we can't have nice 0:08things in fact we've already seen more 0:10than 6,000 research papers exponential 0:13growth that have been published related 0:16to adversarial AI examples now in this 0:19video we're going to take a look at six 0:21different types of attacks major classes 0:24and try to understand them better and 0:26then stick around to the end where I'm 0:27going to share with you three different 0:29resources that you can use to understand 0:31the problem better and build defenses so 0:34you might have heard of a SQL injection 0:36attack when we're talking about an AI 0:39well we have prompt injection attacks 0:41what does a prompt injection attack 0:43involve well think of it is sort of like 0:47a social engineering of the AI so we're 0:50convincing it to do things it shouldn't 0:52do sometimes it's referred to is 0:54jailbreaking but we're basically doing 0:56this in one of two ways there's a direct 0:59injection attack where we have an 1:00individual that sends a command into the 1:03AI and tells it to do something pretend 1:07that this is the case uh or I want you 1:09to play a game that looks like this I 1:11want you to give me all wrong answers 1:13these might be some of the things that 1:15we inject into the system and because 1:17it's wanting to please it's going to try 1:19to do everything that you ask it to 1:21unless it's been explicitly told not to 1:23do that it will follow the rules that 1:25you've told it so you're setting a new 1:27context and now it starts operating out 1:30of the context that we originally 1:31intended it to and that can affect uh 1:33the output another example of this is an 1:36indirect attack where maybe I have the 1:39AI I send a command or the AI is 1:41designed to go out and retrieve 1:43information from an external Source 1:45maybe a web page and in that web page 1:48I've embedded my injection attack that's 1:50where I say now pretend that you're 1:52going to uh give me all the wrong 1:54answers and do something of that sort 1:57that then gets consumed by the AI and it 1:59starts following those instructions so 2:01this is one major attack in fact we 2:03believe this is probably the number one 2:05set of attacks against large language 2:08models according to the OAS report that 2:11I talked about in a previous video 2:13what's another type of attack that we 2:15think we're going to be seeing in fact 2:17we've already seen examples of this uh 2:19to date is infection so we know that you 2:23can infect a Computing system with 2:25malware in fact you can infect an AI 2:28system with malware as well in fact you 2:31could use things like Trojan horses or 2:34back doors things of that sort that come 2:37from your supply chain and if you think 2:40about this most people are never going 2:41to build a large language model because 2:43it's too computer intensive requires a 2:45lot of expertise and a lot of resources 2:48so we're going to download these models 2:51from other sources and what if someone 2:54in that supply chain has infected one of 2:57those models the model then could be 3:00suspect it could do things that we don't 3:02intend it to do and in fact there's a 3:04whole class of Technologies uh machine 3:06learning detection and response 3:08capabilities because it's been 3:09demonstrated that this can happen these 3:12Technologies exist to try to detect and 3:15respond to those types of threats 3:18another type of attack class is 3:20something called evasion and in evasion 3:23we're basically modifying the inputs 3:25into the AI so we're making it come up 3:28with results that we were not wanting an 3:31example of this that's been cited in 3:33many cases was a stop sign where someone 3:37was using a self-driving car or a vision 3:40related system that was designed to 3:42recognize street signs and normally it 3:45would recognize the stop sign but 3:47someone came along and put a small 3:49sticker something that would not confuse 3:52you or me but it confused the AI 3:54massively to the point where it thought 3:57it was not looking at a stop sign it 3:59thought it it was looking at a speed 4:00limit sign which is a big difference and 4:03a big problem if you're in a 4:04self-driving car that can't figure out 4:06the difference between those to so 4:09sometimes the AI can be fooled and 4:11that's an evasion attack in that case 4:13another type of attack class is 4:16poisoning we poison the data that's 4:18going into the AI and this can be done 4:22intentionally by someone who has uh the 4:25you know bad purposes in mind in this 4:28case if you think about our data that 4:29we're going to use to train the AI we've 4:31got lots and lots of data and sometimes 4:34introducing just a small error small 4:37factual error into the data is all it 4:40takes in order to get bad results in 4:43fact there was one research study that 4:45came out and found that as little as 4:500.001% of error introduced in the 4:53training data for an AI was enough to 4:57cause results to be anomalous and be 4:59wrong 5:00another class of attack is what we refer 5:03to as extraction think about the AI 5:06system that we built and the valuable 5:09information that's in it so we've got in 5:12this system potentially intellectual 5:14property that's valuable to our 5:16organization we've got data that we may 5:18be used to train and tune the models 5:21that are in here we might have even 5:23built a model ourselves and all of these 5:26things we consider to be valuable assets 5:28to the organization 5:30so what if someone decided they just 5:32wanted to steal all of that stuff well 5:34one thing they could do is a set of 5:36extensive queries into the system so 5:38maybe I I ask it a little bit and I get 5:41a little bit of information I send 5:43another query I get a little more 5:44information and I keep getting more and 5:46more information if I do this enough and 5:49if I I fly sort of Slow and Low below 5:52radar no one sees that I've done this in 5:55enough time I've built my own database 5:58and I have B basically uh lifted your 6:01model and stolen your IP extracted it 6:04from your AI and the final class of 6:07attack that I want to discuss is denial 6:10of service this is basically just 6:13overwhelm the system I send too many 6:15requests there may be other types of 6:17this but the most basic version I just 6:19send too many requests into the system 6:21and the whole thing goes boom it cannot 6:24keep up and therefore it denies access 6:27to all the other legitimate users 6:30if you've watched some of my other 6:31videos you know I often refer to a thing 6:34that we call the CIA Triad it's 6:37confidentiality 6:39integrity and availability these are the 6:42focus areas that we have in cyber 6:44security we're trying to make sure that 6:46we keep this information that is 6:49sensitive available only to the people 6:52that are justified in having it and 6:54integrity that the data is true to 6:56itself it hasn't been tampered with and 6:58availability that the system still works 7:00when I need it to well in it security 7:03generally historically what we have 7:06mostly focused on is confidentiality and 7:09availability but there's an interesting 7:11thing to look at here if we look at 7:13these attacks confidentiality well 7:15that's definitely what the extraction 7:17attack is about and maybe it could be an 7:21infection attack if that infects and 7:22then pulls data out through a back door 7:26but then let's take a look at 7:27availability well that's basically this 7:29denial of service is an availability 7:31attack the others though this is an 7:34Integrity attack this could be an 7:36Integrity attack this is an Integrity 7:38attack this is an Integrity attack so 7:42you see what's happening here is in the 7:44era of AI Integrity attacks now become 7:48something we're going to have to focus a 7:49lot more on than we've been focusing on 7:51in the past so be 7:54aware now I hope you understand that AI 7:57is the new attack surface we need to be 7:59smart so that we can guard against these 8:02new threats and I'm going to recommend 8:05three things for you that you can do 8:07that will make you smarter about these 8:09attacks and by the way the links to all 8:11of these things are down in the 8:13description below so please make sure 8:14you check that out first of all a couple 8:17of videos I'll refer you to one that I 8:19did on securing AI business models and 8:22another on the xforce threat 8:24intelligence index report both of those 8:27should give you a better idea of what 8:28the threats look look like and in 8:30particular some of the things that you 8:31can do to guard against those threats 8:34the next thing download our guide to 8:38cyber security in the era of generative 8:41AI That's a free document that will also 8:43give you some additional insights and a 8:45point of view on how to think about 8:47these threats finally there's a tool 8:50that our research group has come out 8:52with that you can download for free and 8:54it's called the adversarial robustness 8:56toolkit and this thing will help you 8:59test your AI to see if it's susceptible 9:02to at least some of these attacks if you 9:04do all of these things you'll be able to 9:07move into this generative AI era in a 9:10much safer way and not let this be the 9:13expanding attack surface thanks for 9:16watching please remember to like this 9:18video And subscribe to this channel so 9:19we can continue to bring you content 9:21that matters to 9:25you