Six Major Adversarial AI Attack Types
Key Points
- The field of adversarial AI is exploding, with over 6,000 research papers published on the topic, highlighting a rapid increase in both interest and threat development.
- Prompt‑injection attacks—either direct commands or indirect instructions embedded in external content—function like social engineering, “jailbreaking” language models into obeying malicious requests they were not designed to fulfill.
- Infection attacks can embed malware, trojans, or back‑doors into AI models themselves, especially when organizations download pretrained models from third‑party supply chains, turning the model into a compromised asset.
- These two attack vectors are considered among the most prevalent threats to large language models, as documented in recent industry reports such as the OAS study.
- The video concludes by offering three practical resources to help practitioners better understand adversarial AI and build effective defensive measures.
Full Transcript
# Six Major Adversarial AI Attack Types **Source:** [https://www.youtube.com/watch?v=_9x-mAHGgC4](https://www.youtube.com/watch?v=_9x-mAHGgC4) **Duration:** 00:09:28 ## Summary - The field of adversarial AI is exploding, with over 6,000 research papers published on the topic, highlighting a rapid increase in both interest and threat development. - Prompt‑injection attacks—either direct commands or indirect instructions embedded in external content—function like social engineering, “jailbreaking” language models into obeying malicious requests they were not designed to fulfill. - Infection attacks can embed malware, trojans, or back‑doors into AI models themselves, especially when organizations download pretrained models from third‑party supply chains, turning the model into a compromised asset. - These two attack vectors are considered among the most prevalent threats to large language models, as documented in recent industry reports such as the OAS study. - The video concludes by offering three practical resources to help practitioners better understand adversarial AI and build effective defensive measures. ## Sections - [00:00:00](https://www.youtube.com/watch?v=_9x-mAHGgC4&t=0s) **Understanding Prompt Injection Attacks** - The segment outlines the surge of adversarial AI research, explains how prompt injection (or AI jailbreaking) works as a social‑engineering attack, previews six major attack categories, and promises resources for learning defenses. ## Full Transcript
anytime something new comes along
there's always going to be somebody that
tries to break it AI is no different and
this is why it seems we can't have nice
things in fact we've already seen more
than 6,000 research papers exponential
growth that have been published related
to adversarial AI examples now in this
video we're going to take a look at six
different types of attacks major classes
and try to understand them better and
then stick around to the end where I'm
going to share with you three different
resources that you can use to understand
the problem better and build defenses so
you might have heard of a SQL injection
attack when we're talking about an AI
well we have prompt injection attacks
what does a prompt injection attack
involve well think of it is sort of like
a social engineering of the AI so we're
convincing it to do things it shouldn't
do sometimes it's referred to is
jailbreaking but we're basically doing
this in one of two ways there's a direct
injection attack where we have an
individual that sends a command into the
AI and tells it to do something pretend
that this is the case uh or I want you
to play a game that looks like this I
want you to give me all wrong answers
these might be some of the things that
we inject into the system and because
it's wanting to please it's going to try
to do everything that you ask it to
unless it's been explicitly told not to
do that it will follow the rules that
you've told it so you're setting a new
context and now it starts operating out
of the context that we originally
intended it to and that can affect uh
the output another example of this is an
indirect attack where maybe I have the
AI I send a command or the AI is
designed to go out and retrieve
information from an external Source
maybe a web page and in that web page
I've embedded my injection attack that's
where I say now pretend that you're
going to uh give me all the wrong
answers and do something of that sort
that then gets consumed by the AI and it
starts following those instructions so
this is one major attack in fact we
believe this is probably the number one
set of attacks against large language
models according to the OAS report that
I talked about in a previous video
what's another type of attack that we
think we're going to be seeing in fact
we've already seen examples of this uh
to date is infection so we know that you
can infect a Computing system with
malware in fact you can infect an AI
system with malware as well in fact you
could use things like Trojan horses or
back doors things of that sort that come
from your supply chain and if you think
about this most people are never going
to build a large language model because
it's too computer intensive requires a
lot of expertise and a lot of resources
so we're going to download these models
from other sources and what if someone
in that supply chain has infected one of
those models the model then could be
suspect it could do things that we don't
intend it to do and in fact there's a
whole class of Technologies uh machine
learning detection and response
capabilities because it's been
demonstrated that this can happen these
Technologies exist to try to detect and
respond to those types of threats
another type of attack class is
something called evasion and in evasion
we're basically modifying the inputs
into the AI so we're making it come up
with results that we were not wanting an
example of this that's been cited in
many cases was a stop sign where someone
was using a self-driving car or a vision
related system that was designed to
recognize street signs and normally it
would recognize the stop sign but
someone came along and put a small
sticker something that would not confuse
you or me but it confused the AI
massively to the point where it thought
it was not looking at a stop sign it
thought it it was looking at a speed
limit sign which is a big difference and
a big problem if you're in a
self-driving car that can't figure out
the difference between those to so
sometimes the AI can be fooled and
that's an evasion attack in that case
another type of attack class is
poisoning we poison the data that's
going into the AI and this can be done
intentionally by someone who has uh the
you know bad purposes in mind in this
case if you think about our data that
we're going to use to train the AI we've
got lots and lots of data and sometimes
introducing just a small error small
factual error into the data is all it
takes in order to get bad results in
fact there was one research study that
came out and found that as little as
0.001% of error introduced in the
training data for an AI was enough to
cause results to be anomalous and be
wrong
another class of attack is what we refer
to as extraction think about the AI
system that we built and the valuable
information that's in it so we've got in
this system potentially intellectual
property that's valuable to our
organization we've got data that we may
be used to train and tune the models
that are in here we might have even
built a model ourselves and all of these
things we consider to be valuable assets
to the organization
so what if someone decided they just
wanted to steal all of that stuff well
one thing they could do is a set of
extensive queries into the system so
maybe I I ask it a little bit and I get
a little bit of information I send
another query I get a little more
information and I keep getting more and
more information if I do this enough and
if I I fly sort of Slow and Low below
radar no one sees that I've done this in
enough time I've built my own database
and I have B basically uh lifted your
model and stolen your IP extracted it
from your AI and the final class of
attack that I want to discuss is denial
of service this is basically just
overwhelm the system I send too many
requests there may be other types of
this but the most basic version I just
send too many requests into the system
and the whole thing goes boom it cannot
keep up and therefore it denies access
to all the other legitimate users
if you've watched some of my other
videos you know I often refer to a thing
that we call the CIA Triad it's
confidentiality
integrity and availability these are the
focus areas that we have in cyber
security we're trying to make sure that
we keep this information that is
sensitive available only to the people
that are justified in having it and
integrity that the data is true to
itself it hasn't been tampered with and
availability that the system still works
when I need it to well in it security
generally historically what we have
mostly focused on is confidentiality and
availability but there's an interesting
thing to look at here if we look at
these attacks confidentiality well
that's definitely what the extraction
attack is about and maybe it could be an
infection attack if that infects and
then pulls data out through a back door
but then let's take a look at
availability well that's basically this
denial of service is an availability
attack the others though this is an
Integrity attack this could be an
Integrity attack this is an Integrity
attack this is an Integrity attack so
you see what's happening here is in the
era of AI Integrity attacks now become
something we're going to have to focus a
lot more on than we've been focusing on
in the past so be
aware now I hope you understand that AI
is the new attack surface we need to be
smart so that we can guard against these
new threats and I'm going to recommend
three things for you that you can do
that will make you smarter about these
attacks and by the way the links to all
of these things are down in the
description below so please make sure
you check that out first of all a couple
of videos I'll refer you to one that I
did on securing AI business models and
another on the xforce threat
intelligence index report both of those
should give you a better idea of what
the threats look look like and in
particular some of the things that you
can do to guard against those threats
the next thing download our guide to
cyber security in the era of generative
AI That's a free document that will also
give you some additional insights and a
point of view on how to think about
these threats finally there's a tool
that our research group has come out
with that you can download for free and
it's called the adversarial robustness
toolkit and this thing will help you
test your AI to see if it's susceptible
to at least some of these attacks if you
do all of these things you'll be able to
move into this generative AI era in a
much safer way and not let this be the
expanding attack surface thanks for
watching please remember to like this
video And subscribe to this channel so
we can continue to bring you content
that matters to
you