# AI Hacking Surge Sparks Benchmark Reset

## Key Points
- Amazon reported a surge in hacking attempts, jumping from 100 million to 750 million daily in six months, a rise attributed to generative AI tools that lower the technical barrier to launching attacks.
- Researchers at Stanford’s Center for Human-Centered AI note that large language models are now matching or exceeding human performance across many tasks, prompting a reset of evaluation benchmarks and the creation of harder tests that even experts can’t easily solve.
- As LLMs surpass human capabilities, the field will need new evaluation metrics designed to assess skills beyond what humans can currently demonstrate, ensuring continued measurement of AI progress.
- Daylight’s new tablet integrates a discreet AI “call button” that lets users summon a language model for contextual reading assistance without interrupting the reading flow, exemplifying a user‑focused, non‑intrusive AI interface.
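
The benchmark dynamic in the points above, model scores converging on a fixed 100% human baseline until the test stops discriminating, can be sketched in a few lines. This is an illustrative toy, not Stanford's actual methodology; the task names, scores, and the 5-point saturation margin are all invented for the sketch.

```python
# Toy illustration of benchmark saturation against a human baseline.
# All numbers are hypothetical; the human baseline is normalized to 100%,
# as in the human-baseline benchmarks described above.

HUMAN_BASELINE = 100.0   # percent: human performance defines 100%
SATURATION_MARGIN = 5.0  # within 5 points of the baseline counts as "saturated"

# Hypothetical model scores (percent of the human baseline) on a few task areas.
model_scores = {
    "image_classification": 104.2,
    "reading_comprehension": 101.7,
    "coding": 97.8,
    "math": 96.1,    # the last area to approach parity, per the transcript
    "planning": 82.5,
}

def is_saturated(score: float) -> bool:
    """A benchmark stops discriminating once models reach the human baseline."""
    return score >= HUMAN_BASELINE - SATURATION_MARGIN

saturated = [task for task, s in model_scores.items() if is_saturated(s)]
still_useful = [task for task, s in model_scores.items() if not is_saturated(s)]

print("Saturated (need harder, beyond-human tests):", saturated)
print("Still discriminating:", still_useful)
```

Once every task clears the margin, the benchmark no longer measures progress at all, which is exactly the motivation for building harder tests that most humans cannot pass.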
**Source:** [https://www.youtube.com/watch?v=DgYzkS80JdQ](https://www.youtube.com/watch?v=DgYzkS80JdQ)
**Duration:** 00:04:32

## Sections

- [00:00:00](https://www.youtube.com/watch?v=DgYzkS80JdQ&t=0s) **AI Boosts Hack Attempts, Redefines Benchmarks** - Amazon reports a 7.5-fold rise in daily hacking attempts (now 750 million per day), attributed to generative AI tools and autonomous LLM exploitation, prompting Stanford's Center for Human-Centered AI to overhaul human-baseline benchmarks as AI reaches parity even in math.

## Full Transcript
All right, it's Cyber Monday and we're all thinking about deals, but that's not what this is about: Amazon suffers 750 million hacking attempts per day. I didn't know it was that big a number either, and it's up even for them. In the last six months they have seen hacking attempts go up from 100 million per day to 750 million per day, and they think it's because generative AI is giving hacking toolsets to people who would previously have needed to learn computer programming in order to execute black hat attacks.

It's also possible that large language models are at a point where they can agentically and autonomously exploit systems, attempting to hack just by being given the instruction to look for vulnerabilities. We are at that point in capability now, so I would not be surprised if that was part of the story too. Either way, it's huge: a 7.5x increase in hacking attempts, at a scale of hundreds of millions, in just six months. Then, on top of that, this is
underlining the capabilities of LLMs: Stanford's Center for Human-Centered AI says that they are going to have to reset their benchmarks. Previously they've been using the 100% human baseline as a benchmark, but now AI capability across all the fields they're measuring is converging on human-level performance at such a rate, with math being the last. By the way, math was apparently the hardest, but even math is now coming up. They're going to have to reset the benchmarks and find harder tests, and they specifically want to find things that enable us to measure human capabilities in ways that we haven't before, which I think is going to be really interesting to follow. They also would like to see, for the things that they are currently measuring, harder tests that humans can't do, which they can use to continue to evaluate the capabilities of LLMs. This is going to be an ongoing theme: as large language models surpass human capabilities, we're going to need to define evaluations, or "evals", that are harder tests measuring capabilities of LLMs that maybe even we can't do, or that maybe only a few of us can
do. Third, the Daylight tablet is shipping a really interesting user interface with AI. What they're doing is this: their whole focus, their whole theme at Daylight, is to keep you in flow. So when you're reading, they don't want to distract you. They are not a pop-up ad company; they're not a company that's going to interrupt you with new temptations to doomscroll. They want you to actually read, and if you're reading they don't want you distracted, but they do want the power of contextual reading available through large language models. So they've decided to add a call button on the tablet that allows you, if you want to understand a passage better, to call in a conversation with an LLM and figure out what the passage means, without changing the interface, without distracting you, keeping it completely invisible unless you summon it with the call button. I think it's a really interesting approach to using an LLM without allowing the LLM to context-switch and distract you. We'll see how it goes.

Last but not least: have you asked
your LLM to tell you about David Mayer, D-A-V-I-D M-A-Y-E-R? This broke on Reddit a few days ago. No one can quite make sense of why there's such a hardcoded block here, but it seems apparent that ChatGPT in particular will not tell you about David Mayer, and people are speculating this is because ChatGPT, for whatever reason, doesn't want to talk about the heir to the Rothschild fortune. But that's a little weird, because none of the other LLMs have this block; Claude is fine with it. And he's not recorded as an investor; it just feels a little fanciful. So I have no idea why this is actually happening; it's kind of amusing. Feel free to try to get ChatGPT to write "David Mayer", M-A-Y-E-R, and if you do, let me know in the comments. There are a few hacks that have worked, but I'll be curious to see what you find.