AI Hacking Surge Sparks Benchmark Reset

Key Points

  • Amazon reported a surge in hacking attempts, jumping from 100 million to 750 million daily in six months, a rise attributed to generative AI tools that lower the technical barrier to launching attacks.
  • Researchers at Stanford’s Institute for Human-Centered AI note that large language models are now matching or exceeding human performance across many tasks, prompting a reset of evaluation benchmarks and the creation of harder tests that even experts can’t easily solve.
  • As LLMs surpass human capabilities, the field will need new evaluation metrics designed to assess skills beyond what humans can currently demonstrate, ensuring continued measurement of AI progress.
  • Daylight’s new tablet integrates a discreet AI “call button” that lets users summon a language model for contextual reading assistance without interrupting the reading flow, exemplifying a user‑focused, non‑intrusive AI interface.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=DgYzkS80JdQ](https://www.youtube.com/watch?v=DgYzkS80JdQ)
**Duration:** 00:04:32

## Sections

- [00:00:00](https://www.youtube.com/watch?v=DgYzkS80JdQ&t=0s) **AI Boosts Hack Attempts, Redefines Benchmarks** - Amazon reports a 7.5-fold rise in daily hacking attempts (now 750 million per day), attributed to generative AI tools and possibly autonomous LLM exploitation, while Stanford’s Institute for Human-Centered AI plans to overhaul its human-baseline benchmarks as AI approaches parity even in math.

## Full Transcript
0:00 All right, it's Cyber Monday and we're all thinking about deals, and that's not what this is about. Amazon suffers 750 million hacking attempts per day. I didn't know it was that big a number either, and it's up even for them. So in the last 6 months they have seen hacking attempts go up from 100 million per day to 750 million per day, and they think it's because generative AI is giving hacking tool sets to people who would previously have needed to learn computer programming in order to execute black hat attacks.

0:38 It's also possible that large language models are at a point where they can agentically and autonomously exploit and begin attempting to hack just by being given the instruction to look for vulnerabilities. We are at that point in capability now. I would not be surprised if that was part of the story too. Either way, it's huge: it's a 7.5x increase in hacking attempts, at a scale of hundreds of millions, in just 6 months.

1:06 Then on top of that, this is underlining the capabilities of LLMs: Stanford's Institute for Human-Centered AI says that they are going to have to reset their benchmarks. Previously they've been using the 100% human baseline as a benchmark, but now AI capability across all the fields they're measuring is converging on human capability at such a rate, with math being the last. By the way, math was apparently the hardest, but even math is now coming up. They're going to have to reset the benchmarks and find harder tests, and they want to specifically find things that enable us to measure human capabilities in ways that we haven't before, and I think that's going to be really interesting to follow. They also would like to see, for the things that they are currently measuring, harder tests that humans can't do, which they can use to continue to evaluate the capabilities of LLMs.

2:01 And this is going to be an ongoing theme: as large language models surpass human capabilities, we're going to need to define evaluations, or evals, that are harder tests that measure the capabilities of LLMs, tests that maybe even we can't do, or maybe only a few of us can do.

2:20 Third, the Daylight tablet is shipping a really interesting user interface with AI. Their whole focus, their whole theme at Daylight, is to keep you in flow, and so when you're reading, they don't want to distract you. They are not a popup ad company; they're not a company that's going to interrupt you with new temptations to doomscroll. They want you to actually read, and if you're reading, they don't want you distracted, but they want the power of contextual reading available through large language models. And so they've actually decided to add a call button on the tablet that allows you, if you want to understand the passage better, to call in a conversation with an LLM and figure out what the passage means, without changing the interface, without distracting you, keeping it completely invisible unless you call for it with the call button. I think it's a really interesting approach to using an LLM without allowing the LLM to context switch and distract you. We'll see how it goes.

3:23 Last but not least, have you asked your LLM to tell you about David Mayer? D-A-V-I-D M-A-Y-E-R. This broke on Reddit a few days ago. No one can quite make sense of why there's such a hardcoded block here, but it seems apparent that ChatGPT in particular will not tell you about David Mayer, and people are speculating this is because ChatGPT, for whatever reason, doesn't want to talk about the heir to the Rothschild fortune. But that's a little weird, because none of the other LLMs have this block; Claude is fine with this. And he's not recorded as an investor. It just feels a little fanciful, so I have no idea why this is actually happening. It's kind of amusing. Feel free to try to get ChatGPT to write David Mayer, M-A-Y-E-R, and if you do, let me know in the comments. There's a few hacks that have worked, but I'll be curious to see what you find.