
ChatGPT Atlas Sparks AI Debate

Key Points

  • The episode of “Mixture of Experts” introduces a panel of AI experts (Martin Keen, Aaron Baughman, and Abraham Daniels) who will discuss ChatGPT Atlas, the future of AI agents, DeepSeek’s DeepSeek-OCR paper, and whether LLMs can suffer “brain rot.”
  • In the news roundup, major players such as Goldman Sachs, IBM and Groq, the military, and Uber are all expanding AI initiatives: financing data‑center projects, combining high‑speed inference with enterprise agent tools, using chatbots for rapid decision‑making, and crowdsourcing model training to drivers.
  • OpenAI’s launch of ChatGPT Atlas, its own AI‑powered web browser, is framed as a logical evolution of its search features and a way to integrate browsing history for a more seamless, personalized internet experience while navigating antitrust pressures on Chrome.
  • The discussion examines Andrej Karpathy’s skeptical projections on AI agents: his view that current agents lack the intelligence and multimodality to live up to the hype, and what that means for the next wave of AI development.
  • The panel also raises a provocative question about “brain rot” in large language models, prompting debate on long‑term model degradation and the need for continual updating and maintenance.


# ChatGPT Atlas Sparks AI Debate

**Source:** [https://www.youtube.com/watch?v=xawn4C43TWo](https://www.youtube.com/watch?v=xawn4C43TWo)
**Duration:** 00:44:37

## Sections

- [00:00:00](https://www.youtube.com/watch?v=xawn4C43TWo&t=0s) **Predicting Atlas Adoption & AI Trends** - In the opening of the Mixture of Experts podcast, host Tim Hwang and the panelists discuss whether OpenAI’s Atlas will achieve large‑scale adoption, preview upcoming segments on agents, DeepSeek-OCR, and LLM decay, and share AI‑industry news, including Goldman Sachs financing AI data centers and an IBM–Groq partnership.
- [00:03:15](https://www.youtube.com/watch?v=xawn4C43TWo&t=195s) **OpenAI's AI Browser Adoption Debate** - The speakers evaluate OpenAI's new AI-powered browser, weighing its strategic benefits against high‑friction user transition challenges and comparing it with rivals such as Perplexity's Comet.
- [00:08:00](https://www.youtube.com/watch?v=xawn4C43TWo&t=480s) **Future of Browsers in an AI‑Driven Web** - The speakers debate whether browsers will become obsolete as AI agents and conversational platforms evolve into the primary interface for accessing and orchestrating internet content.
- [00:11:14](https://www.youtube.com/watch?v=xawn4C43TWo&t=674s) **OpenAI's Vision as OS** - The speakers speculate that OpenAI aims to evolve beyond a browser into an operating system for AI apps that can integrate with any desktop application, positioning its AI as a universal, fix‑all tool for everyday users.
- [00:14:22](https://www.youtube.com/watch?v=xawn4C43TWo&t=862s) **Karpathy Critiques AI Agent Promise** - The hosts debate Andrej Karpathy’s warning that today’s AI agents lack sufficient intelligence and multimodality, questioning whether this signals a slowdown in the technology’s advancement and widespread adoption.
- [00:19:17](https://www.youtube.com/watch?v=xawn4C43TWo&t=1157s) **Navigating AI Hype and Benchmarks** - The speaker reflects on past AI winters and stresses the importance of realistic benchmarks and human oversight to temper over‑optimistic expectations for emerging AI systems.
- [00:23:00](https://www.youtube.com/watch?v=xawn4C43TWo&t=1380s) **Production-Grade Agents and Generative Computing** - The speaker stresses that only agents with near‑perfect reliability can be production‑grade, viewing generative computing as the key to achieving such deterministic outcomes, before moving on to recent research such as the DeepSeek-OCR paper.
- [00:28:25](https://www.youtube.com/watch?v=xawn4C43TWo&t=1705s) **Vision Encoder for Efficient Document Understanding** - The speaker describes a two‑stage system where a vision encoder (DeepEncoder) converts scanned PDFs into compressed vision tokens, mitigating LLM context‑length constraints and enabling flexible decoding for various multimodal language models.
- [00:33:26](https://www.youtube.com/watch?v=xawn4C43TWo&t=2006s) **Visualizing LLM Context Windows** - The discussion speculates on turning LLM context windows into AI‑generated visualizations, explores multimodal bridges between language and perception, and references a tongue‑in‑cheek paper titled “LLMs Can Get ‘Brain Rot’!” that illustrates these ideas.
- [00:36:33](https://www.youtube.com/watch?v=xawn4C43TWo&t=2193s) **Model Degradation and Inertia Analogy** - The speaker warns that continual deployment on increasingly shallow training data can cause “brain rot” in LLMs, likening it to adult cognitive inertia and proposing interventions such as virtual lesioning or pruning of super‑weights to restore plasticity.
- [00:40:46](https://www.youtube.com/watch?v=xawn4C43TWo&t=2446s) **Short-Form Social Media Data Confounds Findings** - The speaker argues that observed narcissistic or adversarial traits in a study may stem more from the brief, platform‑specific nature of X/Twitter posts than from the content itself, highlighting how short‑form format and platform culture act as confounding variables and exemplify a garbage‑in, garbage‑out issue for models.

## Full Transcript
What's your predictions? Do you think people are going to adopt Atlas at scale? Is this going to be a big winner for them?

That's a good question because, I mean, I can see the benefit to OpenAI. What's the benefit to us as the user?

All that and more on today's Mixture of Experts.

I'm Tim Hwang and welcome to Mixture of Experts. Each week, MoE brings together a panel of brilliant, funny, thoughtful panelists to debate, discuss and think through the latest news in artificial intelligence. Joining us today are three incredible panelists. So a very warm welcome to Martin Keen, who is a master inventor; Aaron Baughman, IBM Fellow and master inventor; and Abraham Daniels, who's a senior technical product manager for Granite. There's lots to talk about today. We're going to talk about ChatGPT Atlas and Andrej Karpathy's projections about the future of agents. We'll talk about an interesting paper out of DeepSeek on DeepSeek-OCR. And then finally, we'll ask the question of whether or not LLMs can get brain rot. But first, here's Illy with the news.

Hey everyone, I'm Illy McConnell. I'm a tech news writer with IBM Think. I'm here with a few AI headlines you might have missed this busy week. The AI race is long underway, and now Wall Street wants in. Banking giant Goldman Sachs has created a new team that is focusing on financing deals to build data centers and other AI projects. IBM and Groq have teamed up to combine Groq's high-speed inferencing with IBM's AI agent tools so enterprises can deploy AI agents more quickly and cost-effectively. It's no longer just companies experimenting with AI. Now the military wants in too. Even top generals look to AI chatbots for answers as they practice making decisions quickly, a critical skill on the battlefield.
Uber drivers can now earn a little extra cash between rides by doing small digital tasks that help train Uber's AI models. So it's a side hustle within a side hustle. Want to dive deeper into some of these topics? Subscribe to the Think Newsletter, linked in the show notes. And now back to the episode.

First off, I really wanted to talk about the big product announcement of the week, which is ChatGPT Atlas. So if you missed this news, OpenAI is now out with its own browser. And we've talked about this in the past, but I guess, Abraham, do you want to give us some intuition for why OpenAI is in the browser game at all? Why are they doing what they're doing?

A couple of answers to that. I think there have been breadcrumbs in terms of ChatGPT or OpenAI entering this space, with search functionality dropping last year, as well as them really being the entry point for a lot of users in terms of how they navigate the Internet. So one, it was a natural pivot for them. But two, with a lot of the antitrust around Chrome, as well as the idea that you could have your history cached as part of your Internet experience so that you have a better navigation experience when using the Internet, I think it just makes perfect sense for them. Model development is not necessarily as hot as it used to be, so I think OpenAI has been really diligent in finding new avenues to capitalize on their user base, which is over 350 million people. So personally, I think it's a really smart move. With LLMs already being an entry point for most people in terms of how they use the Internet, it just makes sense for them to actually create a browser.

I think these transitions are really hard.
I remember when I moved from basic Chrome to Brave, it felt like moving house; it took a long time. They're both Chromium browsers, so they actually share very similar DNA, but this transition from one browser to another feels quite high friction. And I guess, Martin, do you have any thoughts on adoption here? Because the other one we've talked about in the past is Perplexity's Comet browser, which is their bid in the space, and there it almost seems like, well, if you have a company that sells AI as search, it becomes very obvious for you to do an AI browser, because of the history of Google and Chrome. What are your predictions? Do you think people are going to adopt Atlas at scale? Is this going to be a big winner for them?

Yeah, that's a good question, because I can see the benefit to OpenAI. What's the benefit to us as the user?

Yeah, good question.

So can I share two stories of how I've been using Atlas this week?

For sure.

When I tested it out, I installed it on my Mac, and the first thing I wanted to do was open a scientific article in Atlas. And of course, now you can bring up the tab along the side, which gives you access to ChatGPT, and you can ask questions. And the questions use the open web page's context. So I could ask questions like: tell me what the method, the purpose and the findings were of this experiment. And it gave me all that information. I mean, that stuff I could have done easily by just going into a regular ChatGPT window and pasting in the URL. But it was kind of handy that it was there. But also in this scientific article, there were a bunch of pictures.
So I wondered, could I start asking questions about the pictures without referencing any particular picture, just to see if it could figure out which one was appropriate? I should say the article was about a scientific study of a beer brewing method. And I wanted to ask, did one of the beers look more oxidized than the other? Which you can sort of tell because they get a little bit darker in color. So all I said is, did one look more oxidized than the other? And it found the one image that was actually related to that, and then it analyzed the image and it told me, actually, no, I can't see any difference. So it worked. It worked ... right there. I didn't have to have two windows open.

Now, the second thing that I tried was its built-in agent mode, where it's supposed to be able to basically control the browser for you, fire up a bunch of tabs, do a load of stuff. I'm a bit of an amateur book collector, and there's one Michael Connelly book that I don't have yet. So I was like, I wonder if the agent can find me the book. So I asked it, I said, look, I want to find this particular book. It's called Nine Dragons by Michael Connelly. I want it in hardcover binding, I want a used copy, and the used condition needs to be very good. So it goes off and it's searching a bunch of websites and you see it working, and it's got some fun animations in Atlas. And it came back with an answer and popped up the window with the one that it thinks is the best fit, and it found the right book. But I looked at it, and the front cover of the book didn't look right to me. It didn't look like any of the other Michael Connelly books that I'd collected. So all I said was, this cover doesn't look right. And what it did is it went and fired off a bunch more windows as part of the agent.
And this time it looked up the ISBN and confirmed that that is the correct picture for that ISBN. But then it pointed out: this is the UK version of the book, not the USA version, and you would actually need to use this other ISBN if you want me to search for that. So again, this was a really good example, right? The agent did all of the work for me. And again, I probably could have done that in my Chrome browser using chatgpt.com, but I would have been flipping between multiple tabs to do it. So it was beneficial to me just to have it all there in one place.

So you're pro. You actually feel like a month from now you'll still be using Atlas?

We'll see. I don't know. We'll see. Okay, day one, I liked it.

All right, great. Aaron, I've saved maybe the craziest question for you for last. I'm sort of interested in whether or not in five to ten years there will even be browsers. Right? I think one way of reading the rise of chatbots is, well, if they get good enough, or these agents get good enough, you'll never need to go directly to a website anymore. All of the information will be curated, assembled; maybe everything will be working on MCP. And so you really will have an Internet that is for agents, by agents. And the notion of you having to browse the web is maybe an artifact of the past. So I guess one question for you is, over the long run, do browsers even make sense as a category of product?

Yeah, so taking out the crystal ball and just thinking about the projection of where tech is going. I mean, it's fascinating, because the paradigm's changing.
I think that OpenAI is looking at turning our computing devices into a playground, but it doesn't yet have control over the structure and function of that playground, at least yet. Right? So we're trying to preserve some privacy pieces. And it looks like ChatGPT is trying to become more like an operating system where you can come and use these different applications. It's like an operating system for AI apps, where in this case the OS role is more about orchestrating AI tools, workflows, plugins and such. So it's not going to replace macOS or Windows or Linux; it's not aiming to act like a low-level OS that controls the hardware. It's abstracted up a level, where it handles apps, SDKs, third-party apps, agents. And the line between apps and platform is beginning to blur a bit. We have to consider, too, that our computers really aren't built for AI per se. We have to farm out lots of these models, powerful big models, and even the agent wrappers, up into the cloud or these big compute clusters. And so we need something new, right? This is where I think generative AI and generative computing combined together will help us achieve the future of what's going to happen.

I think some of the risks that we all just need to be aware of are data and privacy: making sure that we still can control and decide what this new OS is going to do and what it can do. There could also be these hidden prompts, or what we call Comet-jacking, where there are a lot of these agent risks that could happen and it just sort of does things by itself, where something hijacks Comet or hijacks Atlas.
There's also less transparency and control that we have as we go further into the future. AI can make mistakes, as Martin was mentioning before. Or at least it seemed like a cognitive mistake: it actually went to the UK and found a book, rather than trying to find a book where we currently live. So it's that information graph that it didn't associate correctly to the user. But in essence, that's where I think it's going. And it's going to be fascinating to watch the field as it begins to change and turn, and it's going to change very quickly.

I guess, Abraham, any thoughts on ultimately where OpenAI goes with all this? I mean, Aaron kind of name-checked the idea that ultimately their ambition is world domination. Ultimately the ambition is not just a browser. They want to create a thing that can use any app on your computer, and it starts to look like something which is maybe more akin to what we identify as a lower-level operating system. Is that where they're headed with this?

Ultimately, they've already added features in which they can start to plug into apps on your desktop or laptop, just as part of the ChatGPT feature set. So in terms of plugging into your apps, I think they've already gone down that route. When I think about how people use the Internet today, and when I say people, I don't mean researchers or individuals that may be a little bit more acutely aware of generative AI, I'm talking about your everyday user. They see it more as a fix-all tool. So they don't have the same, in my opinion, guardrails or specific issues with some of the security around using it. They see it the way my generation would have seen Google: this is a truth search engine.
Whatever comes up is typically going to be real.

Yeah, yeah, exactly.

So for the average user, I think having something like Atlas simplifies their Internet experience ... [they] would gladly take it. To be honest, from OpenAI's perspective, obviously monetization is a big aspect of their business. So I think this opens up a huge world for them in terms of being able to monetize it, whether it's via ads or what have you. But I personally, and I may be biased here, think this was a really smart move by them. I think everything that's happening in the search industry right now is only going to benefit them. In terms of people adopting Atlas, I think they've done a great job of gaining mindshare and gaining a market before throwing this out, where it's a really easy switch. And to Martin's story: he could have done it in GPT, but why not just do it in the browser, where you have all the context right in front of you? You can ask whatever question you want, have the memory cache. I think it just makes perfect sense, to be honest.

Yeah. Martin, it looks like you might want to jump in.

One thing I will say, though, is as soon as you launch that browser, of course now the decision is, do you want to switch over to another browser? And it is not shy about asking. Within about two minutes it was like, can I be your default browser now? I haven't even put in two search queries; we're just getting to know each other. It also asked for Bluetooth, which, I was like, why do you need Bluetooth on my device? And you're like, for reasons.

Exactly, for reasons. Well, this is a nice segue to the next topic I wanted to cover.
So, Andrej Karpathy, who we've talked about before on this show, famously OpenAI co-founder and influencer in the generative AI movement, was on a very prominent AI podcast, the Dwarkesh podcast, fairly recently, and he had this much-discussed set of comments that he made there, which I'll quote here. He's talking about agents. He said, quote: they just don't work, they don't have enough intelligence, they're not multimodal enough ... [it will take a] decade to work through all those issues. And I think maybe this is actually a really nice thing to build off of what you just said, Abraham: is that going to be a barrier to agent adoption? I think Andrej definitely is looking at this from the perspective of a researcher who's aware of the technological limitations of what's being built. But it sure seems like people have enough confidence in these systems that they're more than willing to adopt agents, use agents, even in the presence of these kinds of problems. And so maybe, Martin, to throw it to you: how big a deal do you think are the issues that Karpathy is pointing out here? Should the space be worried that it's not going to be as advanced as we thought, as quickly as we thought?

Yeah, I think when somebody who's worked in prominent positions in two frontier AI labs, Tesla and then OpenAI, comes out and says agents are terrible, they're oversold and are ten years away from being useful, some ears are going to prick up at that sort of thing. It was very interesting to hear him give some of the reasons why he thinks that is the case. I mean, in my experience with this book-buying agent, it already made a mistake that a human probably wouldn't make.
If I asked Aaron, like, hey, Aaron, my personal assistant, could you go out and find me that Michael Connelly book? He's probably not going to come back with the UK edition. That would be part of the processing. So, yeah, you could see that. But he mentioned a couple of other things that can be the cause of this, why agents sometimes just don't do what would seem like the intuitive thing from the human perspective. One of the things he mentioned was training data. He said that if you took the training data set of any large language model and picked out a random single document from that training data set, chances are it is either going to be just irrelevant, like a stock ticker figure ... on average, most of the content that it's scraped off the Internet is just kind of nonsensical, or it's full of errors. But if you have enough of it, then you can see the signal through the noise. So the training data could be a big part of that. And the second, sort of controversial, reason he gave was his opinions on reinforcement learning, which he also declared, well, pretty bad. He gave the example of a math problem: reinforcement learning works by rewarding answers that are correct and punishing answers that are not correct, but not necessarily caring too much about how they got there. So did you do the right calculations, or did you stumble on it by accident, or did you add in a whole lot of extra steps that you really didn't need to do? And I think those limitations of reinforcement learning appear quite prominently in the agent's chain of thought.
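Martin's description of outcome-based rewards can be made concrete with a small sketch. This is purely a hypothetical illustration (the function and trajectories below are invented, not any lab's actual training code), but it shows why a reward that only checks the final answer cannot distinguish sound reasoning from a lucky accident:

```python
# Hypothetical sketch of outcome-only reward assignment: the reward inspects
# the final answer and ignores every intermediate step.

def outcome_reward(steps: list[str], final_answer: str, target: str) -> float:
    """Return 1.0 if the final answer matches the target, else 0.0.

    Note that `steps` is accepted but never inspected -- that is the point.
    """
    return 1.0 if final_answer == target else 0.0

# Two made-up trajectories for the same problem ("what is 3 * 4?"):
clean = ["3 * 4 = 12"]                      # sound reasoning
lucky = ["3 + 4 = 7", "7 + 5 = 12"]         # wrong steps, right answer

print(outcome_reward(clean, "12", "12"))    # 1.0
print(outcome_reward(lucky, "12", "12"))    # 1.0 -- the sloppy process is reinforced too
```

A process-based reward, by contrast, would score the individual steps, which is one direction researchers discuss for exactly this failure mode.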
So when you do set an agent out to do something and you see it processing its chain of thought as it tries to work through steps, it will often get stuck in these loops where it's doing things where you think, okay, let's just move on past that, get to the next thing. And I would suspect that reinforcement learning is a large part of that, that it's not always finding the most optimal ways to do things. But, yeah, I think when somebody like that comes out and says agents are currently being oversold, it is going to affect the industry. People are going to listen to that.

Yeah, definitely. The downstream effect of this is going to be big, because obviously there's been so much excitement about what agents can do and the promise of it. And I think there have been rumblings from the business space. A couple of banks have come out, and there was this report from, I think, a few months ago that said a lot of these pilots are not quite working out. But this seems to be the first case of a really strong, influential technical voice saying, guys, this current research plan is not going to work. Aaron, do you buy it? Should we really be tapping the brakes on our optimism around this stuff?

I think history always repeats itself. We just need to learn from history so the bad parts of history don't repeat.

A surprisingly hard thing to do.

That's right. If I look back at the 90s and even the 80s, we had these knowledge-based systems and there was a lot of promise around them, and we entered into an AI winter. And a lot of that was fueled by the early neural networks.
They couldn't even solve the XOR problem, and so we had to go to multilayer perceptrons to help solve that. But then we didn't have the computational ability. So there are always stumbling blocks and problems that have to be solved by science and engineering, and this is no different. I think what he's doing is very smart. And I think that we need to reel in lots of ... and they may not even know how to spell AI. So we need to be careful about that. And I think Andrej has taken a long-haul position, which is what a lot of us should do. He's trying to mitigate some of this overhype. It's very risky, because we've seen it time and time again where a system doesn't live up to its hype, but it does do what it's built to do. And so by reining in and creating those benchmarks and guideposts, it really helps us out. With AI agents, we are in the early stages, and it's very exciting; it's fun to play with. But I will say, when I'm building a production system, whether it's for sports or for entertainment, I always have a human in the loop to make sure that what I'm producing is consumer-ready. When we go to scale, a 1 to 2% error rate, that's huge. That's 1 out of 100 requests. If I'm getting billions of requests, a lot of people are seeing incorrect responses. And that's not even to mention that these systems are going out to use external tools, with, say, MCP, to activate something outside of the ecosystem, which we need to be very careful about. There's a lot of non-determinism around these systems.
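Aaron's 1-in-100 arithmetic is worth spelling out. A quick back-of-the-envelope check (the billion-requests figure is an assumed round number, chosen only to match his "billions" remark):

```python
# Back-of-the-envelope: a "small" per-request error rate becomes a large
# absolute failure count at scale. Figures are illustrative round numbers.

def expected_failures(error_rate: float, requests: int) -> int:
    """Expected number of failed requests at a given per-request error rate."""
    return int(error_rate * requests)

# A 1% error rate against a billion requests per day:
print(f"{expected_failures(0.01, 1_000_000_000):,} bad responses per day")  # 10,000,000
```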
And I'm actually studying and looking at when we should use machine learning versus generative AI, because there's a place for both, and when you should combine them together to get the best of both worlds. But yeah, I do think that he's taken a long-haul position, and it's a very smart thing to do.

Abraham, I'll give you the last word on this topic. One of my favorite images from the moment that we're in in AI, and we talked about it, I think, on a previous episode, was the chart of how the money is flowing in AI: Nvidia gives money to OpenAI, OpenAI gives money to Nvidia. It's basically a circle. Are Andrej's comments going to pop the bubble? Is there a bubble? I don't know what your views on this are.

I don't know if the news is going to pop the bubble. Whether there is a bubble or not, I'll let everybody decide for themselves. I definitely think there's overhype, that's without question. And I think there's overhype for a specific reason, probably predicated on a financial reason. But Aaron said something that really resonated: the current patterns for agents are non-deterministic. Whether it's a planner, or just sampling the models many times for inference scaling, they just don't offer the guardrails around outputs, in terms of: I need a specific output every single time, and if it's not operating within this scope, then go back. So I think from an agent's perspective, there's good enough, where you're doing a search function and the stakes are low. If you don't get it, you can redo it.
And then there are production-grade agents, where the stakes are high enough that if it's not 99-plus percent, we can't move it to production. So I think there are different worlds in terms of whether agents are going to make it or not. Personally, I think there's a need for a more deterministic outcome for these agents. I think software is going to be kind of that. So generative computing specifically is going to be that key piece in making sure that agents are production-grade, whether that's through policy management, through requirements, or IVR patterns, or what have you. I don't know if it's ten years or not. But I definitely do agree with the statement about the overhype around agents. I also think that there's still a place for them today; it's just a matter of being able to define where.

Yeah, your use case. Yeah, it makes a lot of sense. I'm going to move us on to our next topic. In the next two segments, we're going to talk a little bit about interesting papers that have come across our radar in the last few weeks. And I guess, Martin, I'm going to pick on you. A few weeks ago I picked on Chris Hay; I was like, could you explain manifolds and exactly how they work in the context of machine learning? I'm not going to do anything so mean to you today, but there's a super interesting paper out of DeepSeek called DeepSeek-OCR. The paper is DeepSeek-OCR: Contexts Optical Compression. And, Martin, the first question to toss to you: the paper is trying to deal with the problem of models having trouble dealing with long contexts. Can you tell us a little bit about that problem? Why does it happen? What are the practical implications?
If we look at the trend in large language models now, we're seeing bigger and bigger context windows to fit more and more stuff in, because the more the model can keep in mind, the more it can prioritize when it comes up with a response. So how do you get as much information as you can into a certain context window, given how computationally expensive it is to expand the context window? This was an interesting idea of basically turning text tokens into visual tokens, and you can actually cram a lot more information in that way. But depending on the algorithm used, there was a certain loss when you convert it back again into text, though quite a small loss for some. I think the best model had something like a 97% rate of being able to take text, do this conversion, and convert it back again, so the decode-encode cycle, and then 97% of the text was about right. Not too bad. Not too bad, right.

But you could have an image with less information in it that can still go through this encoder-decoder loop and still bring back text, but the text has lost a little bit more information. And what's really interesting is that they wrapped this in the idea of a forgetting mechanism, which mimics human memory much more. So the big thing in the paper is how closely this mimics human memory. For example, if you run this through the best model, it basically remembers almost everything, just like I remember the question you just asked me now, Tim. But I was on this podcast a month ago, and I remember who the guests were and what the topics were, but my memory is not fully there now. I don't remember the specific questions that you asked me or the specific talking points that the other guests made.
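The 97% figure discussed here is a round-trip fidelity: how much of the original text survives the encode-to-image, decode-back-to-text cycle. A minimal sketch of how such a number can be computed, with the actual render/OCR round trip replaced by a hand-made string containing a single invented error:

```python
import difflib

def roundtrip_fidelity(original, decoded):
    # Fraction of characters that survive the round trip, using
    # difflib's longest-matching-blocks similarity ratio.
    return difflib.SequenceMatcher(None, original, decoded).ratio()

original = "Attention is all you need, but context windows are expensive."
decoded = "Attention is all you need, but context windows are expensiue."  # one OCR-style slip
fidelity = roundtrip_fidelity(original, decoded)
```

In the paper's setting `decoded` would come from the vision decoder rather than being typed by hand; the metric itself is the simple part.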
So it's a bit more fuzzy. Well, they model that in this paper, and they say, actually, that sounds like the base model, or the small model that we have, because the small model is considered a more blurry image and will be about the equivalent of one month of human memory. So for something that happened to me one month ago, if you use their blurry model, that will create an image where about the same amount of stuff will be forgotten. It brings up an interesting point: is there actually some utility in having large language model context windows mimic human memory a bit more? Is there a reason that we evolved to remember things in the present very well and then to remember things more in the abstract as time goes on? And the fact that this could model that as well. Yeah. And the answer there being, in the biology case, that essentially, in the same way that it's computationally intensive for machines, it's also computationally intensive for us to have increasingly large context windows, in effect.

So one practical import of this paper seems to be: look, we could feed in language tokens or we could feed in picture tokens, to make a very simplistic distinction, and the end result is that we can do a lot more compression, which gets us to longer and longer contexts. Is that where this is all headed? You can start to put even more in the window if this actually becomes more production ready. I think about this as document distillation, much like model distillation: you have a big model make a smaller model. Here we have a document, but we want to distill it down into principal components.
And it's similar, from a mathematical perspective, to principal component analysis, where you keep the largest eigenvectors so you retain the most variance that can explain your data. This to me seems a bit similar, except we're using these different vision encoders. So they have a two-stage system with a vision encoder called DeepEncoder. And it's pretty cool, because they take in a PDF file, a scanned piece, and it's not just OCR. It's not just extracting text; it looks like it's liberating the text, so we can actually see what it is. It turns this messy, human-written world into something that AI can understand. And it helps to create these smaller tokens so we can have this new way of document understanding. It helps with the core problem that LLMs struggle with long contexts due to quadratic scaling: the longer the context becomes, the harder it is for these systems to process it. And therefore we leverage this efficient compression technique so that we can recover this information into a textual representation.

What I think is neat in this two-stage system, when you get down to the decoder, is: what if you could take that vision encoder and pair it with a new language? Then you could train a new decoder so it translates into any other kind of language, such that it can be used by another large language model, or any sort of multimodal model. And it just produces something very different. So I think it'll become more of an art form, where you're putting together these different layers of encoders and decoders and finding out what works best.
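The PCA analogy made here, keeping the largest eigenvectors to retain most of the variance, can be sketched in a few lines with NumPy. This is only an illustration of the analogy on synthetic data, not anything from the DeepSeek-OCR pipeline itself:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Two strongly correlated features: most variance lies along one direction.
data = np.column_stack([x, 0.9 * x + 0.1 * rng.normal(size=500)])

centered = data - data.mean(axis=0)
cov = np.cov(centered.T)                 # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
top = eigvecs[:, -1:]                    # keep only the largest eigenvector
explained = eigvals[-1] / eigvals.sum()  # fraction of variance retained
```

Dropping a dimension here loses almost nothing because the data was redundant to begin with, which is the same bet optical compression makes about documents.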
In fact, you could think of it like a search problem: find the best set of encoders and decoders to solve a certain problem in the most efficient way. But yeah, I'm excited. I liked what they showed here, and I'm looking forward to their next paper, where I think they're going to have an expanded piece with more experiments on how it works with multimodal and text fused together, and so on.

Abraham, maybe a final question for you, just to zoom out a little bit. DeepSeek obviously got on the map in a big way through the release of its open-source models. It is a lab that's doing research and publishing papers. Do you have any speculation on why DeepSeek is interested in these kinds of research questions? OCR is one of those really old problems. So it's a great question. Personally, I think this may be just one of those innovations that they found in the lab that demonstrated a step forward that was worth releasing. Also, I think that the idea of having this infinite or longer context window is really applicable to particular RAG use cases that are really important and haven't been solved across the board. So I think both those answers help them: one, make sure that their name is still in the news, and two, demonstrate that the research lab is still doing some pretty cool things. In terms of this particular paper, what I thought was really neat was obviously shifting to a different approach in how you actually encode and embed a document. What I would love to have seen personally is the semantic representation still kept in moving from an image to text, and what that actually means for downstream applications.
Because that's where you're really going to see a lot of the implementation here. Or is this really just an OCR text extraction? And there's a really quick plug for IBM here: Docling does this extremely well, extremely efficiently, and you don't need an LLM to do it, to be honest. So it's a little bit of overkill in that regard. But yeah, I'm excited to see the next version of this paper and the next version of this effort from DeepSeek. Yeah, it doesn't seem like a very flashy milestone, but it is definitely a critical piece of the AI stack. If you take the OCR part out of this, it's a compression bridge to help other models handle large-scale problems with a very small number of tokens. Right. So it's pretty neat, what they're doing here. It would be really cool, and I think Aaron already commented on this, but in terms of the output, most models are text-token oriented. So it would be really cool if they could release some type of plugin that lets you swap different decoder models in place of their decoder model. That's a little bit more from an adoption standpoint: you can use, from an LLM standpoint, what makes sense for your environment.

Do you think the next stage of this is that we're going to be seeing... we already have AI art, right? Are we going to see AI art of context windows? Like, Aaron, are you going to have that picture behind you show some kind of visualization of your context window that we'll all be able to pick up on? Now that would be a scary proposition. You don't want to see my context window. You do not want to see it.
You know, the whole notion of affective computing, where these systems can understand how you're feeling and what you're thinking, I think this might play into some of it, because it's like another bridge into understanding language from different areas. And that bridge could be between modalities, or between people and models, in a sense. And it creates this language such that we can have other interpretations, and other agents can go ahead and run and change that image behind me, maybe.

All right, I'm going to move us on to our final topic of the day. The last paper we talked about was DeepSeek-OCR, a little bit in the guts of the system. This other paper was just fun. It got talked about a lot online, and I figured we'd bring it up here. The title is very striking: it just says, "LLMs Can Get Brain Rot!" And the intuition of the paper is a fun idea. It basically says: look, there's a lot of hand-wringing and concern that if we consume lots of junk media on social media, we will, as humans, literally get brain rot. That we will think less well, reason poorly, and have all these cognitive defects from exposure to this content. And so the researchers simply ask: well, can LLMs get brain rot too? So what they did is curate a couple of data sets of social media content that they considered short and popular, or sensationalist. And they said: we can do a little post-training mix to the model, where we slowly increase the amount of "junk web text," as they framed it, fed to the model, and then we see how it performs against certain benchmarks.
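The increasing-junk setup described here can be sketched as building fixed-size post-training mixes at several junk ratios, then evaluating the model after training on each mix. This is a hedged sketch with invented toy stand-ins for the paper's actual corpora:

```python
import random

def make_mix(clean, junk, junk_ratio, size, seed=0):
    # Sample a fixed-size training mix with the requested share of junk,
    # so degradation can be tracked against the junk ratio alone.
    rng = random.Random(seed)
    n_junk = round(size * junk_ratio)
    mix = rng.choices(junk, k=n_junk) + rng.choices(clean, k=size - n_junk)
    rng.shuffle(mix)
    return mix

# Toy documents, purely illustrative.
clean_docs = ["a careful long-form explanation", "a dense technical argument"]
junk_docs = ["you WON'T believe this!!!", "hot take, one sentence, zero nuance"]

# One mix per junk ratio, as in an increasing-junk sweep.
mixes = {r: make_mix(clean_docs, junk_docs, r, size=100) for r in (0.0, 0.5, 1.0)}
junk_counts = {r: sum(doc in junk_docs for doc in m) for r, m in mixes.items()}
```

Holding total size constant while varying the ratio is what lets an experiment like this attribute any benchmark decline to data quality rather than data quantity.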
And what they claim is that these models do experience a form of cognitive decline. They say there are declines in reasoning, long-context understanding, and safety, and they even claim the emergence of dark traits: these models become more narcissistic because they've seen this content. So I guess, Aaron, maybe I'll throw it to you. What does this paper show? Does it show that if we consume lots of bad material online, our brains are literally going to rot? What do we take from this paper?

I mean, the headline for me here was garbage in, garbage out. But I think the big flashing star was that the decline of these LLMs was persistent and systematic. It wasn't something you could just quickly fix. And the risk is that, as we put these systems out in the wild and the training data becomes shallower and shallower, for example, we need to continually evaluate these models, because this brain rot can happen. And then I sort of asked myself, why does this happen? And I was thinking in terms of momentum and inertia. During backpropagation, when you're training, maybe there's this extra momentum, and the gradients start becoming much more marginal as you learn over time. But then once you stop training, it's like you have this inertia where you can't unlearn fast enough. And it's like humans: as kids, our brains are very plastic, they can learn very quickly, they can change very quickly. But as adults, as you get older and older, you have such large amounts of knowledge embedded, which is fantastic.
But then on the other hand, because our brains are already wired very densely, as opposed to kids', it seems as though these LLMs are becoming wired much more densely so quickly that they're sort of moving out of their childhood, in essence. And it's harder to change that systematic perspective. There might be other techniques that are needed. You could do some virtual lesioning, right? You could do some neural damaging, or find what the superweights are within the model, remove them, and then train to remove this rot that is happening. But it's definitely a real problem here. And it does parallel human cognition: we humans need to be careful what we learn and what we really focus our attention on when we're out in the wild, too.

So I guess, Martin, you're not really surprised by this result, right? Insofar as you share Aaron's interpretation: yeah, obviously, feed in some bad content and model behavior will get worse, because it's just mirroring that. So how shocked should we be about these results? Is there anything surprising here? I suppose I feel like this is catnip for every parent of a teenager, who can say: look what happened to that chat model when we gave it all this junk. The thing that I found most surprising: you mentioned, Tim, that they categorized this brain-rot content into two types, engagement and semantic quality. Engagement was short pieces of information, like a tweet or something. It's just a sentence or two, giving some kind of factual information, but really briefly, no room for nuance. It's just, here's this fact. And then semantic quality was the sensational stuff: you know, "wow, look at this."
That sort of thing. And there was actually a significant difference when the model was fed M1 versus M2 junk data, the engagement versus the semantic stuff, in terms of the outcome for personality. The M1 data, the engagement stuff, things like tweets, affected personality a lot more; the sensational stuff didn't really affect personality much at all. But the engagement stuff, the short tweet stuff, when it was pushed to 100%, so all the model received was just a bunch of tweets: it increased narcissism, it made the model less agreeable, and it made the model more extroverted. And I'm like, that sounds an awful lot like an outspoken TV pundit or something. So if I were training to be a talking head, a shock-jock kind of thing, this is where I'd need to get my training from. I'd need to just be looking at short-form stuff, and I'll get those traits too. I see.

Just to add to that: when trained on the M1 data, you also saw a sharp increase in abrupt stops in thinking. So it didn't go through the full thinking process, or would cut it short, or not do it at all. It's just one of those weird things, which speaks to your point about the personality of the individual who typically wants to shock rather than actually have confidence. Well, and I think that's the interesting thing here. That's actually one thing I do want to get to on this paper: there are these interesting confounding variables here. Because it may not be that it's short content; it may just be that it's short content drawn from Twitter, now X. And so I wonder whether the presence of narcissistic, adversarial traits comes less from the fact that the content is short form, and more from the culture of the place the data is being drawn from. Now, I guess, Abraham, the question for you: maybe in the case of reasoning, because the content is short, it actually limits how much reasoning the model can do. So there are these really interesting effects here, some of them related to the content being short, some related to, I think, the source of it. Certainly my feeling about X is that it's a platform where you feel there is a lot of aggressive, antisocial behavior.

No, that's fair. You only get 156 characters or 256 characters, so there's not going to be a lot of thinking allowed. What I think this paper really shows is obviously garbage in, garbage out, but that quality is always much more important than quantity in terms of actually getting performance out of your model. One thing that I thought was really interesting was the finding that the quality of the data degraded as it got newer. So when they went through the actual corpus of training data, the more recent the data was, the lower its quality, which was just very interesting to see. Does that mean that the quality of our content is gradually sinking to the lowest common denominator, or is this the product of lazy literature? But yeah, the paper was interesting. I think there are a lot of parallels or philosophical questions you can ask yourself based on this paper, but I'll leave that to other people. I've always thought that it's best to train these large models on factually correct, dense, deep data. Right. And that's your foundation model. Right. And then if you want to
And then if you want to 42:59change the personality, the tone, the pitch, prosody of speech 43:02even, that's where you use context engineering and then you 43:05add in the traits of which you want it to 43:07behave. But it's not necessarily always the best, you know, 43:10to try to train the traits that you want it 43:13to act like with an embedded into the knowledge structure 43:17because then you're sort of watering down and it's kind 43:19of like brain loss amnesia. Right. Because, because the models 43:23are really, they're forgetting about what they know and it's 43:28more about how should I act. Right. And so therefore 43:30it becomes very watered down. And so because of that, 43:34the decline of its behavior or the way it reasons 43:37it's not easily fixed by later instruction, tuning or even 43:40cleansing the data and it couldn't recover the baseline capability 43:44because of that. And so therefore I do think that 43:48this is a lesson learned that when you do training 43:51and you fine tune, be really, really careful about the 43:55kind of data that you do and make sure that 43:57your mark marching towards the objectives that you want to 44:01have with the model. Yeah. And I think if we 44:04then extrapolate that to us humans, if there is a 44:07parallel here, maybe we should be spending less time on 44:09Twitter and more time consuming high quality long form content 44:13like the mixture of Experts podcast. Yes, exactly. We highly 44:17recommend this. Yeah. Well, that's a great note to end 44:21on. Martin, Aaron Abraham, always great to have you on 44:24the show and hope to see you guys soon. And 44:27that's all the time that we have for today. Thanks 44:29for joining us, listeners. If you enjoyed what you heard, 44:30you can get us on Apple podcasts, Spotify and podcast 44:33platforms everywhere. And we'll see you next week on Mixture 44:36of Experts.