
Mixture of Experts: AI News & Breakthroughs

Key Points

  • The host touts a new image-generation model as far ahead of competitors, beating benchmark scores by roughly 200 Elo points and calling it the most impressive system they've seen.
  • This week's "Mixture of Experts" episode brings back IBM fellow Aaron Baughman and engineer Chris Hay, and introduces newcomer Lauren McHugh, while previewing topics such as OpenAI's potential infrastructure sales, Google's "Nano Banana" image model, the US Open, and KPMG's 100-page AI prompts.
  • In the news roundup, NVIDIA posted a 56% year-over-year jump in data-center sales, yet missed analyst revenue expectations, prompting a mixed market reaction.
  • OpenAI and Anthropic announced a joint effort to probe model safety issues such as hallucinations; 911 centers are trialling AI to handle low-priority calls; and IBM and NASA released the open-source "Surya" foundation model for forecasting solar storms.


# Mixture of Experts: AI News & Breakthroughs

**Source:** [https://www.youtube.com/watch?v=zw0Ymg_DoEs](https://www.youtube.com/watch?v=zw0Ymg_DoEs)
**Duration:** 00:43:53

## Sections

- [00:00:00](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=0s) **Mixture of Experts AI Panel** - Host Tim Hwang opens the weekly Mixture of Experts podcast, lauds a breakthrough image-generation model, and introduces veteran guests alongside newcomer Lauren McHugh to discuss a range of AI topics, from OpenAI's potential infrastructure sales and KPMG's 100-page prompts to NVIDIA's market-cap dominance and other recent headlines.
- [00:03:05](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=185s) **KPMG's Taxbot Uses 100-Page Prompt** - The hosts highlight KPMG's Taxbot, an AI tax advisory tool that relies on an unusually massive 100-page prompt, sparking talk about the scale of prompts in modern AI applications.
- [00:06:34](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=394s) **Fine-Tuning vs Long Prompts in Tax Bots** - The speakers debate whether the need for extensive prompts in a tax-domain agent arises from model design and fine-tuning considerations or from custom, unscalable solutions, questioning how much of the source material must be rewritten for new use cases.
- [00:09:56](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=596s) **Challenging the "Agent" Terminology** - The speaker argues that labeling a large prompt-based system as an "agent" is misleading, insisting that a true agent should dynamically retrieve and assemble information (especially for real-time data) rather than relying on an unwieldy hundred-page prompt.
- [00:13:12](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=792s) **Feedback Loops and Prompt Engineering** - The speakers debate how the satisfying feedback loop of advanced prompting creates alignment challenges, citing accountant use cases, debiasing, sentient-personality risks, and corporate prompt-engineering teams like KPMG's AIML group.
- [00:16:21](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=981s) **OpenAI May Launch Its Own Cloud** - The hosts note AI's ability to replace traditional summaries, then explain the OpenAI CFO's offhand remark that the company could someday sell its own compute infrastructure rather than relying on services like Google Cloud or AWS.
- [00:19:33](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1173s) **Accelerating GPU Obsolescence and Leasing** - The speakers discuss how rapidly cutting-edge GPUs become outdated, prompting firms to rent newer hardware for training, sell or lease older units, and follow AWS's model as the industry shifts toward inference workloads.
- [00:24:22](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1462s) **Betting on Proprietary AI Infrastructure** - The speakers speculate that building a competitive inference stack and dedicated data-center infrastructure serves as a hedge against open-weight models, raising questions about the massive financing required and its market impact.
- [00:27:43](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1663s) **AI Image Demo Sparks Industry Shift** - A playful AI-generated visual demo transitions into a broader commentary on how advanced style-transfer models will upend traditional image-editing tools such as Photoshop and Canva.
- [00:31:56](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1916s) **Navigating Trust in AI Images** - The speakers discuss how widespread AI image models have become, noting that increased exposure has sharpened public skepticism, and they stress the importance of holding creators accountable and ensuring safety to mitigate misinformation.
- [00:35:18](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=2118s) **US Open Digital Fan Innovations** - The host and guest review three new technology-driven features, including a real-time match chat agent and a "Key Points" TL;DR summary, designed to enhance the experience for millions of on-site and online US Open fans.
- [00:38:31](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=2311s) **Alcaraz Match Analytics Demo** - The speaker demonstrates the IBM Slam Tracker's capabilities, showing scores, 360° storytelling, predictive win probability, and live likelihood visualizations, using a recent Alcaraz tennis match.
- [00:41:51](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=2511s) **Live Win Probability Demonstration** - The speaker walks through a real-time tennis match illustration, showing how the live likelihood-to-win metric tracks shifting odds and momentum, while also highlighting upcoming fantasy football player insight tools.

## Full Transcript
0:01 I think this is way more than a toy. 0:03 This is by far the best image generation model that I've 0:08 seen today. And even if we look at the benchmarks, 0:11 you know, when we look at, and I'm not a 0:13 big fan of benchmarks as you know, but even when 0:16 you look at those benchmarks, it is 200 sort of 0:19 Elo points ahead of everything else. All that and more 0:22 on today's Mixture of Experts. I'm Tim Hwang and welcome 0:31 to Mixture of Experts. Each week MOE brings together a 0:34 panel of people pushing the frontiers of technology to discuss, 0:37 debate and analyze their way through the wildly fast paced 0:40 world of artificial intelligence. Today I'm joined by a great 0:43 crew of veterans and also someone joining for the very 0:46 first time. I've got Aaron Baughman, IBM fellow, master inventor. 0:49 Aaron, welcome back to the show. Chris Hay, distinguished engineer 0:53 and longtime MOE veteran. And joining us for the very 0:56 first time is Lauren McHugh, Program Director, AI Open Innovation. 1:00 Lauren, welcome to the show. Thank you. So we've got 1:03 a packed episode today. We're going to talk about OpenAI 1:05 hinting that they might sell infrastructure, Nano Banana, the US 1:08 Open. We're going to even talk about 100 page prompts 1:12 coming out of KPMG. But first, as always, we've got 1:15 our news segment from Ili. So Ili, over to you. 1:23 Hey everyone, I'm Ili McConnen, I'm a tech news writer 1:26 with IBM Think. I'm here now with a few AI 1:29 headlines you may have missed this busy week. First up, 1:32 NVIDIA, the world's most valuable company by market cap, reported 1:37 a whopping 56% increase in sales over the same period 1:40 from last year. And this was largely driven by its 1:44 data center business. This would seem like good news for 1:47 the chip maker, right? In fact, market reaction was mixed 1:51 because the revenues did not meet analysts' expectations.
Next up, 1:56 OpenAI and Anthropic, two of the biggest rivals in artificial 1:59 intelligence, have actually teamed up to better understand the security 2:03 issues facing models. They recently evaluated each other's models in 2:08 order to better understand hallucinations and other issues, basically hoping 2:13 to catch what their own tests had missed. Meanwhile, in 2:17 the category of hopeful AI, many 911 centers are so 2:22 understaffed that they're turning to AI to help them out. 2:26 This may seem problematic at first blush, but actually these 2:30 AI agents are helping with parking violations, noise complaints, basically 2:35 non urgent issues so that the human staffers can 2:39 deal with the real emergencies. Last but not least, 2:43 IBM and NASA are helping give scientists more time to 2:45 prepare before big storms hit. They recently released a new 2:50 open source foundation model called Surya that can predict solar [...] 3:02 our Think newsletter. The link is in the show notes. 3:09 So normally here at MOE, we cover some of the 3:12 biggest stories happening in AI technology. You know, the drops 3:16 of all the largest models coming out of the frontier 3:18 model companies, the biggest features and products that people are 3:21 launching. But I actually want to start today with kind 3:23 of a funny smaller story. There was an article that 3:27 was written about KPMG, the kind of global accounting firm, 3:31 which kind of launched their own, as many companies and 3:34 enterprises are doing now, their own AI agent, which they 3:37 call Taxbot. And what Taxbot is attempting to do 3:40 is gather together all of the tax advice expertise across 3:43 a big firm like KPMG and essentially strip through documents 3:47 and generate sort of 25 page kind of advisory opinions 3:50 for their customers that basically are like kind of the 3:53 first draft of what they would typically provide for a [...] 4:03 this was the really funny thing.
They took a lot 4:05 of flack, I don't know if flack's the right word, 4:07 but they got a lot of attention online because they 4:10 sort of revealed that in order to power Taxbot, they 4:13 had a hundred page prompt running behind this, which I 4:17 think, just as someone who, you know, kind of comes 4:20 from a world where prompting is like a few sentences, 4:22 this is like really remarkable. And so maybe, Aaron, I'll 4:26 start with you: what's the longest prompt you've 4:28 ever written? And is it kind of surprising to see 4:31 like 100 page prompts, like novel length, novella-like prompts 4:34 coming out? Yeah. Well, first I gotta say, you know, 4:37 growing up, and this, this might, you know, give you 4:39 my age a little bit, but I used to use 4:41 these yellow books called Cliff Notes, you know, where, where 4:44 I could go pick it up from like Barnes and 4:45 Noble, right? Or, or even buy it from Amazon and, 4:48 and get Cliff Notes about a book. Well, we certainly 4:50 don't need those anymore, right? Because we can use these 4:53 large prompts, right, to summarize. But the largest prompt that 4:57 I've ever written, I would say semi written, right, because 5:01 I just copy paste a manual right into the context, 5:03 so it was probably about 40 pages, right, that I 5:06 inputted into a model and, and it came out with 5:09 key points that were summarized. So it was very effective 5:11 and really interesting how it worked. And it surprised me. 5:15 And I think it's kind of the interesting thing. And 5:18 I guess maybe, Lauren, do you want to jump in 5:19 on this? I know a narrative that was very prominent 5:23 maybe a year and a half ago, two years ago, 5:25 a long time ago in AI time, was like, prompt 5:28 engineering is going to be dead. Over the long run, 5:30 we're not going to really need prompt engineering. You're going 5:32 to just tell the computer what you want and it 5:34 will do it. This kind of story almost points us 5:37 in a different direction, right?
It's like almost a world 5:40 in which, in order to get agent behavior to be 5:43 really, really good, there's going to have to be like 5:45 a lot of specification. And in some sense prompt engineering 5:48 is becoming a bigger part of what gets these things 5:51 to work. Is that the right way of thinking about 5:53 it? Did it turn out that prompt engineering was not 5:54 actually dead? I think a good way to appreciate how 5:58 complex a prompt would be would be to look at 6:00 some of the open source projects that, that are essentially 6:03 agents, so GPT Researcher, MetaGPT. You can see how 6:07 long and complex those prompts are. And that's been, you 6:10 know, a whole community's worth of contribution of ideas of 6:12 how to make the agent work better. I do think 6:16 that, you know, if a product requires a 100 page user 6:21 manual to work, that, you know, at best it's poorly 6:25 designed, at worst it's broken. And in this case the 6:30 product is the model, the user manual is the prompt. 6:34 So one thing that you could do is actually fine 6:37 tune it. I think fine tuning probably is making a 6:40 comeback, especially with some of the models like Gemma 3 6:44 270M, where the actual architecture is made to 6:49 be tuned: more parameters allocated to the embedding versus parameters 6:54 allocated to the transformer blocks to do the processing. Part 6:57 of it I think is, like, it actually goes to 6:59 Aaron's original example, which is, I think, what's prompting exactly? 7:03 I think prompting sometimes can just mean the input to 7:06 the model, in which case it's maybe no surprise that 7:08 you put in a whole manual to try to get 7:10 it to summarize. Here, I'm curious if there's a reason 7:14 why these prompts need to get super long in the 7:16 tax domain. Is there something about agents that requires us 7:20 to have longer prompts, or do you think this is 7:22 just kind of a weird artifact of how they designed 7:24 this tax bot?
Basically, I think my main question would be, 7:28 out of those 100 pages, 7:31 how many of those pages need to be rewritten for 7:34 a new use case? That, you know, that only KPMG 7:37 knows. And that would get to the heart of how 7:40 much of this is, you know, because it's an agent 7:43 and any agent would need to do that, and then 7:45 how much of this is truly a custom solution, which 7:48 is then a lot harder to scale. Yeah, there's a 7:50 really interesting dynamic there. And I guess maybe, Chris, I'll 7:53 bring you in on this. I think, Lauren, what I 7:55 hear you sort of saying is that in some ways 7:57 you have these really long prompts to make up for 7:59 all the knowledge that the model doesn't know. And so 8:03 I guess, Chris, maybe there's one point of view, is 8:04 that as these models get deployed in more and more 8:06 specialized domains, it's not going to be atypical to see 8:11 really, really long prompts emerge. Right? Because in effect, there's 8:14 all this domain knowledge that like a general model might 8:16 not have. I suppose there's the original idea that the 8:19 base model would just get smart enough that you wouldn't 8:21 have to do that. But I mean, if this is 8:23 a good example, we may not be headed in that 8:25 direction. I'm not surprised though. I think 100 pages, I 8:31 mean, as long as 99 of those pages aren't "do 8:34 not hallucinate, do not hallucinate" on repeat, then it's 8:38 actually like The Shining. It's just the same sentence over and 8:40 over and over again. Exactly. But to Lauren's point, right, 8:44 if the model doesn't have the knowledge in the first 8:46 place and you've got a lot of specialist domain, then 8:48 you're going to have to put that in the context. 8:50 And I'm not against it, because we've all been using 8:54 retrieval augmented generation for a while. And if you really [...] 9:03 into your context anyway.
So in some regards, how is 9:08 that any different? Really, what you're saying is, actually, I 9:11 can fit everything that I need into the context window 9:15 and therefore the model's going to stand a better chance. 9:17 And you know, and I, I have to admit, I 9:21 would probably rather have it in the context window than, 9:26 you know, sort of rolling the dice at RAG and 9:29 hoping that it gets the right chunk coming back. So it 9:31 goes either way. But yeah, you are making up 9:35 for lack of knowledge, or there's certain patterns that you 9:38 want it to do. If you're generating 9:40 a 25 page document, and that 25 page document's got 9:43 to look an exact way, to Aaron's earlier point, 9:46 you're building a specification, and the model is not a 9:49 mind reader. It's got to produce it in the way 9:51 that you want. And a good prompt is going to 9:53 have examples. This is section one, this is section two. 9:56 I want you to do this. Do not talk about [...] 10:04 it. Fine tuning is really hard. So, you know, again, 10:09 if you can stuff it in the context, that's fine. 10:11 What I would probably say is, I think this is, 10:15 and I probably challenge their use of agent in this 10:18 case. I suspect it's just a prompt. But I think, 10:22 if it was truly, truly agentic, I would argue that 10:26 the agent would be able to go round and round 10:29 the loop a few times, and I don't think you 10:31 would need a 100 page prompt in that sense. And you 10:36 could have the agent pull the elements that it needs 10:38 and then bring together that structure in a way. Now, 10:41 in reality it probably ends up still about a hundred 10:44 pages, but I think, rather than stuffing it into the 10:47 context window, you're having the agent go search and then 10:50 bring everything together. So I challenge the word agent in 10:53 this case, you know, but yeah, yeah.
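The trade-off the panel is describing (stuff everything into the context window versus retrieve a few chunks RAG-style and hope the right one comes back) can be made concrete with a small sketch. This is illustrative only, not code from the episode; all function names are hypothetical, and a toy keyword-overlap retriever stands in for a real vector store and LLM call.

```python
# Illustrative sketch (not from the episode): contrast "stuff everything into
# the context window" with RAG-style retrieval, where a weak retriever can
# miss the chunk you actually need. All names here are hypothetical.

def build_stuffed_prompt(question: str, documents: list[str]) -> str:
    """Put every document in the context: reliable, but token-hungry."""
    context = "\n\n".join(documents)
    return f"Use only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Naive keyword-overlap retriever; stands in for a vector store."""
    def score(doc: str) -> int:
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:k]

def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Only the top-k retrieved chunks go in: cheap, but you're rolling the dice."""
    context = "\n\n".join(retrieve(question, documents))
    return f"Use only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

docs = [
    "Form 1040 is the standard US individual income tax return.",
    "The corporate tax rate discussion is out of scope here.",
]
q = "Which form is the standard US individual income tax return?"

stuffed = build_stuffed_prompt(q, docs)
rag = build_rag_prompt(q, docs)
assert all(d in stuffed for d in docs)  # everything made it into the context
assert "Form 1040" in rag               # the retriever found the relevant chunk
```

The stuffed prompt grows linearly with the corpus but never misses a document, which is exactly why a tax team might accept a hundred-page prompt; the RAG prompt stays small but stands or falls on retrieval quality.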
Yep, yeah, I 10:57 wanted to just jump in and make two points if 10:59 I could, very quickly, you know, as to why you 11:02 would want to use a hundred page prompt. So to 11:04 me it seems as though these real time systems, right, 11:07 because the data is updated in real time, it's never 11:09 going to be within the knowledge base or the foundational 11:13 model that you have. So if you think the stock 11:15 market and you want to ask questions about what's happening 11:19 today or this minute, then you need to get that 11:22 information within a prompt. And you can get a lot 11:26 of information very quickly within a prompt, and then perhaps 11:29 you even add in a persona, right? And then the 11:33 second use case that I was thinking about, is why 11:35 you would want to use this kind of a prompt 11:37 even if the data were in the foundational model, is 11:41 if you think of like a flashlight: when you put 11:44 content within a prompt, you're telling the system, focus in 11:48 on this type of data, rather than hoping and rolling 11:51 the dice, like Chris mentioned, that you're going to get 11:52 that information back as a result. But I would pair 11:56 that with like a LoRA technique, where you can determine 11:59 what the attention mechanism needs to focus in on, right? 12:03 So if you pair the LoRA technique with a large 12:06 prompt, then I think in turn you'll be able to 12:09 really take that flashlight when you're in the dark and 12:11 light up exactly what you're looking for. And Aaron, how 12:14 do you plan to explain that to an accountant to 12:16 do, as opposed to copy and pasting into a prompt? 12:19 Yeah, so in accounting it's important, I think, because lots 12:23 of these different types of rules and regulations, it changes 12:26 really quickly. And so if they're trying, like, tax in 12:30 this case, I think it's important to. I don't exactly 12:35 know what was in the prompt, right?
In this 100 12:37 page prompt, but I would hope, you know, that it 12:39 was mostly about rules, regulations, so that they could better 12:42 understand and advise maybe somebody around what's happening. But in 12:48 that tax, that's what I would think, and sort of 12:52 help out a tax auditor. That's one of the reasons 12:54 I think prompting is kind of, it's unbeaten, even though 12:57 it's considered a little bit of a cheap way of 13:00 doing things for people who are much more in the 13:02 machine learning world, is, for an accountant, like, they're not 13:05 going to go through some fine tuning process. They would 13:09 much rather just type stuff in and see things happen. 13:12 And so it's really hard to beat the fact that 13:14 the feedback loop of prompting is just so satisfying in 13:17 a way that is really hard for other methods of 13:20 AI alignment to basically work. Chris, do you want to 13:22 jump in there? No, I agree with you 100%, but 13:26 I would love to see some accountants sitting there going, 13:29 why should I be using QLoRA here? How am I 13:33 going to debias my data set here? Or: you're an 13:36 expert tax advisor. Please use Australian language in the 13:40 response. Do not elucidate. Here are the tax codes. 13:44 That's. Yeah, yeah. I think what's kind of scary too 13:48 is the whole idea of sentience, you know, is that 13:50 whenever you start adding in different personalities, you know, if, 13:53 if you're a media company and you want to, you [...] 14:02 another thread, right, if we wanted to pull in on 14:05 it. But these large contexts and prompts combined with a 14:09 lot of these other, like, you know, if you abstract 14:12 those internals, you know, that Chris was mentioning, away from 14:14 the user, so it's very simple, you know, it really 14:16 is very powerful. Yeah. Like the Comet browser, for example, 14:20 you know, you can do a lot of that. I'm 14:22 very excited, right, about what's, what's in store.
I also 14:25 think it's important to keep in mind that the feedback 14:27 loop is very quick on prompt engineering for end users, 14:31 but in this case I actually consider the prompt engineer 14:34 to be, I'm sure, an AIML team within KPMG that 14:38 wrote that as the prompt for then others to just 14:41 use in a more abstracted way. And so while a 14:45 user might want prompt engineering for that super satisfying quick 14:48 feedback on nudging a model to do the right thing, 14:52 you know, the actual team building this agent might have 14:55 a, you know, longer term view of: if I 14:58 could actually make it simpler to do the prompt in 15:02 the first place, then I could use this not just 15:04 for tax, but they have other lines of business as 15:06 well. So I think there's a difference in the level 15:09 of patience and how much pain those two different groups 15:12 would take on. And I actually think the group that 15:14 probably actually wrote this 100 page prompt that then gets 15:18 abstracted to the user might be pretty interested in the 15:22 ways that they can make that prompt simpler and reusable, 15:26 over the fact that more prompt engineering is just going 15:29 to get them quick results. The other thing that's on 15:32 my mind is, like, I remember in the article that 15:34 it was like from 2024, and I just worry that, 15:38 like, needle in the haystack stuff hadn't really been figured 15:41 out properly in 2024. So I'm just wondering if, like, 15:46 oh, it's 100 pages, but actually the model's probably just 15:49 looking at the beginning and at the end and then 15:51 ignoring everything in the middle anyway, and they're just, like, 15:53 tacking stuff on at the end and going, please work, 15:56 please work. And I don't know, but I mean, I 15:59 suspect these days it's going to work a lot better, 16:01 right? Because the models have been tuned to handle needle 16:03 in the haystack stuff a lot better. But, but 2024, 16:06 I think it's probably quite impressive.
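The "needle in a haystack" worry Chris raises (that a model reads the start and end of a long prompt and loses facts buried in the middle) is usually tested with a harness like the following. This is a hypothetical sketch, not from the episode: `ask_model` is a stand-in you would replace with a real LLM API call; here it simply echoes its context so the harness itself is runnable.

```python
# Hypothetical sketch of a "needle in a haystack" probe: bury one fact at
# varying relative depths in filler text and check whether the model's answer
# recovers it. `ask_model` is a placeholder for a real LLM call.

def make_haystack(needle: str, filler: str, n_fillers: int, depth: float) -> str:
    """Insert `needle` at relative position `depth` (0.0 = start, 1.0 = end)."""
    lines = [filler] * n_fillers
    lines.insert(int(depth * n_fillers), needle)
    return "\n".join(lines)

def ask_model(prompt: str, question: str) -> str:
    # Stand-in: a real probe would send prompt + question to an LLM and
    # return its answer. Echoing the context makes this sketch self-contained.
    return prompt

def recall_at_depth(depth: float) -> bool:
    needle = "The magic tax code is 42-B."
    haystack = make_haystack(needle, "Irrelevant boilerplate sentence.", 1000, depth)
    answer = ask_model(haystack, "What is the magic tax code?")
    return "42-B" in answer

# Probe several depths. With a real 2024-era model, mid-context depths
# (around 0.5) were typically where recall degraded; newer long-context
# models have been tuned to close that gap.
results = {d: recall_at_depth(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Sweeping both depth and total context length, then plotting recall as a heatmap, is the standard way these long-context evaluations are reported.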
Yeah, I mean, I 16:09 mean, these models can handle up to, like, what, like 16:11 128,000 tokens, right? I mean, that's big, right? And they're 16:15 getting bigger and bigger, you know. You know, it's, you 16:18 can take an entire book, get it summarized, you know. 16:21 You know, that's why I jokingly said at the beginning, 16:22 you know, I don't need these Cliff Notes anymore, because 16:24 I can get a model to summarize an entire book. 16:26 Yeah. And I think we will see that as basically 16:29 just, like, as the window gets bigger and bigger, you 16:31 can just, you know, completely zero brain cell, just put 16:34 the entire thing in and just see what happens. All 16:40 right, I'm going to move us on to our next 16:41 topic. So the next story we want to cover today 16:45 was sort of an interesting comment, really an offhand comment, 16:49 from OpenAI's CFO that got, like, a lot of play 16:53 online. And I think it's a pretty interesting one. I 16:55 kind of want to explain, talk through, especially for our 16:58 listeners, like, why it's happening. So basically, OpenAI's CFO confirms 17:04 this thing that they were thinking about. Not immediate, but 17:07 maybe something that OpenAI might do down the line. And 17:10 what they might do is basically get into the infrastructure 17:13 game. So rather than going to a Google Cloud Platform 17:16 or an AWS, you would simply get compute from OpenAI. 17:22 And it's sort of an interesting thing, because that's very 17:25 different from what OpenAI's business model has been to date, 17:29 which is basically selling access to its models. This would 17:32 be it selling access to its underlying infrastructure, that building 17:35 up. And this is partially inspired actually by Amazon, right, 17:39 where the model that gave rise to AWS was: hey, 17:43 we run all of this massive infrastructure for our e 17:45 commerce business, maybe we just rent that underlying infrastructure itself. 17:50 So I guess, Lauren, maybe I'll turn it to you.
17:52 Why would OpenAI want to do something like this? It 17:55 kind of feels like in some ways, like, these computing [...] 18:02 off the massive pre training runs that get you a 18:05 GPT. But it kind of sounds like here they're now 18:08 saying, well, you know, maybe not immediately, but we wouldn't 18:11 mind renting that to some people. It seems like kind 18:13 of a change of direction, don't you think? So I 18:16 could see this actually being, like, a foreshadowing to there 18:19 being a market around secondhand GPUs or last season's GPUs. 18:26 So we can take for granted that OpenAI has to 18:28 use the latest GPUs to be competitive, like, performance efficiency 18:32 for research, for commercial offerings. And the release pace of 18:38 that has been about every two years. So four years 18:42 ago we had A100s, two years ago we had 18:45 H100s. This year we have Blackwell. So every two 18:48 years they have to refresh their whole fleet. Yet the 18:52 actual lifespan of these GPUs is, like, five years. Could 18:57 be sitting around for years. Exactly, yeah. And they won't be 19:01 good enough, you know, at year three for OpenAI to 19:04 use in their research, but could be perfectly good enough 19:07 for a customer who, you know, is running large scale 19:11 inference workloads. So I could see it as a way 19:14 to recoup that investment. Especially with the CEO saying that 19:18 they could be making trillions of dollars in investment in 19:22 more infrastructure that, you know, after two years they have 19:26 to find a way to not use themselves, but figure 19:30 out if there's customers who want them. Yeah, that's right. 19:33 And Chris, I guess this is, like, it's kind of 19:35 remarkable because, yeah, Lauren, I read the article in very 19:37 much the same way, where I was like, oh, yeah, 19:40 every time they build one of these big data centers, 19:41 it's, like, the biggest data center that has ever data 19:44 centered.
And then kind of what they're saying here is, 19:46 like, but yeah, in 24 months it's going to be 19:48 kind of obsolete for us and, like, we need to 19:50 sell it to other people. Like, Chris, I guess the 19:53 pace of computing progress here is, like, kind of insane, 19:56 right? Like, that basically the cutting edge becomes not fit 19:59 for purpose for these frontier model companies within the span 20:02 of a year. Like, there's, there's, like, you know, it's 20:04 like the time period here is, like, very, very small. 20:06 Yeah, I think that's true. And I think it really 20:08 just comes down to economics, as you sort of say 20:12 there, right? Which is, if it's cheaper to run the 20:15 latest GPU as opposed to an older version and you're 20:19 going to be able to get your training runs done, 20:21 then there is going to be a value. So you 20:23 need to stay ahead in that sense. So I guess 20:26 it makes sense to be able to rent that stuff 20:29 out. But I mean, I don't know, I mean, I don't 20:32 want to rent out Sam Altman's grody, old, unused GPUs. 20:36 No, I want the shiny things, you know what I 20:39 mean? But I think it makes a lot of sense. 20:44 As you say, AWS does that, and they need the 20:48 GPUs. Probably even today they need it the most when 20:51 they're doing the big training runs, but then inference is 20:54 just taking over. And then as we start to 20:56 look at what's happening in inference, right, the chips are 20:59 kind of getting much, much smaller. They're specialized inference chips 21:03 now, so you're not even using the Blackwells or H100s, 21:06 arguably, for inference. There's a lot of providers, if you 21:09 think of things like Groq, for example, they're using specialized 21:12 chips in that sense. So you're not even passing it 21:16 off over there. So yeah, I mean, what are you 21:19 going to do with that?
And so even if you've 21:21 got the latest Blackwell, or whatever the next version after 21:23 that is going to be, then when you're not doing 21:26 the big training runs, then you've got spare capacity that 21:30 you want to be able to sell off. And now 21:33 that becomes great for us, because if you want to 21:36 know when they're training the next model, just have 21:38 a look at the spot market, and if there's none 21:40 available, you know what's happening. Yeah, that's a good tip 21:44 for the future. Aaron, I guess the question for you is, 21:46 like, can OpenAI win in this space? Lots of tech companies are 21:50 offering these kinds of services; they're going to be going 21:52 up against some pretty big players. And I guess the 21:55 kind of question is, like, do you have confidence that 21:58 OpenAI can just kind of flip a business like this 21:59 on? Well, I guess my first thought when I was 22:04 looking into this is, today it seems as though OpenAI 22:06 is very deeply dependent on Azure, right? And so for 22:11 compute and even distribution of their models. But it seems 22:14 as though in the long run, OpenAI, they seem to 22:16 be exploring building their own infrastructure, which would then almost 22:19 like rebalance, you know, this, this from a dependency that 22:24 they have to a collaboration, maybe. A collaboration, right. And, 22:28 and so that, that I think strategically is what potentially 22:32 could be happening here, right? Yeah. And, and is it 22:36 a good thing? And, and can they do this? Well, 22:39 I think so. I would just be careful, because I 22:42 know that they've also, you know, released, you know, or 22:46 said that they want to have, you know, this consulting, 22:48 you know, area, you know, where they're going to charge, 22:50 what is it, like $10 million per client, right, to 22:52 help them, you know, use their models.
So I mean, that's 22:55 a big area, focus area, that they have to work [...] 23:03 that they have, you know, and they don't want to 23:05 fragment themselves too much to get away from their bread 23:08 and butter, you know. But if they can be successful 23:11 and put it all together, then I think OpenAI can 23:13 pull it off, and it'd be a nice reorganization of 23:15 the current business landscape. Yeah, I think it's a 23:18 good point. It's just, like, how much, like, you know, 23:20 every few months it feels like OpenAI is launching a 23:23 new product line; maybe that's actually creating a bunch of 23:25 spread. Lauren, was that you wanting to jump in? I 23:27 was going to say, I think in terms of whether 23:29 they can win, one question is, could they win against 23:31 other companies doing this? The other question is, could they 23:35 win against open source? So vLLM is very popular. 23:40 TensorRT-LLM is also quite popular. And those are really the 23:43 core technologies you would need to set up your own 23:47 deployment rather than just use a hosted API. And I 23:52 think there'd have to be a really strong case around [...] 24:03 inference optimization; other innovations are very quickly showing up in 24:09 those engines, because it has a whole community's worth of 24:12 contribution happening in, like, almost real time. Yeah, that's really 24:16 interesting. We've talked about, obviously, the pressure that OpenAI has 24:19 had on, like, the, the model side from open source. 24:22 You're almost saying that this actually goes a level deeper, 24:23 right? It's, like, can it produce an inference stack and 24:27 infrastructure business that's competitive with what's happening in open source? 24:31 I hadn't really even thought about that. It's really interesting. 24:33 Yeah. Yeah. I kind of wonder if this is a 24:34 hedge, you know, because they just released their open weight 24:36 models, right?
And so, because they're sort of doing some of that work, if they can build out this specialized infrastructure that's better than anyone else's, then perhaps this is where they think the market is going, so that they can still remain financially solvent. Right, yeah. Even as the model price comes down, you're trying to capture it on the infrastructure side. It's really interesting. Yeah. What I didn't see was their financials, about how they were going to fund this trillion-dollar investment to build their own data centers. So I'd be interested to see some of that whenever it comes out. I'm going to move us on to our next topic today. It was very funny: we had prepared this segment all to focus on the ins and outs of a very detailed economic study that came out about jobs and AI, and we will cover it on a future episode. But as happens so frequently in the world of AI, Nano Banana launched, and that has obviously taken up a lot more airtime in the AI world, and I think it's worth going into. So instead of talking about AI economics and the labor market, we're going to talk about Nano Banana. Chris, I think you were one of the strongest advocates for switching out topics so we could talk about Nano Banana. The question for you is: how big of a deal is this? In some ways it seems like it's kind of just a toy, right? You put an image in, swap a person out, and all that kind of stuff. Talk to us a little bit about what's going on beneath the hood, and whether or not this is significant from a research and technological-capability standpoint. Okay. So I think the first thing to say is that this is way more than a toy. This is by far the best image generation model that I've seen today.
And even if we look at the benchmarks, and I'm not a big fan of benchmarks, as you know, even on those benchmarks it is about 200 Elo points ahead of everything else. So it is absolutely just killing it. And what is super cool about it, to your point, Tim, is that the quality from the model is great, and the text capabilities of the model are great. Typically, if you look at an image model, it'll mess up the text and that side of things, and it doesn't look great. Here the quality is just amazing. And then, to your point, the ability to hold an image and then put that image in different spaces, maintain the physics, et cetera, is absolutely brilliant. So you can face-swap, you can add a smile, you can make a change, you can put somebody in a different location, and it all works great. In fact, can I share my screen, Tim? I believe, yeah, the permissions are open if you want to share your screen. So for all of the wonderful viewers here: whilst I was supposed to be paying attention to the podcast... This is what Chris is usually doing when he's listening to everybody else talk. Yeah, exactly. This is what I built instead. Let's see. We'll see here; let's get rid of this. So I said, put Tim in a banana suit, and there's Tim, screenshotted from today's podcast. And then there's Tim, right? And he didn't look very happy. Oh, sad Tim in a banana suit. Make Tim happy in a banana suit. And there he's got a nice happy face. And I said he's going to be happy because he's in my Banana Beach. Miami Beach, Banana Beach, very nice place. And then I said, oh no, he needs a friend with him, in an apple suit.
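For a sense of what a 200-point lead means, the standard Elo expected-score formula (the usual basis for image-arena leaderboards) converts a rating gap into a head-to-head win rate. A minimal sketch, with illustrative ratings rather than any real leaderboard numbers:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for A against B
    under the standard Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# A model rated 200 Elo points above another is expected to win
# roughly 76% of pairwise comparisons, regardless of absolute ratings.
gap_200 = elo_expected_score(1400.0, 1200.0)
print(f"{gap_200:.2%}")
```

In other words, a 200-point gap means the leader is preferred in about three of every four side-by-side comparisons.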
Sorry, Lauren, I didn't get your permission. There we go. And Tim and Lauren are in Miami Beach, happy. So, I mean, we're joking around. I'll stop sharing my screen now. We're joking around, but the reality is that that is fantastic. Try getting any of the other models to do that at that quality. And it does all the style transfers, as you'd imagine. Now, if you start to think of the impact of that: everything from creation of YouTube thumbnails to image editing, filtering, all the sorts of things that you would typically have done with Photoshop-type tools. Then think of things like Canva, for example. I use Canva a lot. What's going to happen there? Because you're going to be able to just use this out of the box from Google AI Studio. I honestly think it is phenomenal, and I really think there are going to be a lot of people who've invested in image models starting to panic very quickly. And I do want to pick up on that point. You know, Aaron, I think one of the narratives that has played out in a very interesting way over the last year or so: had you asked me in January 2025 who's leading in this AI space, I would have said, ah, you know, OpenAI, Anthropic, and then Google kind of at the very end of the list, like, oh man, they just do not have their act together. But announcement by announcement they really seem to be catching up in a pretty significant way. And so, Aaron, I just wanted you to reflect a little bit: do you think this is, in some ways, Google really fighting for first now, particularly on the image side?
Well, I never thought I'd see Lauren in an apple suit, that's for sure, right? And I think that's pretty impressive. So as far as what this jump forward is: I really like the multi-turn editing capability, where it can remember and build upon prior instructions that you already gave it. That's an indicator of some kind of extended attention and memory capability within the model, which propels it up the rankings of the best image generation models. And I think some of the other pieces are that it takes up to a million tokens of context, because you have to be able to put in a text prompt and then also add in an image. So all those things, and then also seeing Tim in a banana suit, I think that definitely propels it up to the number-one image generation system out there. Lauren, I guess we can use the opportunity of having you on the show, because I think you've already brought it up a couple of times: the ever-present influence of open source on this space. Certainly for language models and text, it feels like open source is kind of in the running for state of the art. How do you feel about open-source image generation, and other forms of media generation? Is it similar, with open source catching up very quickly, or is this a place where the space is still lagging? Yeah, I think on the models front it's maybe not as important as on the inference-engine and even the user-interface front, because you need all of those pieces to come together, and the inference engines are typically more skewed towards text use cases.
So I think even if the models are up to par, it's not the same as being able to go to that user interface that Chris just showed, which was probably free, or at least a free tier. There's not really the equivalent in open source. There's always going to be that element of DIY-ness, where you first have to find the model, and then the model might not be as generalizable. If you look at Hugging Face, I think it's millions of models at this point, so you could find models that are good at specific tasks, but I'm not so sure they're as generalizable as what we just saw. Chris, the final question on this is the inevitable question: we've been freaking out for years at this point about how AI-generated images are going to destroy our ability to know what's real and what's not. Have we finally crossed that threshold with Nano Banana? This is, like, pretty good. Well, it's all real anyway, Tim. We're all living in a simulation, so it's fine now. I actually think the progression here is really good. I actually think the fact that we've been seeing terrible image models for a while has been a very good thing, and we've all got pretty good at spotting them, like the hands being slightly off, you know? So I think over the last few years we've kind of got used to it, and we now know not to trust images, do you know what I mean? If you think of all of the Flux stuff from Black Forest Labs last year, that was a perfect example. We saw our politicians holding hands, doing whatever. We've got used to it. We know not to trust these models and their outputs.
I think the bigger thing in this case is making sure that we hold people accountable for the models that they create, and making sure that the safety elements of those models are high. Because there's a good side: people like me who couldn't create a thumbnail are now going to be able to create decent thumbnails for my YouTube channel (plug). But for others, it means they're going to lose business. And then there are a whole lot of scary scenarios there. So I think there is still a lot of the ethical side that needs to be worked out, but the quality is great and it's just going to get better. And actually, one of the things I would say is: we're seeing this right now for images, and we can guarantee that if we project forward 12 to 18 months, you're going to see the same level of quality in video, and the same level of quality in audio as well. This is just going to extend out across modalities. And I'll add, too, something about this being such an editing-focused model. Editing has kind of become a bad word, because editing means manipulation, means malicious intent. But there are really important use cases for editing. With the geospatial models that we built with NASA, one of the biggest struggles is cloud cover: most satellite imagery has cloud cover, so you can't do anything with it. If you could use an editing model to synthetically generate data to improve your dataset, and then train a foundation model on it, that's an editing use case. And it's not about manipulation or changing the meaning of something from a human perspective; it's more from a machine learning perspective. That's right, sure. And just being able to see Tim smile as well is really important.
As you know, I never smile on these shows, so... All right, last topic of the day. Aaron, it's always a joke that when we bring you onto the show, we're going to talk about sports, and I'm not going to let us break the tradition here on episode 70. So you've been covering the US Open, and I think the team's been doing some interesting experiments. We've been doing a lot of screen sharing on this episode, and I believe you want to share some of the stuff you've been doing as well. Yeah. So first, if I could just give a prelude as far as what we're doing: we've been with the US Open for over 30 years. About a million fans show up and attend the Flushing Meadows site, and then every single day about another 14 million fans tune in through our digital properties. One of the hallmarks of our US Open work is that we want to combine the fan experience with technology, so that we can bring people in and expand the swath of what we're doing. And we've introduced three new features this year. One of them is Match Chat. We spent several months building this very impressive system, and we'll have a few papers out that describe the science behind it. What it is, is a real-time, agent-driven assistant, so that you can go in and ask a question about a match or about players, in real time and at large scale, and get a response back. The second piece is called Key Points. We always say "too long; didn't read", TL;DR, right? You see that a lot. There are these very long articles that people just don't have the time to read, so we summarize them and show those bullet points on top of the articles, with a workflow where we work with USTA editors.
And then the third one is called Live Likelihood to Win. This has a very long historical background, but we combine predictive modeling: we have an ensemble of different predictive models that feed into who's going to win the match. We have a pre-match prediction, and then, as the match goes on, we have some proprietary equations that we developed that fine-tune and change the odds that somebody's going to win, given the momentum. Ultimately what we want to do is increase the breadth and depth of fans and give them the information they need to understand the story of a match. And what I was hoping to do was continue this trend of experimentation with screen sharing and just show you some of the work we've done. It's live right now, and play starts pretty soon: it's 10:47 right now, so it starts at around 11 o'clock, and we can go ahead and see some of the action. So just to orient you, this is our work that we put up; it's a twin, right, of this. And I want to quickly show you: when a user comes in, what's one of the first things they want to know? They want to know the scores of a match. And so I would like to highlight two matches. One of them was a big upset: her name is Eala, she's a 20-year-old from the Philippines, and she beat Tauson. So that's one. And then there's the Alcaraz match that I also want to show you. Let's check out the Alcaraz match here. This one is already finished, but imagine play is going on and there's a match underway, which you can check later in the day. Let's check out the match recap.
So we have the IBM SlamTracker, and it pops up, and you can quickly see on the sidecar that the first tile is the score here. And then when you go down, we have this 360-degree storytelling of the match. If you want to know beforehand, before the match has started, what's the likelihood that Alcaraz is going to win? Well, it's pretty high in this case. This is a very early round, round two I believe, and Alcaraz is off to a strong start. We've assigned Alcaraz an 82% chance of winning, and again, this uses pure predictive modeling that we've experimented with over years and years. Now, because the match is over, we can go to the summary tab, and you can see the live likelihood to win and how it changed over time. There weren't very many fluctuations in this one, because Alcaraz had a very big advantage coming in. But now, if I want to know some details: this is Match Chat. Let's just click it, and it opens up, and we have a frictionless user experience that we've designed to guide the user and help them get the information that matters the most. We did a lot of user studies, and a lot of data analytics, to figure out what people care about. So let's check it out. I find match stats very interesting, so let's just ask a question: how many aces did... and let's put in a player who isn't even in this match, that's Sinner. So let's do this first. It's thinking, it's going through and hitting the pieces. And what it first says is: wait a minute, do you want to know set by set, or do you want to know about the match? And let's hit...
No, right, because I want to know about the match here. And so now it's thinking again, analyzing, and this is going out in real time, right now, hitting our middleware and going out into AWS. What it does is it then comes back and tells us how many aces Alcaraz had. And it worked well, because it was able to switch from Sinner right to the right players that are actually in this match. So it automatically does a lot of detection: we have a lot of chat pipelines, and it does pronoun corrections, player corrections, and so on. You can play with this more as you go through and see all that we've built. But it's very interesting, and there are a lot of deep statistics that come in. If we were to keep going, you could see lots of stats that people really want to know about, but in the interest of time, let's just go back and close this. And, you know, why don't you pick a match here on the screen, rather than me picking one? Tim: let's do the Harris versus Fritz there on the bottom right. Okay, this one here. All right. So here, the pre-match likelihood to win, let's check that out. I mean, Fritz was overwhelmingly the favorite, right. And because of that, when you look at the live likelihood to win, if we trace it against the actual match, you can see that Fritz lost the first set, so his odds of winning go down, but not by that much, because he's still favored so heavily. And then the storytelling keeps going: it's a very close one, it gives him the break points in the second set, and because Fritz wins those, I think he's regaining the momentum. And then the match continues, right, into the final set, where he eventually takes it over.
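IBM's actual live-likelihood equations are proprietary, but the behavior Aaron describes, a heavy favorite's odds dipping only slightly after dropping a set, falls out of any model that nudges a pre-match prior in log-odds space. A toy sketch, where the momentum weight `k` is invented for illustration and is not IBM's model:

```python
import math

def live_win_probability(prior: float, sets_won: int, sets_lost: int,
                         k: float = 0.6) -> float:
    """Toy live-likelihood update: shift the pre-match log-odds by a
    momentum term proportional to the set differential. The weight k
    is an invented placeholder, not IBM's proprietary equation."""
    log_odds = math.log(prior / (1.0 - prior)) + k * (sets_won - sets_lost)
    return 1.0 / (1.0 + math.exp(-log_odds))

# An 82% pre-match favorite who drops the first set stays clearly favored.
p = live_win_probability(0.82, sets_won=0, sets_lost=1)
print(f"{p:.2%}")
```

Because the shift happens in log-odds space, the same one-set deficit moves a 50/50 match much more than it moves a lopsided one, which is exactly the shape of the Fritz curve described above.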
So this Live Likelihood to Win is really powerful during the match itself, because you can track and trace how it works. That, in essence, is what I really wanted to show: some of the exciting work that's live right now. And then a plugin for ESPN Fantasy Football: we went live with a few other pieces yesterday, and next week, on Wednesday, we're going to have another piece go live. If you're part of a fantasy football team, go and check out our player insights, factors, grades, and so on. That's great, Aaron. Awesome. Well, we'll keep you posted, and for all you listeners, we'll keep you posted too. And I guess, Aaron, as this continues to develop, we'll have you back. I think it's fun having you on the show regularly, because it feels like we get to see the iteration every time you come back on, and it's cool seeing that happen. Cool. Yeah. Awesome. Well, that's all the time we have for today, so thanks for joining us. Aaron, Lauren, Chris, it was a pleasure, as always, to have you on the show. And thanks to all you listeners. If you enjoyed what you heard, you can find us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we will see you next week on Mixture of Experts.