# Mixture of Experts: AI News & Breakthroughs

**Source:** [https://www.youtube.com/watch?v=zw0Ymg_DoEs](https://www.youtube.com/watch?v=zw0Ymg_DoEs)
**Duration:** 00:43:53

## Summary

- The host touts a new image‑generation model as far ahead of competitors, beating benchmark Elo scores by roughly 200 points and marking it as the most impressive system they’ve seen.
- This week’s “Mixture of Experts” episode brings back IBM fellow Aaron Baughman and engineer Chris Hay, and introduces newcomer Lauren McHugh, while previewing topics such as OpenAI’s potential infrastructure sales, a “Nano Banana” reference, the US Open, and KPMG’s 100‑page AI prompts.
- In the news roundup, NVIDIA posted a 56% year‑over‑year jump in sales, largely driven by its data‑center business, yet missed analyst revenue expectations, prompting a mixed market reaction.
- OpenAI and Anthropic announced a joint effort to probe model security—especially hallucinations—while 911 centers are trialling AI to handle low‑priority calls, and IBM with NASA released the open‑source “Surya” foundation model for forecasting solar storms.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=0s) **Mixture of Experts AI Panel** - Host Tim Hwang opens the weekly Mixture of Experts podcast, lauds a breakthrough image‑generation model, and introduces veteran guests alongside newcomer Lauren McHugh to discuss a range of AI topics—from OpenAI’s potential infrastructure sales and KPMG’s 100‑page prompts to Nvidia’s market‑cap dominance and other recent headlines.
- [00:03:05](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=185s) **KPMG's Taxbot Uses 100-Page Prompt** - The hosts highlight KPMG's Taxbot, an AI tax advisory tool that relies on an unusually massive 100‑page prompt, sparking talk about the scale of prompts in modern AI applications.
- [00:06:34](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=394s) **Fine‑Tuning vs Long Prompts in Tax Bots** - The speakers debate whether the need for extensive prompts in a tax‑domain agent arises from model design and fine‑tuning considerations or from custom, unscalable solutions, questioning how much of the source material must be rewritten for new use cases.
- [00:09:56](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=596s) **Challenging the “Agent” Terminology** - The speaker argues that labeling a large prompt‑based system as an “agent” is misleading, insisting that a true agent should dynamically retrieve and assemble information—especially for real‑time data—rather than relying on an unwieldy hundred‑page prompt.
- [00:13:12](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=792s) **Feedback Loops and Prompt Engineering** - The speakers debate how the satisfying feedback loop of advanced prompting creates alignment challenges, citing accountant use‑cases, debiasing, sentient‑personality risks, and corporate prompt‑engineering teams like KPMG’s AIML group.
- [00:16:21](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=981s) **OpenAI May Launch Its Own Cloud** - The hosts note AI’s ability to replace traditional summaries, then explain the OpenAI CFO’s offhand remark that the company could soon sell its own compute infrastructure rather than relying on services like Google Cloud or AWS.
- [00:19:33](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1173s) **Accelerating GPU Obsolescence and Leasing** - The speakers discuss how rapidly cutting‑edge GPUs become outdated, prompting firms to rent newer hardware for training, sell or lease older units, and follow AWS’s model as the industry shifts toward inference workloads.
- [00:24:22](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1462s) **Betting on Proprietary AI Infrastructure** - The speakers speculate that building a competitive inference stack and dedicated data‑center infrastructure serves as a hedge against open‑weight models, raising questions about the massive financing required and its market impact.
- [00:27:43](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1663s) **AI Image Demo Sparks Industry Shift** - A playful AI‑generated visual demo transitions into a broader commentary on how advanced style‑transfer models will upend traditional image‑editing tools such as Photoshop and Canva.
- [00:31:56](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=1916s) **Navigating Trust in AI Images** - The speakers discuss how widespread AI image models have become, noting that increased exposure has sharpened public skepticism, and they stress the importance of holding creators accountable and ensuring safety to mitigate misinformation.
- [00:35:18](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=2118s) **US Open Digital Fan Innovations** - The host and guest review three new technology-driven features—including a real‑time match chat agent and a “Key Points” TL;DR summary—designed to enhance the experience for millions of on‑site and online US Open fans.
- [00:38:31](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=2311s) **Alcaraz Match Analytics Demo** - The speaker demonstrates the IBM Slam Tracker’s capabilities—showing scores, 360° storytelling, predictive win probability, and live likelihood visualizations—using a recent Alcaraz tennis match.
- [00:41:51](https://www.youtube.com/watch?v=zw0Ymg_DoEs&t=2511s) **Live Win Probability Demonstration** - The speaker walks through a real-time tennis match illustration, showing how the live likelihood-to-win metric tracks shifting odds and momentum, while also highlighting upcoming fantasy football player insight tools.

## Full Transcript
I think this is way more than a toy. This
is by far the best image generation model that I've
seen today. And even if we look at the benchmarks,
you know, when we look at and I'm not a
big fan of benchmarks as you know, but even when
you look at those benchmarks, it is 200 sort of
Elo points ahead of everything else. All that and more
on today's Mixture of Experts. I'm Tim Hwang and welcome
to Mixture of Experts. Each week MOE brings together a
panel of people pushing the frontiers of technology to discuss,
debate and analyze their way through the wildly fast paced
world of artificial intelligence. Today I'm joined by a great
crew of veterans and also someone joining for the very
first time. I've got Aaron Baughman, IBM Fellow and Master Inventor.
Aaron, welcome back to the show. Chris Hay, distinguished engineer
and longtime MOE veteran. And joining us for the very
first time is Lauren McHugh, Program Director, AI Open Innovation.
Lauren, welcome to the show. Thank you. So we've got
a packed episode today. We're going to talk about OpenAI
hinting that they might sell infrastructure, Nano Banana, the US
Open. We're going to even talk about 100 page prompts
coming out of KPMG. But first, as always, we've got
our new segment from Aili. So Aili, over to you.
Hey everyone, I'm Aili McConnon, I'm a tech news writer
with IBM Think. I'm here now with a few AI
headlines you may have missed this busy week. First up,
Nvidia, the world's most valuable company by market cap, reported
a whopping 56% increase in sales over the same period
from last year. And this was largely driven by its
data center business. This would seem like good news for
the chip maker, right? In fact, market reaction was mixed
because the revenues did not meet analysts expectations. Next up,
OpenAI and Anthropic, two of the biggest rivals in artificial
intelligence have actually teamed up to better understand the security
issues facing models. They recently evaluated each other's models in
order to better understand hallucinations and other issues, basically hoping
to catch what their own tests had missed. Meanwhile, in
the category of hopeful AI, many 911 centers are so
understaffed that they're turning to AI to help them out.
This may seem problematic at first blush, but actually these
AI agents are helping with parking violations, noise complaints, basically
non urgent issues so that the human staffers can
deal with the real emergencies. Last but not least,
IBM and NASA are helping give scientists more time to
prepare before big storms hit. They recently released a new
open source foundation model called Surya that can predict solar
storms. For more, check out our Think newsletter. The link is in the show notes.
So normally here at MOE, we cover some of the
biggest stories happening in AI technology. You know, the drops
of all the largest models coming out of the frontier
model companies, the biggest features and products that people are
launching. But I actually want to start today with kind
of a funny smaller story. There was an article that
was written about KPMG, the kind of global accounting firm
which kind of launched their own, as many companies and
enterprises are doing now, their own AI agent, which they
call Taxbot. And what Taxbot is attempting to do
is gather together all of the tax advice expertise across
a big firm like KPMG and essentially sift through documents
and generate sort of 25 page kind of advisory opinions
for their customers that basically are like kind of the
first draft of what they would typically provide for a client. And
this was the really funny thing. They took a lot
of flak, I don't know if flak's the right word,
but they got a lot of attention online because they
sort of revealed that in order to power Taxbot, they
had a hundred page prompt running behind this, which I
think just as someone who, you know, kind of comes
from a world where prompting is like a few sentences,
this is like really remarkable. And so maybe Aaron, I'll
start with you, is like what's the longest prompt you've
ever written? And is it kind of surprising to see
like 100 page prompts, like novel length, novella like prompts
coming out? Yeah. Well, first I gotta say, you know,
growing up and this, this might, you know, give you
my age a little bit, but I used to use
these yellow books called Cliff Notes, you know, where, where
I could go pick it up from like Barnes and
Noble, right. Or, or even buy it from Amazon and,
and get Cliff Notes about a book. Well, we certainly
don't need those anymore, right, because we can use these
large prompts, right. To summarize. But the largest prompt that
I've ever written, I would say semi written, right. Because
I just copy paste a manual right into the context.
So it was probably about 40 pages, right. That I
inputted into a model and, and it came out with
key points that were summarized. So it was very effective
and really interesting how it worked. And it surprised me.
And I think it's kind of the interesting thing. And
I guess maybe Lauren, do you want to jump in
on this? I know a narrative that was very prominent
maybe a year and a half ago, two years ago,
a long time ago in AI time was like, prompt
engineering is going to be dead. Over the long run,
we're not going to really need prompt engineering. You're going
to just tell the computer what you want and it
will do it. This kind of story almost points us
in a different direction, right? It's like almost a world
in which in order to get agent behavior to be
really, really good, there's going to have to be like
a lot of specification. And in some sense prompt engineering
is becoming a bigger part of what gets these things
to work. Is that the right way of thinking about
it? Did it turn out that prompt engineering was not
actually dead? I think a good way to appreciate how
complex a prompt would be would be to look at
some of the open source projects that, that are essentially
agents, so GPT researcher, meta GPT. You can see how
long and complex those prompts are. And that's been, you
know, a whole community's worth of contribution of ideas of
how to make the agent work better. I do think
that, you know, if a product requires 100 page user
manual to work that, you know, at best it's poorly
designed, at worst it's broken. And in this case the
product is the model, the user manual is the prompt.
So one thing that you could do is actually fine
tune it. I think fine tuning probably is making a
comeback, especially with some of the models like Gemma 3
270M, where the actual architecture is made to
be tuned, with more parameters allocated to the embedding versus parameters
allocated to the transformer blocks to do the processing. Part
of it I think is like it actually goes to
Aaron's original example, which is, I think what's prompting exactly.
I think prompting sometimes can just mean the input to
the model, in which case it's maybe no surprise that
you put in a whole manual to try to get
it to summarize. Here, I'm curious if there's a reason
why these prompts need to get super long in the
tax domain. Is there something about agents that require us
to have longer prompts or do you think this is
just kind of a weird artifact of how they designed
this tax bot? Basically, I think, like,
my main question would be out of those 100 pages,
how many of those pages need to be rewritten for
a new use case that, you know, only KPMG
knows. And that would get to the heart of how
much of this is, you know, because it's an agent
and any agent would need to do that. And then
how much of this is truly a custom solution which
is then a lot harder to scale. Yeah, there's a
really interesting dynamic there. And I guess maybe, Chris, I'll
bring you in on this. I think, Lauren, what I
hear you sort of saying is that in some ways
you have these really long prompts to make up for
all the knowledge that the model doesn't know. And so
I guess, Chris, maybe there's one point of view is
that as these models get deployed in more and more
specialized domains, it's not going to be atypical to see
really, really long prompts emerge. Right. Because in effect, there's
all this domain knowledge that like a general model might
not have. I suppose there's the original idea that the
base model would just get smart enough that you wouldn't
have to do that. But I mean, if this is
a good example, we may not be headed in that
direction. I'm not surprised though. I think 100 pages, I
mean, as long as 99 of those pages aren't "do
not hallucinate, do not hallucinate" on repeat, then it's fine. It's actually
like The Shining. It's just the same sentence over and
over and over again. Exactly. But to Lauren's point, right,
if the model doesn't have the knowledge in the first
place and you've got a lot of specialist domain, then
you're going to have to put that in the context.
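The trade-off being weighed in this stretch of the conversation, stuffing all the domain knowledge into the context versus retrieving chunks on demand, can be sketched with a toy example. The documents and keyword scoring below are hypothetical stand-ins; real retrieval-augmented setups use embedding similarity, not word overlap:

```python
# Toy contrast: "stuff everything in the context" vs. top-k retrieval.
# Documents and scoring here are hypothetical stand-ins for illustration.

docs = {
    "vat_rules": "VAT registration thresholds and filing deadlines for companies",
    "payroll": "Payroll withholding tables and employer reporting obligations",
    "depreciation": "Depreciation schedules for capital equipment purchases",
}

def stuff_everything(question):
    # Option 1: put every document in the prompt; nothing can be missed,
    # but the prompt grows with the corpus.
    return "\n\n".join(docs.values()) + f"\n\nQuestion: {question}"

def retrieve_top_k(question, k=1):
    # Option 2: retrieval, naively scored by words shared with the question,
    # so the right chunk comes back only if it happens to score highest.
    q_words = set(question.lower().split())
    scored = sorted(docs.values(),
                    key=lambda text: len(q_words & set(text.lower().split())),
                    reverse=True)
    return "\n\n".join(scored[:k]) + f"\n\nQuestion: {question}"

q = "What are the VAT filing deadlines?"
full = stuff_everything(q)    # always contains the relevant passage
partial = retrieve_top_k(q)   # contains it only if retrieval ranked it first
```

Which side wins in practice depends on the context-window size and on how reliably retrieval surfaces the right chunk, which is exactly the dice-roll Chris describes.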
And I'm not against it because we've all been using
retrieval augmented generation for a while. And with RAG you're really just pulling chunks
into your context anyway. So in some regards, how is
that any different, really, what you're saying is actually I
can fit everything that I need into the context window
and therefore the model's going to stand a better chance.
And you know, and I, I have to admit, I
would probably rather have it in the context window than,
you know, sort of rolling the dice at RAG and
hoping that it gets the right chunk coming back so
it goes either way. But yeah, you are making up
for lack of knowledge or there's certain patterns that you
want it to do. I mean, if you're generating
a 25 page document and that 25 page document's got
to look in an exact way to Aaron's earlier point,
you're building a specification and the model is not a
mind reader. It's got to produce it in the way
that you want. And a good prompt is going to
have examples. This is section one, this is section two.
I want you to do this. Do not talk about
it. Fine tuning is really hard. So, you know, again,
if you can stuff it in the context, that's fine.
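Chris's description of a good prompt as a specification, with named sections, format rules, and worked examples, can be sketched as a small template builder. The helper and all of its content are hypothetical, not KPMG's actual prompt:

```python
# Minimal sketch of a specification-style prompt: a role, explicit rules,
# numbered sections, and worked examples. All content is hypothetical.

def build_prompt(role, rules, sections, examples):
    """Assemble a structured, specification-style prompt string."""
    parts = [f"You are {role}.", "", "Rules:"]
    parts += [f"- {rule}" for rule in rules]
    parts.append("")
    for i, (title, instructions) in enumerate(sections, start=1):
        parts += [f"Section {i}: {title}", instructions, ""]
    parts.append("Examples of the expected output format:")
    parts += examples
    return "\n".join(parts)

prompt = build_prompt(
    role="an expert tax advisor",
    rules=["Cite the relevant regulation for every claim.",
           "Do not speculate beyond the provided material."],
    sections=[("Client summary", "Summarize the client's situation in one paragraph."),
              ("Advisory opinion", "Draft the opinion, one subsection per issue.")],
    examples=["Section 1: Client summary\nThe client operates a ..."],
)
```

Scaled up to every rule, template, and worked example a tax opinion needs, a prompt built this way could plausibly run to dozens of pages, which is the point being made here: it is a specification, because the model is not a mind reader.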
What I would probably say is I think this is,
and I probably challenge their use of agent in this
case. I suspect it's just a prompt, but I think
if it was truly, truly agentic, I would argue that
the agent would be able to go round and round
the loop a few times. And I don't think you
would need 100 page prompt in that sense. And you
could have the agent pull the elements that it needs
and then bring together that structure in a way. Now
in reality it probably ends up still about a hundred
pages, but I think it's rather than stuffing into the
context window, you're having the agent go search and then
bring everything together. So I challenge the word agent in
this case, you know, but yeah, yeah. Yep, yeah, I
wanted to just jump in and make two points if
I could very quickly, you know, as to why you
would want to use a hundred page prompt. So to
me it seems as though these real time systems, right,
because the data is updated in real time, it's never
going to be within the knowledge base or the foundational
model that you have. So if you think the stock
market and you want to ask questions about what's happening
today or this minute, then you need to get that
information within a prompt. And you can get a lot
of information very quickly within a prompt and then perhaps
you even add in a Persona. Right. And then the
second use case that I was thinking about is why
you would want to use this kind of a prompt
even if the data were in the foundational model is
if you think of like a flashlight, when you put
content within a prompt, you're telling the system focus in
on this type of data rather than hoping and rolling
the dice like Chris mentioned, that you're going to get
that information back as a result. But I would pair
that with like a LoRA technique where you can determine
what the attention mechanism needs to focus in on. Right?
So if you pair the LoRA technique with a large
prompt, then I think in turn you'll be able to
really take that flashlight when you're in the dark and
light up exactly what you're looking for. And Aaron, how
do you plan to explain that to an accountant to
do as opposed to copy and pasting into a prompt?
Yeah, so in accounting it's important I think because lots
of these different types of rules and regulations, it changes
really quickly. And so if they're trying like tax in
this case, I think it's important to. I don't exactly
know what was in the prompt. Right. In this 100
page prompt, but I would hope you know that it
was mostly about rules, regulations so that they could better
understand and advise maybe somebody around what's happening. But in
that tax, that's what I would think and sort of
help out a tax auditor. That's one of the reasons
I think prompting is kind of, it's unbeaten even though
it's considered a little bit of a cheap way of
doing things for people who are much more in the
machine learning world is for an accountant, like they're not
going to go through some fine tuning process. They would
much rather just type stuff in and see things happen.
And so it's really hard to beat the fact that
the feedback loop I'm prompting is just so satisfying in
a way that is really hard for other methods of
AI alignment to basically work. Chris, do you want to
jump in there? No, I agree with you 100% but
I would love to see some accountants sitting there going
why should I be using QLoRA here? How am I
going to debias my data set here or you're an
expert tax advisor. Please use Australian language in the
response. Do not elucidate. Here's. Here are the tax codes.
That's. Yeah, yeah. I think what's kind of scary too
is the whole idea of sentient, you know, is that
whenever you start adding in different personalities, you know, if,
if you're a media company and you want to, you know, that's
another thread, right, if we wanted to pull on
it. But these large context and prompts combined with a
lot of these other like you know, if you abstract
those internals, you know that Chris was mentioning away from
the user. So it's very simple, you know, it really
is very powerful. Yeah. Like the Comet browser for example,
you know, you can do a lot of that. I'm
very excited. Right. About what's, what's in store. I also
think it's important to keep in mind that the feedback
Loop is very quick on prompt engineering for end users,
but in this case I actually consider the prompt engineer
to be, I'm sure, an AIML team within KPMG that
wrote that as the prompt for then others to just
use in a more abstracted way. And so while a
user might want prompt engineering for that super satisfying quick
feedback on nudging a model to do the right thing,
you know, the actual team building this agent might have
a, you know, longer term view of if I
could actually make it simpler to do the prompt in
the first place, then I could use this not just
for tax, but they have other lines of businesses as
well. So I think there's a difference in the level
of patience and how much pain those two different groups
would take on. And I actually think the group that
probably actually wrote this 100 page prompt that then gets
abstracted to the user might be pretty interested in the
ways that they can make that prompt simpler and reusable
over the fact that more prompt engineering is just going
to get them quick results. The other thing that's in
my mind is like I remember in the article that
it was like from 2024 and I just worry that
like needle in the haystack stuff hadn't really been figured
out properly in 2024. So I'm just wondering if like,
oh, it's 100 pages but actually the model's probably just
looking at the beginning and at the end and then
ignoring everything in the middle anyway and they're just like
tacking stuff on at the end and going please work,
please work. And I don't know, but I mean, I
suspect these days it's going to work a lot better.
Right. Because the models have been tuned to handle needle
in the haystack stuff a lot better. But, but 2024,
I think it's probably quite impressive. Yeah, I mean, I
mean these models can handle up to like what's like
128,000 tokens, right. I mean that's big. Right. And they're
getting bigger and bigger, you know, you know, it's, you
can take an entire book, get it summarized, you know.
You know, that's why I jokingly said at the beginning,
you know, I don't need these Cliff Notes anymore because
I can get a model to summarize an entire book.
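As a rough back-of-envelope for why a 100-page prompt fits in a window that size, assuming roughly 500 words per page and 1.3 tokens per word (both assumptions, not measurements of any particular tokenizer):

```python
# Back-of-envelope: does a 100-page prompt fit a 128K-token context window?
# Both ratios below are rough assumptions, not tokenizer measurements.

PAGES = 100
WORDS_PER_PAGE = 500     # dense single-spaced text, assumed
TOKENS_PER_WORD = 1.3    # typical English-to-subword ratio, assumed

tokens = int(PAGES * WORDS_PER_PAGE * TOKENS_PER_WORD)
context_window = 128_000

print(f"~{tokens:,} tokens; fits in {context_window:,}? {tokens < context_window}")
# → ~65,000 tokens; fits in 128,000? True
```

So under these assumptions the 100-page prompt occupies only about half such a window, leaving room for the model's output.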
Yeah. And I think we will see that as basically
just like as the window gets bigger and bigger, you
can just, you know, completely zero brain cell, just put
the entire thing in and just see what happens. All
right, I'm going to move us on to our next
topic. So the next story we want to cover today
was sort of an interesting comment, really an offhand comment
from OpenAI's CFO that got like a lot of play
online. And I think it's a pretty interesting one. I
kind of want to explain, talk through especially for our
listeners, like why it's happening. So basically OpenAI's CFO confirmed
this thing that they were thinking about. Not immediate, but
maybe something that OpenAI might do down the line. And
what they might do is basically get into the infrastructure
game. So rather than going to a Google cloud platform
or an AWS, you would simply get compute from OpenAI.
And it's sort of an interesting thing because that's very
different from what OpenAI's business model has been to date,
which is basically selling access to its models. This would
be it selling access to its underlying infrastructure, that building
up. And this is partially inspired actually by Amazon, right,
where the model that gave rise to AWS was: hey,
we run all of this massive infrastructure for our e-commerce
business. Maybe we just rent out that underlying infrastructure itself.
So I guess, Lauren, maybe I'll turn it to you.
Why would OpenAI want to do something like this? It
kind of feels like in some ways like this compute comes
off the massive pre training runs that get you a
GPT. But it kind of sounds like here they're now
saying, well, you know, maybe not immediately, but we wouldn't
mind renting that to some people. It seems like kind
of a change of direction, don't you think? So I
could see this actually being like a foreshadowing to there
being a market around secondhand GPUs or last season's GPUs.
So we can take for granted that OpenAI has to
use the latest GPUs to be competitive, like performance efficiency
for research, for commercial offerings. And the release pace of
that has been about every two years. So four years
ago we had A100s, two years ago we had
H100s. This year we have Blackwell. So every two
years they have to refresh their whole fleet. Yet the
actual lifespan of these GPUs is like five years. Could
be sitting around for years. Exactly, yeah. And they won't be
good enough, you know, at year three for OpenAI to
use in their research, but could be perfectly good enough
for a customer who, you know, is running large scale
inference workloads. So I could see it as a way
to recoup that investment. Especially with the CEO saying that
they could be making trillions of dollars in investment in
more infrastructure that, you know, after two years they have
to find a way to not use themselves, but figure
out if there's customers who want them. Yeah, that's right.
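Lauren's refresh-cycle arithmetic can be sketched in a few lines. The release years are approximate, and the five-year lifespan is the figure cited in the conversation, not a hardware spec:

```python
# Sketch of the GPU refresh economics discussed above: a roughly 2-year
# release cadence vs. a roughly 5-year useful life leaves a window in which
# a GPU is obsolete for frontier training but still rentable for inference.
# Release years are approximate; the lifespan figure is from the episode.

RELEASE_CADENCE_YEARS = 2
USEFUL_LIFE_YEARS = 5

# Years a generation stays frontier-grade vs. years left for a secondhand market.
secondhand_years = USEFUL_LIFE_YEARS - RELEASE_CADENCE_YEARS

generations = {"A100": 2020, "H100": 2022, "Blackwell": 2024}
for name, released in generations.items():
    frontier_until = released + RELEASE_CADENCE_YEARS
    usable_until = released + USEFUL_LIFE_YEARS
    print(f"{name}: frontier until ~{frontier_until}, "
          f"rentable for inference until ~{usable_until}")
```

On these numbers each fleet spends about two years at the frontier and then roughly three more as rentable inference capacity, which is the recoup-the-investment window Lauren describes.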
And Chris, I guess this is like, it's kind of
remarkable because, yeah, Lauren, I read the article in very
much the same way where I was like, oh, yeah,
every time they build one of these big data centers,
it's like the biggest data center that has ever data
centered. And then kind of what they're saying here is
like, but yeah, in 24 months it's going to be
kind of obsolete for us and like we need to
sell it to other people like Chris. I guess the
pace of computing progress here is like kind of insane,
right? Like that basically the cutting edge becomes not fit
for purpose for these frontier model companies within the span
of a year. Like there's, there's like, you know, it's
like the time period here is like very, very small.
Yeah, I think that's true. And I think it really
just comes down to economics as you sort of say
that, right? Which is if it's cheaper to run the
latest GPU as opposed to an older version and you're
going to be able to get your training runs done,
then there is going to be a value. So you
need to stay ahead in that sense. So I guess
it makes sense to be able to rent that stuff
out. But I mean, I don't know, I mean, don't
want to rent out Sam Altman's grody, old unused GPUs.
No, I want the shiny things, you know what I
mean? But I think it makes a lot of sense,
as you say, AWS does that and they need the
GPUs. Probably even today they need it the most when
they're doing the big training runs. But then inference is
just taking over. And then as we start to
look at what's happening in inference, right, the chips are
kind of getting much, much smaller. They're specialized inference chips
now. So you're not even using the Blackwells or H100s,
arguably for inference, there's a lot of providers, if you
think of things like Groq, for example, they're using specialized
chips in that sense. So you're not even passing it
off over there. So yeah, I mean, what are you
going to do with that? And so even if you've
got the latest Blackwell or whatever the next version after
that is going to be, then when you're not doing
the big training runs, then you've got spare capacity that
you want to be able to sell off. And now
that becomes great for us because if you want to
know when to train in the next model, just have
a look at the spot market and if there's none
available, you know what's happening. Yeah, that's a good tip
for the future. Aaron, I guess question for you is
like, can OpenAI win in this space? Like, tech giants
are offering these kinds of services, they're going to be going
up against some pretty big players. And I guess the
kind of question is like, do you have confidence that
OpenAI can just kind of flip a business like this
on? Well, I guess my first thought when I was
looking into this is that today OpenAI seems very deeply dependent on Azure,
right, for compute and even distribution of their models. But in the long
run, they seem to be exploring building their own infrastructure, which would
almost rebalance this from a dependency into a collaboration, maybe. A
collaboration. Right. And so that, I think, is strategically what could
potentially be happening here, right? Yeah. And is it a good thing? And can
they do this? Well,
I think so. I would just be careful, because they've also said that they want
to have this consulting area where they're going to charge, what is it, $10
million per client to help them use their models. So that's a big focus area
that they have, and they don't want to fragment themselves too much and get
away from their bread and butter. But if they can be successful and put it
all together, then I think OpenAI can pull it off, and it'd be a nice
reorganization of the current business landscape. Yeah, I think it's a
good point. It's just, how much, you know, every few months it feels like
OpenAI is launching a new product line; maybe that's actually creating a
bunch of spread. Lauren, did you want to jump in? I
was going to say, I think in terms of whether
they can win, one question is could they win against
other companies doing this? The other question is could they
win against open source? vLLM is very popular. TensorRT-LLM is also quite
popular, and those are really the core technologies you would need to set up
your own deployment rather than just use a hosted API. And I think there'd
have to be a really strong case around inference optimization, because other
innovations show up very quickly in those engines; there's a whole
community's worth of contributions happening in almost real time. Yeah. That's really
interesting. We've talked about obviously the pressure that OpenAI has
had on like the, the model side from open source.
You're almost saying that this actually goes a level deeper.
Right. Is like can it produce an inference stack and
infrastructure business that's competitive with what's happening in open source?
I hadn't really even thought about that. It's really interesting.
Yeah, I kind of wonder if this is a hedge, you know, because they just
released their open-weight models, right. And because they're already doing
some of that work, if they can build out specialized infrastructure that's
better than anyone else's, then perhaps this is where they think the market
is going, so that they can still remain financially solvent. Right. Yeah.
Even as the model price comes down, you're trying to capture it
on the infrastructure side. It's really interesting. Yeah. What I
didn't see was their financials about how they were going
to fund this trillion dollar investment to build their own
data center. So I'd be interested to see some of
that whenever it comes out. I'm going to move us
on to our next topic today. It was very funny.
We had prepared this segment all to focus on the
ins and outs of a very detailed economic study that
came out about jobs and AI and we will cover
it on a future episode. But as happens so frequently in the world of AI, Nano
Banana launched, and
that obviously has taken up a lot more airtime in
AI world and I think it's worth going into. So
we're going to instead, rather than talking about AI economics
and the labor market, we're going to talk about Nano
Banana. Chris, I think you are one of the strongest
advocates for switching out topics so we could talk about
Nano Banana. I think the question for you is how
big of a deal is this? It seems in some
ways that it's kind of just like a toy. Right.
Like you put an image in and swap a person
out and all that kind of stuff. Talk to us
a little bit about like what's going on beneath the
hood and whether or not this is significant from a
kind of research and technological capability standpoint. Okay. So I
think the first thing to say is I think this
is way more than a toy. This is by far
the best image generation model that I've seen to date.
And even if we look at the benchmarks, and I'm not a big fan of benchmarks,
as you know, even when you look at those, it is 200-odd Elo points ahead of
everything else, right? It is absolutely just killing it. And what is super
cool about it,
to your point, Tim, is the quality from the model
is great. The text capabilities of the model are great.
So if you typically look at an image model, it'll
mess up the text and all that side of things,
and it doesn't look great. The quality is just amazing.
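[Editor's aside: for a sense of what the 200-point Elo gap Chris cites means, the standard Elo formula translates rating differences into head-to-head preference rates. A minimal sketch; the ratings below are illustrative, not the leaderboard's actual numbers:]

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 200-point lead corresponds to roughly a 76% head-to-head win rate.
print(f"{elo_expected_score(1400, 1200):.2f}")  # prints 0.76
```

[In other words, a 200-point gap means the model would win about three out of four blind comparisons against its nearest competitor.]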
And then to your point, it's like the ability to
hold an image and then put that image in different
spaces, maintain the physics, et cetera, is absolutely brilliant. So
to your point, you can face swap, you can add
a smile, you can make a change, you can put
somebody in a different location. It all works great. In
fact, can I share my screen? Tim, can I share my screen? I believe, yeah, the
permissions are open if you want to share
your screen. So for all of the wonderful users here,
whilst I was supposed to be paying attention to the
podcast... This is what Chris is usually doing when he's
listening to everybody else talk. Yeah, exactly. This is what
I built instead. Let's see here. Let's get rid of this. So I said, put Tim in
a banana suit. There's Tim.
Screenshotted from today's podcast. And then there's Tim, right? And
he didn't look very happy in the banana suit. So: make Tim happy in a banana
suit. And there
he's got a nice happy face. And I said, he's
gonna be happy because he's in my Banana Beach. Miami Beach, Banana Beach,
very nice place. And
then I said, oh, no, he needs a friend with
him. And an apple suit. Sorry, Lauren, I didn't get
your permission. There we go. And Tim and Lauren are
in Miami Beach, happy. So, I mean, we're joking around.
I'll stop sharing my screen now. We're joking around. But
the reality is, that is fantastic. Getting any of the other models to do that
at that quality is hard, and this does all the style transfer you'd imagine.
Now, if you start to think of the impact of
that, everything from the creation of YouTube thumbnails to image editing,
filtering, all the sorts of things you would typically have done with
Photoshop-type tools, then think of things like Canva, for example. I use
Canva a lot. What's going to happen there? Right, because you're going to
start to be able to just use this out of the box from Google AI Studio. I
honestly think it
is phenomenal and I really think there's going to be
a lot of people who've invested in image models really
sort of starting to panic very quickly. And I do
want to pick up on that point. You know Aaron,
I think one of the narratives that has kind of
played out in a very interesting way over the last
year or so has been, had you asked me in January 2025, who's leading in this
AI space? I would have said, ah, you know, OpenAI, Anthropic, and Google is
kind of at the very end of the list, like, oh man, they just do not have
their act together. But announcement by announcement they really seem to be
catching up in a pretty significant way. And so, Aaron, I just wanted you to
reflect a little bit on, do you think this is Google really fighting for
first now in some respects, particularly on the image side? Well,
I never thought I'd see Lauren in an apple suit,
you know, that's for sure. Right. And I think that's
pretty impressive. So as far as what this jump forward is, I really like this
multi-turn editing capability, where it can remember and build upon prior
instructions that you already gave it, and that's an indicator of some kind
of extended attention and memory capability within the model, which sort of
propels it up the rankings of the best image generation models. And some of
the other pieces: it takes up to 1 million tokens of input, right, because
you have to be able to put in a text prompt and then also add in an image.
Right. And so all those
things and then also seeing Tim in a banana suit,
I think that definitely propels it up to the number
one image generation system out there. Lauren, I guess we can use the
opportunity of having you on the show, because you've already brought up a
couple of times the ever-present influence of open source in this space.
Certainly for language models and text, it feels like open source is kind of
in the running for state of the art. Where do you feel open-source image
generation, and other forms of media generation, stand? Is it similar from
your perspective, where open source is catching up very quickly, or is it a
place where the space is still lagging? Yeah, I think on
the models front it's maybe not as important as on
the inference engine and then even like the user interface
front because you need all of those pieces to come
together and the inference engines are typically more skewed towards
text use cases. So I think even if the models
are up to par, it's not the same as being
able to go to that user interface that Chris just
showed, which was probably free, or at least a free
tier. There's not really an equivalent in open source. There's always going
to be that element of DIY-ness, where you have to first find the model, and
then the model might not be as generalizable. There are certainly, if you
look at Hugging Face, I think millions of models at this point, so you could
find models that are good at specific tasks. I'm not so sure about anything
as generalizable as what we just saw. Chris, the final question on
this is, you know, it's the inevitable question, but we've
been freaking out all the time for years at this
point on how AI generated images are going to destroy
our ability to know what's real and what's not. Have
we finally crossed the threshold with Nano Banana? This is like pretty good.
Well, what's real anyway, Tim? We're all living
in a simulation, so it's fine now. I actually think
the progression here is really good. Right. So I actually
think the fact that we've been seeing terrible image models
for a while has been a very good thing and
we've all got pretty good at spotting it, like the hands being slightly off,
you know? Yeah. So I think over
the last few years we've kind of got used to
it and we now know not to trust images, do
you know what I mean? So I think if you
think of all of the Flux stuff from Black Forest Labs last year, that was a
perfect example. We saw our politicians holding hands, doing whatever. Right.
We've got
used to it. We know not to trust these models
and the outputs. I think the bigger thing in this
case is making sure that we hold people accountable for
the models that they can create and making sure that
the safety elements of those models are high. Because there's
a good side: people like me who couldn't create a thumbnail, great, I'm now
going to be able to create decent thumbnails for my YouTube channel (plug
intended). But for others,
it means they're going to lose business in that sense.
And then there's a whole lot of scary scenarios there.
So I think there is still a lot of the
ethical side that needs to be worked out. But the
quality is great and it's just going to get better.
Right. And actually, one of the things I would say
is we're seeing this right now for image. We can
guarantee if we project forward 12 to 18 months, you're
going to see the same level of quality on video,
you're going to see the same level of quality on
audio as well. So this is just going to extend
out across modalities. And I'll add, too, that this is such an
editing-focused model, and editing has kind of become a bad word, because
editing means manipulation, means malicious intent. But there are really
important use cases for editing. So with the geospatial
models that we built with NASA, one of the biggest struggles is cloud cover:
most satellite imagery has cloud cover, so you can't do anything with it. And
if you could actually create synthetically generated data, using an editing
model to improve the dataset you train a foundation model on, that's an
editing use case. And it's, you know, it's not about manipulation or
changing the meaning of something from a human perspective. It's
more for a machine learning perspective. That's right, sure. And
just being able to see Tim smile as well is
really important. As you know, I never smile on these
shows, so. All right, last topic of the day, Aaron.
It's always a joke that when we bring you onto
the show, we're going to talk about sports. And I'm
not going to let us break the tradition here on
episode 70. So you've been covering the US Open, and
I think the team's been doing some interesting experiments. And
so we've been doing a lot of screen sharing on
this episode. I believe you want to kind of share
some of the stuff that you've been doing as well.
Yeah. So first, if I could just give a prelude as far as what we're doing:
we've been with the US Open for over 30 years. About a million fans show up
and attend the Flushing Meadows site, and then every single day there's
another 14 million fans that tune in through our digital properties. One of
the hallmarks of the US Open is that we want to combine the fan experience
with technology so that we can bring in people and expand the swath of what
we're doing. And we've introduced three new features this year. So one
of them is Match Chat. So we spent several
months building this very impressive system and we'll have a
few papers out that describe the science behind it. But
what this is, is a real-time, agent-driven assistant, so that you can go in
and ask a question about a match or about players, in real time and at large
scale, and get a response back. And then the
second piece is called Key Points. So we always say too long, didn't read,
TL;DR, right? You see that a lot. But there are these very long articles that
people just
don't have the time to read. And so we summarize
it and then we show those bullet points on top
of these articles, and we have a workflow in which we work with USTA editors.
And then the third one
is called Live Likelihood to Win. This has a very long historical background,
but we combine predictive modeling, so we have an ensemble of different
predictive models that then go into who's going to win the match, right. We
have a pre-match prediction, and then as the match goes on we have some
proprietary equations that we developed that fine-tune and change the odds
that somebody's going to win given the momentum. But ultimately what we want
to do is increase the breadth and depth of fans
and give them the information that they need so that
they can understand the story of a match. And what
I was hoping to do was continue this episode's trend of screen sharing and
just show you some of the work here that we've done. It's live right now
and play starts pretty soon. It's 10:47 right now, so it starts at around 11
o'clock and we can go ahead and see some of the action. So just to orient
you, this is our work that we put live, right, a twin of the event. And I
want to just quickly show you: when a user comes in,
What's one of the first things they want to know?
Well, they want to know the scores of a match.
And so I would like to highlight two matches. One of them was a big upset:
her name is Eala, she's a 20-year-old from the Philippines, and she beat
Tauson, right. So that's one. And then another is the Alcaraz match that I
also want to show you. And let's check out the
Alcaraz match here. So, because this one is already finished, right. But
imagine play is going on and there's a match in progress, which you can check
later in the day. But let's check out the match recap. So we have, you know,
the IBM SlamTracker, and it pops up and you can quickly see in the sidecar
that the first tile is the score here. And then when
you go down, we have this 360 degree storytelling of
the match. And if you want to know beforehand, before the match has started,
what's the likelihood that Alcaraz is going to win? Well, it's pretty high in
this case. This is a very early round, what, round two, and Alcaraz is off to
a strong start. We've assigned Alcaraz an 82% chance of winning. And again,
this uses pure predictive
modeling that we've experimented with over years and years. Now,
because the match is over, we can go to the
summary tab. Right. And you can see the live likelihood
to win. You know, how it's changed over time. Right.
And there weren't very many fluctuations in this one, because Alcaraz, you
know, had a very big advantage coming in, right. But now, if I want to know
some details, there's Match Chat. And so let's just click it and it opens up
and we have a frictionless user experience that we've designed so that we can
guide the user and help them get the information that matters the most. We
did a lot of user studies and a lot of data analytics to figure out what
people care about. So let's check it out. I find match stats very
interesting. So let's just
ask a question: how many aces did... and let's put in a player that isn't
even in this match, say Sinner. And so let's do this first. So
it's thinking, it's going through and it's hitting the pieces.
And so what it first says is, wait a minute,
do you want to know set by set, or do
you want to know about the match? And let's hit no, right, because I want to
know about the match here. And so now it's thinking again, analyzing, and
this is going out in real time, right now, hitting our middleware and going
out into AWS. And what it does is it then in turn comes back and tells us how
many aces Alcaraz had, right?
And it worked well because it was able to switch from Sinner right into the
right players that it has. So it automatically does a lot of the detection.
We have a lot of chat pipelines, and then it does pronoun corrections, it
does player corrections and so on. Right. But you can play with this more
as you go through and see what all we've built.
But it's very interesting and there's a lot of deep
statistics that come in. And so if we were to
keep going, then you can see lots of stats that
people really want to know about. But in the interest
of time, let's just go back and close this. Right?
And you know, why don't you pick a match here on the screen rather than me
picking one, Tim. Let's do the Harris versus Fritz there on the bottom right.
So, okay, this one here. All right. So, you know, here, the pre-match likelihood
to win. Let's check that out. I mean, Fritz was
overwhelmingly the favorite, right. And so because of that, when you go and
look at the live likelihood to win, if we trace it against the actual match,
you can see that Fritz lost the first set, so his odds of winning go down,
but not that much, because he's still so heavily favored. And then the
storytelling keeps
going on: it's a very close one, it gives him the break points in the second
set, and when Fritz wins those, well, he's regaining the momentum. And then
the match continues, right, until set four, when he eventually takes it over.
So this live likelihood to win is really powerful during the match itself,
because you can track and trace how it evolves.
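[Editor's aside: the in-match update Aaron describes can be sketched with a toy model that blends the pre-match prior with a momentum signal from sets won so far. The blending rule and weights below are made up for illustration; IBM's proprietary equations are not public:]

```python
def update_win_prob(prior: float, sets_won: int, sets_lost: int, k: float = 0.25) -> float:
    """Toy live-likelihood update: blend the pre-match prior with a
    momentum signal from completed sets. Illustrative only."""
    total = sets_won + sets_lost
    if total == 0:
        return prior                      # no play yet: the pre-match prediction stands
    momentum = sets_won / total           # crude in-match signal
    weight = 1 - (1 - k) ** total         # trust momentum more as sets finish
    return (1 - weight) * prior + weight * momentum

# Fritz enters at 82% and drops the first set: the probability dips,
# but not by much, because the strong prior still dominates.
print(update_win_prob(0.82, 0, 1))
```

[This reproduces the qualitative behavior from the demo: a heavy favorite losing a set slides downward without crossing 50%, while each set he wins pushes the estimate back above the prior.]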
So that, in essence, is what I really wanted to show, some of the exciting
work, right, that's live right now. And then
a plug, you know, for ESPN Fantasy Football: we went live with a few other
pieces yesterday, and then next week on Wednesday we're going to have another
piece that goes live. So if you're part of a fantasy football team, go and
check out our player insights and the factors that we have, and grades and so
on. That's great, Aaron.
Awesome. Well, for all you listeners, we'll keep you posted. And I guess,
Aaron, as this continues to develop, we'll have you back. I
think it's fun having you on the show regularly because
it feels like we get to see the iteration every
time you come back on. And so it's cool seeing
that happening. Cool. Yeah. Awesome. Well, that's all the time
that we have for today, so thanks for joining us,
Aaron, Lauren, Chris, it was a pleasure always to have
you on the show. And thanks to all you listeners.
If you enjoyed what you heard, you can get us
on Apple Podcasts, Spotify, and podcast platforms everywhere. And we will see
you next week on Mixture of Experts.