90% of Enterprise Data Unstructured
**Source:** [https://www.youtube.com/watch?v=90fUR1PQgt4](https://www.youtube.com/watch?v=90fUR1PQgt4) · **Duration:** 00:37:20
Key Points
- The panel humorously debated how much enterprise data is unstructured, with guesses ranging from 40% to a tongue‑in‑cheek 200%, before revealing that roughly 90% of enterprise data is actually unstructured.
- This episode marks the 50th installment of the “Mixture of Experts” podcast, featuring discussions on the upcoming Llama 4 release, highlights from Google Cloud Next, and recent Pew Research findings.
- IBM Fellow Hillery Hunter introduced the newly launched IBM z mainframe, emphasizing its “zero downtime” design that achieves eight nines (99.999999%) of reliability, translating to only a few hundred milliseconds of downtime per year.
- Hillery explained that mainframes like IBM z underpin the global economy by handling the vast majority of financial transaction processing, making them critical yet invisible infrastructure for everyday banking and market activities.
Sections
- [00:00:00](https://www.youtube.com/watch?v=90fUR1PQgt4&t=0s) Estimating Enterprise Unstructured Data - Panelists humorously guess the share of unstructured data in enterprises, ultimately revealing it’s roughly 90%.
- [00:03:02](https://www.youtube.com/watch?v=90fUR1PQgt4&t=182s) AI-Powered Real-Time Transaction Fraud - The speaker explains how ultra‑fast, highly reliable AI is embedded into transaction systems to score billions of events per day, enabling instant fraud detection at the point of sale.
- [00:06:15](https://www.youtube.com/watch?v=90fUR1PQgt4&t=375s) Bringing AI Inference Close to Transactions - The speaker explains why large banks need to deploy fine‑tuned language models at the edge to achieve sub‑millisecond inference for mission‑critical tasks like fraud detection, avoiding cloud latency and security concerns.
- [00:09:18](https://www.youtube.com/watch?v=90fUR1PQgt4&t=558s) Enterprise AI Expands to Core Use Cases - The speaker celebrates recent breakthroughs in applying LLMs to enterprise workloads, overcoming latency hurdles, and previews upcoming Spyre AI enhancements for self‑healing, automated systems.
- [00:12:27](https://www.youtube.com/watch?v=90fUR1PQgt4&t=747s) Open‑Source Giant Models Accelerate - The speaker highlights the debut of huge open‑source language models—ranging from a 400‑billion‑parameter mixture‑of‑experts system to a 2‑trillion‑parameter model—arguing they pressure closed‑source labs and could broaden community support for mixture‑of‑experts architectures.
- [00:15:32](https://www.youtube.com/watch?v=90fUR1PQgt4&t=932s) Beyond the Race: Llama 4 Strategy - The speakers contend that the significance of Meta’s Llama 4 release lies not in a head‑to‑head leaderboard but in its illustration of shifting open‑source tactics and its massive industry influence, highlighted by over a billion model downloads.
- [00:18:40](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1120s) Evaluating Mega-Scale Open-Source Models - The speakers debate whether massive models like the unreleased “Behemoth” are practical and open‑source viable versus being merely marketing hype, highlighting community innovation, cost‑performance trade‑offs, and real‑world deployment challenges.
- [00:21:44](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1304s) Scaling Down: Leveraging Giant LLMs - The speaker predicts that upcoming massive Llama 4 models will mainly be used to generate and augment enterprise data, which can then be distilled into much smaller (1‑10 billion‑parameter) fine‑tuned models for laptop‑scale deployment.
- [00:24:59](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1499s) Google Cloud Next AI Breakthroughs - The speaker recaps Google Cloud Next’s showcase of rapid enterprise growth, AI integration through TPUs and Gemini, the new Ironwood chips, and the opening of Google’s massive fiber network to customers.
- [00:28:00](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1680s) Google’s Peer Agent‑to‑Agent Protocol - The speaker outlines Google’s new agent‑to‑agent standard that enables LLM agents to interact as equal peers, integrates with Anthropic’s MCP and IBM’s consulting workflow, and emphasizes Gemini 2.5 Pro’s benchmark leadership and focus on safety.
- [00:31:03](https://www.youtube.com/watch?v=90fUR1PQgt4&t=1863s) Google’s Gemini 2.5 Advances - The speaker lauds the Gemini 2.5 Pro release, highlights Google’s unique ability to train models on vast B2C data for cinematic video generation, and expresses excitement about the model’s impact on the field.
- [00:34:08](https://www.youtube.com/watch?v=90fUR1PQgt4&t=2048s) Personalized Video AI and Public Perception - The speaker envisions AI‑driven, celebrity‑filled personalized movies and cites a Pew Research report showing a sharp gap between experts who downplay AI’s job impact and the public who feel threatened by it.
- [00:37:14](https://www.youtube.com/watch?v=90fUR1PQgt4&t=2234s) Sign‑Off: Next Week on Mixture of Experts - The host signs off and invites listeners back for next week’s episode of Mixture of Experts.
Full Transcript
What percentage of enterprise
data is unstructured data?
Kate Soule is Director of Technical
Product Management for Granite.
Uh, Kate, welcome back to the show.
What's your estimate?
This feels like a trap, uh, but, you know,
just as a wild guess, I'm gonna say 40%.
Shobhit Varshney is Head of
Data and AI for the Americas.
Uh, Shobhit tuning in live from Vegas.
Uh, what do you think?
200%. Have you seen the quality
of structured data in companies?
All right, great.
And last but not least, joining us for
the very first time, is Hillery Hunter, IBM
Fellow and CTO of IBM Infrastructure. Uh,
you've got an advantage on this question, but
I don't know if you wanna offer your guess.
Yeah, I'll, I'll take the midpoint there.
Uh, not exactly the midpoint,
but uh, I'll go with 80%.
Okay, great.
So the answer is 90%.
Uh, we're gonna talk about that
today, and all that and more, on the
very 50th episode of Mixture of Experts.
50th episode! Crazy. And welcome. Woo-hoo!
Yeah. Woo-hoo.
I'm Tim Hwang and welcome to Mixture of Experts.
Each week, MoE brings together a talented
and just lovely group of researchers, product
leaders, and more to discuss and debate the
week's top headlines in artificial intelligence.
As always, there's a ton to cover.
We're gonna talk about the Llama
4 release, Shobhit's in Vegas.
He's gonna tell us all about Google Cloud Next.
Some really super interesting
research coming outta Pew Research.
Uh, but today, uh, we want to take the
opportunity because Hillery is on the line with
us, uh, to talk about IBM z, which is a new
launch that just came out on I believe Tuesday.
Um, and it concerns mainframes.
Uh, and so I guess, Hillery do you wanna
just start for listeners who are less
familiar with the sector, what is a
mainframe anyways and why is it important?
Yeah, I think the first fun fact is that "z"
stands for zero downtime, and mathematically,
that's kind of an interesting conversation.
We talk about the system now having eight
nines of reliability, and the way that
you count those nines, as you say it,
is 99 point and then six more nines.
So that's how you get there; it's a
lot of nines of resiliency.
Yeah.
But it means just a couple hundred
milliseconds a year of downtime on average.
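As a quick back-of-the-envelope check on that arithmetic (an illustrative sketch, not IBM's published math), eight nines of availability does land in the couple-hundred-millisecond range:

```python
# Back-of-the-envelope: downtime implied by "eight nines" availability.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # ~31.6 million seconds

availability = 0.99999999                  # 99.999999% -- eight nines
downtime_ms = SECONDS_PER_YEAR * (1 - availability) * 1000

print(f"~{downtime_ms:.0f} ms of downtime per year")  # ~316 ms
```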
And so, you know, when I talk to family
members or I, I meet someone socially.
I kind of say we work on building the computers
that you don't see and that you just sort
of assume are there and never think about.
And what that means is really, this
is where most of the world's financial
transaction volume, everything from things
in the market to your personal credit card
transactions go through it in the back end.
And you hopefully never think about whether
or not that computer's gonna work or your
credit card transaction goes through.
These are these systems that we
just all assume are up all the time.
And so it's really kind of at the core of
the global economy, to be honest.
That's really not an exaggeration.
Yeah. What I love about this is, like, you work
on arguably some of the highest of
high-stakes computing.
Um, and I think one of the most interesting
things about the launch is that I know AI is
a, a big part of this launch in some ways.
Um, I know there's sort of z17, which is
the mainframe, and then there's "z" sort
of the software, which sounds like kind
of IBM pushing into the idea that these,
you know, eight-nines-of-reliability computers
really are gonna get, you know, sort
of integrated into the overall sort
of AI revolution, which, you know,
we talked about on the show before.
AI, you know, is not always
kind of production-grade.
It, like, sometimes, you know, messes up;
it's stochastic, it has all sorts of randomness.
So curious to hear a little bit more about
like what's getting launched on the software
side and I guess how you kind of like get
AI to work at like such a high level of
reliability that I think most software
developers never even need to think about
as they're kind of vibe coding or whatever.
Yeah, it's, it's a pretty different space, but
it's equally fascinating, I think to that whole
vibe coding kind of space that a lot of folks
are interacting with now on a daily basis.
Um, from a technical perspective, getting things
done in transactions means having millisecond-
level AI, and that means super, super fast,
tightly integrated,
tightly integrated, being able to handle billions of transactions a day, um, and
being able to score things at line speed, right?
So again, an anecdotal sort of example:
if you're talking about fraud
and analytics in the credit card
transaction processing space,
if I as a consumer am buying
something online, it's okay;
there's minutes to hours before the thing gets
shipped out, you know, so fraud detection can
happen offline. But if it's in a store and somebody's
trying to rip you off and buy an expensive
phone or something like that at Best Buy,
you wanna make sure that instantaneously,
the moment the transaction goes through,
it's detected as being fraudulent.
And so there's actual real economic
value and consumer value to being able
to score every transaction in real time.
The interesting thing that we're now
talking about being possible on this next
generation of mainframe is multi-model AI.
So a really small, fast compact model that's
running there, right on the processor, dealing
with this massive transaction throughput.
Maybe occasionally it has low confidence in
the scoring it provided, and it needs to be
backed up by a bit more robust, complicated
model. And so we're putting extra AI cards,
called the Spyre card, into the system to
enhance not just being able to do that super-
fast processing on the processor itself,
but also to do fast processing one
step slightly removed and adjacent,
on a PCIe-attached set of cards.
And so we've just multiplied the AI
capacity, um, and throughput for the system.
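To make that cascade concrete, here is a minimal, hypothetical sketch of the confidence-gated fallback pattern Hillery describes; the ToyModel class, its predict interface, and the 0.9 threshold are illustrative stand-ins, not the actual z17 or Spyre API:

```python
import random

CONFIDENCE_THRESHOLD = 0.9  # hypothetical cutoff for escalating a score

class ToyModel:
    """Stand-in for a fraud model; returns (fraud_probability, confidence)."""
    def __init__(self, confidence):
        self.confidence = confidence
    def predict(self, txn):
        return random.random(), self.confidence

def score_transaction(txn, fast_model, fallback_model):
    """Score one transaction, escalating only low-confidence cases."""
    fraud_prob, confidence = fast_model.predict(txn)   # on-processor, line speed
    if confidence >= CONFIDENCE_THRESHOLD:
        return fraud_prob                              # fast path: most traffic
    # Rare path: the larger model on the PCIe-attached accelerator cards.
    fraud_prob, _ = fallback_model.predict(txn)
    return fraud_prob

fast = ToyModel(confidence=0.7)       # small on-chip model, often unsure here
robust = ToyModel(confidence=0.99)    # bigger model on the attached cards
print(score_transaction({"amount": 999}, fast, robust))
```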
And also then, from the perspective of the
total system experience on the software side,
like you said: we now have something called
Operations Unite, which is an AIOps-driven,
AI-chat-driven interface to everything
going on in the system.
So observing, remediating issues, all
happening in a totally modern interface.
So it's pervasive once you
put the AI capability in.
It's not just about the workloads running in
the system, but also how people use and operate
and keep the whole thing stable and healthy.
Yeah, that's awesome.
So Shobhit I'd love to bring you in.
I, I know I launched this episode
with a question about just how much
unstructured data, uh, enterprises
are sitting on, and I'm sure this is a problem
that you have to deal with and that you
talk about with customers day in, day out.
Uh, I know that's a component of this launch,
but curious if you want to just opine a
little bit on kind of how the world is
evolving there and I guess how the Z launch
sort of fits into some of those questions.
Uh, a big fan of, uh, of the Z Series. I
grew up in a cloud-first, AI-first world, and
I have so much respect for understanding the
right balance between where mainframes should
be playing versus where the clouds are, right?
So as an example, we're working with a very,
very large bank where we're leveraging cloud
environments with a lot of different GPUs
and compute behind it to train the models.
But once you have fine tuned the models
to enterprise data, you wanna go bring
it where the transactions are happening.
And these are sub-millisecond, right?
Very, very quick.
You're doing this billions and
billions of times every hour.
So you want to bring the AI
inference as close as possible to
where the transaction is happening.
In the first wave of doing unstructured
content analysis, you would have some
large language model that summarizes
a call recording or starts to do some
knowledge search and things of that nature.
Now, in the next wave, once we've proven out
that this technology is working, you wanna
do this in more mission critical workflows.
For example, when fraud detection happens,
like Hillery was mentioning, there's a lot
of, uh, patterns that we need to look for.
It's not just that one
transaction that happened;
you also need to look at how
that transaction came about.
And at that point of the transaction
happening, in sub-milliseconds, the
larger models have a lot of latency.
You can obviously not afford to have
that data go out to the cloud
and come back: A, security issues, and B,
the latency, and other things.
Right?
So we are in a world where we see
a lot of our larger Fortune 100
companies move from experimenting with
large, uh, frontier models that are API
calls to then fine-tuning smaller open
models and bringing them close to the compute.
So I think the Z Series works
incredibly well in this space.
And we also have the brand permission
with Z. What is it, Hillery?
90% of all credit card transactions
happen on Z, and 90% of the Fortune
50 banks rely on it, and whatnot.
Airlines, retailers.
So you're on the mission critical workflows.
This is no longer, Hey, let me ask
the prompt a different way, right?
So you're not experimenting, you are
doing this in, in more critical workflows.
You know, I love that you went to latency.
I think one of the things related as well
to that whole leaving the system is the
data security model, data sovereignty,
all those other really hot topics.
And so I think also bringing AI to where
that data is and where that mission
critical data is, where that valuable and
sensitive consumer and personal information
is, is a big part of this conversation.
I, I think one other thing, in
addition, again to latency and then
that data protection is also the energy.
So we've greatly increased the AI capability and
the overall capability of the system, but dropped
this whole system, generation to generation, by
17% in power consumption. And the team has
measured that it's about five x more efficient
to do that AI in place, where the data is,
than, to your point, calling
out to some external system.
So these days everybody's running outta
power, looking to take out more data
center space, all that other kind of stuff,
and being able to do AI so efficiently,
I think is a, is a really exciting step
forward.
And Hillery just, just about a month back,
I was with one of the largest top three
credit card companies and we were having
this, uh, concern around fraud detection
and said, uh, we can obviously do a lot
of LLM work to understand patterns, right?
It's not just a spot in time.
And even a month back, we struggled
to bring LLM models into real-time
transactions, 'cause it's just
sub-millisecond and stuff.
And I was just so proud that this week
we've been able to go after those use
cases that we couldn't even a few weeks back.
Right.
So we are coming to a point where clients
understand that they've proven it out
inside of their enterprises that we
can use LLMs and we've trained them
in a particular way, but latency was
coming in the way of us doing this work.
A lot of our clients are just... huge
kudos to your team for doing this right.
I think you bring enough AI and, and to
your point, the creativity just explodes.
Every developer in kind of this core of the
enterprise space is now, oh, that's now for me.
That's not something for people
elsewhere in different environments.
It's now insurance claims processing,
even medical image assessment.
There's all kinds of amazing
things going on on that core data.
'cause AI is also for those people and
for that data and for that context also.
That's super exciting.
So Hillery, before we move on to the next
topic, what, uh, what comes next for you all?
Yeah, so the capabilities with
Spyre come out in 4th quarter.
There's a rolling set of announcements on the
different software enhancements, and I think
the way to think about it is we're making these
systems AI through and through, like I kind
of mentioned. You know, starting back even
in z/OS 3.1, the last release, there was AI
inside, things starting to look in that direction
of self-healing, or sort of automation of
management of the efficiency of the system.
Uh, what we've stated about z/OS 3.2, which is
gonna be coming out, is even more integration
of that smartness into the core and the
heart of how the system operates, and then how
operations teams experience it, and going all
the way out even into our support staffing.
So if you call IBM for help with something, now we
are also using watsonx technology to help those
agents who are helping you with your mainframe.
So that's a project that we started with
in our technology lifecycle services
organization with our storage products.
And we're, you know, we've announced
now this week that we're also
bringing that to mainframe support.
So that whole experience end to end,
how the system runs, what you can do on
it, what you understand about it, and
then how somebody helps support you
is all gonna be AI-enabled.
And I think that end-to-end in full
stack story is, is just really exciting.
This is us living what we've been
talking about with the power of AI.
This is awesome.
Yeah. So we'd love to have you back on
the show as things unfold here.
I think it's a, like a segment of AI that
we haven't talked as much about, but I,
I love it personally just 'cause it is
like this kind of very high stakes thing.
You really gotta get it right in these domains.
And so, um, you know, it's a kind of
AI engineering, almost, that you don't
really see in a whole lot of other places.
Which is really exciting.
So I'm gonna move us on to our next topic.
Uh, Meta has released Llama 4, a long-
awaited release in the open source space.
Um, there's three models that
they've talked about, two of them
actually released: uh, the Scout model,
the Maverick model, and the Behemoth model.
Um, and it follows a pattern that we've seen
elsewhere in the open source space,
where people are launching both smaller
models and bigger models to meet a
variety of different applications.
Um, Kate, maybe I'll start with you.
I don't know if you had a chance to kind of
play with some of the models yet, but curious
about your early impressions, your vibe
check, uh, on how this release went.
Yeah.
Uh, you know, it's been a busy week, so I
haven't had a chance to play with them
directly, but I've been reading up on them,
uh, certainly, and it's really exciting
to see what Meta put out there.
Uh, I mean, with the release of their
largest model, which is, uh, you know,
over 400 billion parameters, I believe,
a mixture of experts, and a hundred billion
parameters, I think, is the Scout.
Uh, they're really starting to
take on larger and larger tasks and
create, you know, some powerful models
out in the open source ecosystem.
I think with the, uh, announcement of
their Behemoth model, which is, you know,
2 trillion, uh, parameters.
Uh, I mean, what they've said...
that's big, right? That's big.
That's, that's pretty big, Tim.
Um, so, you know, they're talking about
how, already on earlier trained versions,
checkpoints, uh, it's cracking
GPT-4.5 on tasks like science.
So they're clearly, you know, putting themselves
out there as a frontier model provider.
And doing that in the open, I
think, is only gonna continue to
put more pressure on these closed
labs to release some of their
work out in the open as well, and
more broadly help the community.
So that, that's really interesting.
Um, I think there is a lot to be
said about the, uh, mixture of
experts architecture that's going on.
Uh, where we see, you know, obviously
DeepSeek made this famous, uh, when they
first released, uh, back in December or
so, uh, with, not first released, but
released a big update to their family.
Um, it's an architecture that's been used
more broadly even before that.
But I'm really hopeful that this release
will help get broader community support
behind mixture of expert architecture.
'cause there's just tons of, uh,
really interesting things about it.
Very training-efficient, um,
inference-efficient, particularly
if run at a low batch size.
You only have to use the experts that
you need to call at inference time, which, you
know, if you're just running, you know, one or
two tasks, uh, can be run really efficiently.
You start to lose a little bit of that if you
have to run these at much larger batch sizes.
'cause you have to load all
your experts into memory.
So most people don't quite realize
that about mixture of experts.
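For listeners newer to the architecture, here is a toy sketch of top-k expert routing, illustrative only and not Llama 4's or any particular model's implementation, which shows why a small batch touches only a few experts:

```python
import numpy as np

n_experts, top_k, d = 16, 2, 8
experts = [np.random.randn(d, d) for _ in range(n_experts)]  # toy expert FFNs
router = np.random.randn(d, n_experts)                       # toy router weights

def moe_layer(tokens):
    """Route each token to its top-k experts only (tokens: (batch, d))."""
    logits = tokens @ router                     # (batch, n_experts) scores
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        top = np.argsort(logits[i])[-top_k:]     # indices of the chosen experts
        w = np.exp(logits[i][top])
        w /= w.sum()                             # softmax over the chosen experts
        out[i] = sum(wi * (tok @ experts[e]) for wi, e in zip(w, top))
    return out

small_batch = np.random.randn(2, d)   # 2 tokens touch at most 4 of 16 experts
print(moe_layer(small_batch).shape)   # (2, 8); a big batch would touch them all
```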
But either way, really
excited to see just another
powerhouse model get released, uh,
in this case, two powerhouse models
get released out into the open.
Yeah, for sure.
And if you can go into that a little
bit more for some of our listeners.
I mean, MoE is the namesake of the show,
so I have to kind of fight for it,
but it's like, has mixture of experts
been a little bit uncool as of late?
Like, I guess, it sounds like what
you're implying is sort of that these
models might make it a focus of the
community again in a way that it
hasn't been in the past.
And I'm, I'm kind of curious about
how that, how that's developed.
Well, I mean, even just with
the Z system, right?
We're talking about the focus on inference
efficiency, running things quickly
at inference time, and a lot of what
requires that, or what enables that is the
community building open source software and
platforms to be able to host and run these
models as quickly and fast as possible.
And just because the most popular open
source models to date, including prior
generations of Llama, have been dense-
architecture models, a lot of the existing
support for hosting and running these models,
running them locally, hosting them and
running them yourselves on platforms like vLLM,
is, you know, predominantly based on some
of those more popular dense architectures.
So there is going to need to be kind
of a, a groundswell movement of the
community continuing to build out support.
I think we've seen a lot of that already with
the release of Llama 4, and I'm just excited
to get more open source developers interested.
In mixture of experts as architecture
as a whole and continue to build out
toolings and, you know, ways that we
can work with these models more broadly.
Sure. But maybe I'll bring you in here a little bit.
You know, I think that there's often a
way this discussion goes, which I think is
like less interesting, where it's basically
like, okay, Meta did this release now,
like, who's ahead, you know, in this race?
But like, I think that's often like the
wrong way to think about it, particularly
as the space gets more and more complex.
Yeah. Like, how should we read into this
sort of launch, about what Meta's strategy
is and how it's trying to kind of
fill a, a niche in the market, right?
Because I think rather than thinking about like,
oh, DeepSeek is ahead, or Meta is ahead, I think
we should just kinda ask the question of just
like, how are the strategies sort of evolving?
Absolutely. Yeah.
I'm curious if you have some thoughts like
what you read into this launch, basically.
So let's just start by acknowledging
what a consequential, uh, impact
Llama has had on industry: the Llama
models, as of like the 18th of March,
have been downloaded a billion times.
Let's just let that sink in. A billion.
That's a lot of times we've downloaded a
model and made different adapted versions
of it, and things of that nature.
Right.
So a lot of enterprises that we work with are
very focused on how do I adapt a
model to our enterprise-specific domain, our
data, and the way we want the models to behave.
Right.
That adaptation comes only when
you're really, really open.
There are certain, uh, frontier models
that can be adapted, fine-tuned, but
then you're sending your proprietary
data to the cloud.
That's a no-go.
So usually open models, open-weight
models, are fine, uh, in that space,
where you can go and tune them.
So our own Granite models, and some
models from Mistral and DeepSeek
and others, are also open-weight,
open models.
But it takes quite a bit to create a good
mechanism to assess the quality of an output.
So for a lot of our clients, we have to go and
build end-to-end LLM benchmarking mechanisms.
How do you evaluate the output
on your specific documents?
So the benchmark results that are public,
those are a good starting point to
get you a directional check, to say,
yeah, it's worth looking at 'cause
Llama 4 did X better. But
none of my clients jump up and down
saying, oh my God, this is like
0.2 points higher than the other one.
Right?
People have other criteria that we use to
judge which LLM uh, we should be leveraging.
It starts with IP: who can own the IP on
that model. It starts with data gravity;
the AI model follows the data gravity.
There are actual commitments that you've made
to specific cloud vendors, right?
There are things around, can I adapt this
to my own, uh, to my own environment?
And then return on investment, the
overall ROI of running these models.
So you'll see a trend where, every six months,
the next size smaller model gets
smart enough to outcompete the
previous one from six months back.
So we're seeing this constant trend
where we're getting a really good, like,
performance-to-cost ratio, right?
I think that's the sweet spot, and
Llama has done a really good job.
I would anticipate that we'll continue
this trajectory of a billion downloads
and we'll have different adapted versions
of Llama available for our enterprises.
That's the right frame to look at it
versus, oh my God, this just crushed
the numbers on this particular task.
Then there is, uh, then there are
other models that will constantly
innovate with new methodologies.
I think DeepSeek did a phenomenal job with
some of their papers. Our Granite models,
we have some really nice tricks up
our sleeves in our own models, and
we give back to the community too.
So I'm just super pumped about
the community coming together.
Open source getting to a point where you can
adapt it to the enterprise, and very, very
focused on intelligence, uh, divided by
the price, that kind of a metric.
Hillery, maybe I'll bring you in,
um, you know, just to talk a little
bit about this Behemoth model.
I know it wasn't released, but
it is like shockingly large.
Um, and, and it's cool on one
level, you're like, wow, okay.
It's like really, it's really big.
I'm kind of curious though, like, from your
point of view, you know, the degree to
which these are actually kind of
practical models that a lot of people will
use in the wild. 'Cause it sort of feels
like, with the kind of infra you need to pull off
really actually serving and using a
model of that scale, there's part of me
that's like, is this just kind of more of a
marketing thing than it is actually a
practical reality? But curious about your
take on this: is there room for open source
at the, like, mega, mega, mega scale?
Just because it kind of almost
limits the set of people who would
actually practically end up using it.
Yeah.
I guess I have a lot of similar
thoughts to what Shobhit just shared.
Um, a couple of things, right?
I mean, within IBM Infrastructure, we're also
handling, creating the cloud infrastructure
for watsonx and deployment of all these
infra services and stuff like that.
So the other part of my brain is, is looking
at how do we bring, you know, more and
more powerful accelerators of all kinds
into that cloud environment to do whatever
it is that watsonx needs to do, right?
So if our customers are gonna
need those really big models.
I'm not gonna be the one that says No, we
won't provide the infrastructure for it.
Right? So we're advancing with NVIDIA and Intel
and AMD, and putting, you know, new and
more GPUs out there to enable people
to play around with models as large
as they feel like are gonna be useful.
I think on the practical side though, we
see a lot of experimentation or attempts to use
these things maybe from a teaching perspective.
Um, but then when it comes to scaling out
deployments, almost all of our customers
then start to engage with us on how
can I customize smaller things, right?
So I feel like you sort of have to
know where things are at on the large
side and what it might do for you.
You may use that to inform
yourself on, you know, what the solution
might look like or, uh, maybe create,
um, you know, additional tuning data or
something like that, you know, to get that
characteristic that you need out of something
that's then gonna be affordable to scale.
So I continue, like Shobhit, to see most
of our customers saying, hey...
Um, you know, we work largely
in kind of the B2B space.
As IBM, we're working with other
large enterprises who have millions
to hundreds of millions of clients.
And when you're wanting to engage with all
of them and run at business scale, of billions
and hundreds of millions of things and
people, um, the affordability very quickly
kind of kicks in, and people, you know, start
looking at customization of smaller things
for real scale-out of deployments.
Well, and if I can make a
prediction based off of what,
Hillery, you just said.
Um, and kind of speaking to, Shobhit, what
you mentioned about, you know, small LLMs
increasingly being able to do more things.
You know, my prediction is that
the models for, uh, Llama
4 that were released, they're very...
even the smallest one is quite big,
you know, a hundred billion parameters.
I think they're going to be used most
by the community to fine tune some
of the older, smaller Llama 3 models.
So if we look at what can run on a laptop,
what you can easily train and customize,
you're really talking, you know, like, uh,
one to 10 billion parameters in size,
and, you know, maybe a dense architecture,
'cause there's a lot of tuning support for
that kind of model capability already created.
So I think that some of the most immediate uses of
these biggest models are going to be to continue
on that trend of how we get those smaller
models even more performant, uh, by using those
bigger models to be able to teach, to be able
to generate data, to be able to help augment
existing enterprise data and create more of
it, and then bring that and pack that down into
smaller models, like the older generations of
Llama, or our generation of Granite, um,
all playing in that, you know, single-
digit-billion parameter size frame.
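Here is a hedged sketch of the distillation pattern Kate describes: a large teacher model generates synthetic data from enterprise documents, and a small student is fine-tuned on it. The generate and fine_tune calls are hypothetical placeholders, not any specific library's API:

```python
class ToyLLM:
    """Stand-in for a language model; both methods are hypothetical."""
    def __init__(self, name):
        self.name = name
    def generate(self, prompt):
        return f"[{self.name} output for: {prompt[:40]}...]"
    def fine_tune(self, examples):
        print(f"fine-tuning {self.name} on {len(examples)} synthetic examples")

def distill(teacher, student, seed_docs, n_variants=3):
    """Expand scarce enterprise data with teacher-generated examples."""
    synthetic = [
        teacher.generate(f"Write a Q&A pair grounded in:\n{doc}")
        for doc in seed_docs
        for _ in range(n_variants)
    ]
    student.fine_tune(synthetic)   # e.g. a 1-10B-parameter model
    return student

distill(ToyLLM("big-teacher"), ToyLLM("small-student"),
        ["claims policy excerpt", "support ticket log"])
```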
I, I, I totally agree, Kate.
And I think one other, you know, little
factoid. I'm sure you guys have talked
about this before, but it's estimated
that only about 1% of enterprise data,
or 1% of the things an enterprise needs
a model to use, are contained in
publicly available models, right?
So as you think about that, it has
to be that, um, an enterprise is
gonna be customizing something.
And then the question is what is that something?
And is that something
affordable enough then to scale?
Yeah. And, uh, both the size
of the model, but also the
context window side, right?
A 10-million-token context window.
What a world we live in, right?
I can just dump a bunch of data
into it and talk against it.
But it takes a lot to host these models.
So a lot of, uh, different vendors are
offering inference infrastructure for the
same exact model, and it is complex to
host this and get it right.
Each vendor is offering different
kinds of context windows,
'cause not everybody can pull off a
10-million-token infrastructure, the way you
fine-tune it, and so on and so forth, right?
Even companies that do third-party analysis, uh,
like Artificial Analysis and stuff like that,
it took them a few turns to get the
inference infrastructure just right, to be
able to match what Llama had claimed to be
the results in their papers and stuff like that.
So it takes a few rounds to get this done,
and I believe this speaks to the
complexity of some of these larger models:
the same prompt sent to three or seven
different vendors who are hosting this model
gets slightly different responses, and you see
quite a bit of a difference between them.
So I think we'll get to a point where
derivatives of Llama 4, uh, the synthetic
data created out of Llama 4, and
some of the new techniques that they released
will make their way into smaller models.
And those are the ones that'll scale,
uh, across, uh, different companies.
But I'm generally very, very
excited about these big releases
that model companies are doing.
They're still sticking to their
open-weight models. There are still the
restrictions that come with a Meta license
that's not quite Apache or MIT, but
overall our clients have loved
the fact that we can now outcompete each
other in the AI space, and all clients win.
When you have great AI labs
working on this together.
I'm gonna move us on to our next topic,
which is Google Cloud Next. Uh, Shobhit,
you're actually dialing in, uh, straight
from Vegas, so I'll kick it over to you.
Um, you've been there all week.
Uh, what are the big things that we should
know about coming out of this, uh, this show?
It's, it's lovely to be with developers
and just people who are hacking through,
and clients who are actually using it.
Uh, 500 customer logos on screen.
That's where Google Cloud is today.
That's such a great testament to
where they were two, three years back;
they've done quite a bit to make
sure that they're serving the enterprises,
and they have more and more data.
Cloud is growing, profitable,
things of that nature.
When you start to look at, uh,
how they're bringing AI across
the entire platform, how they are
exposing some of their internal strengths.
So as a great example, they have
amazing TPUs to train their own models for
their own use cases like YouTube, or Gemini
across mobile apps and whatnot, right?
So they're bringing that
TPU out to enterprises, and they're
constantly innovating on that.
So the latest release, Ironwood, amazing
progress they've made on their own chips.
Then there's a lot of stuff that Google does
internally to support their billions of users,
so things like their own wide
area network of fiber.
It's millions of miles of fiber that
they've now exposed to, uh,
enterprise users and stuff.
So they seem to be making a
very concerted, uh, effort in making sure
that their secret sauce is now available
to enterprises to use as well.
Uh, overall, they spent a lot
of time on media creation, uh, versus,
uh, use cases like coding or data and
things of that nature. On the media creation,
clearly they're the only cloud that
can do this end to end across all these
different modalities, creating content.
Uh, I was privileged to be part of the
Sphere experience on Day Zero, where
they showed us The Wizard of Oz and what
they're doing to pull this off at such a mega scale.
Right? It is just,
it's a great experience to see
AI leveraging the best techniques
to go create such an immersive
experience at this big Sphere, uh, scale.
So a lot in the media space, but
not a lot of our enterprise clients
jump up and down on the media topic.
There's marketing, great, there's some media
creation, but the bigger focuses for enterprises
are: what do I do with the call center?
What do I do in my code development processes?
My data is messy, and things of that nature.
So they made quite a few, uh,
announcements in this space.
They have been for the last few weeks
announcing newer and newer models.
It's just amazing to see how, 10 days
before your annual event, you're
releasing your Gemini 2.5, right?
Usually people hold
onto these big announcements, but in
this AI race, you can't wait 10 days.
You need to get Gemini 2.5
out before Llama 4 comes in.
So it's good to see that
progress is, uh, going really fast.
On the intelligence-per-dollar front,
Gemini Flash has been doing really, really well.
Their Gemini 2.5 Pro model is absolutely
number one across the board on the benchmarks
and on all the different things that matter,
including Humanity's Last Exam.
So a huge focus on that.
Uh, just shifting a little bit
more towards the agents space.
Uh, we had MCP from Anthropic, which
allows an LLM, in a structured way,
with a standard protocol, to access
backend systems and stuff like that.
To complement that, Google has created
its own agent-to-agent protocol, which
allows one agent to talk to another
agent, not as a tool, but as an
equal citizen.
It's a peer.
So both of them can pair up and talk to
each other and say, hey, I found this error,
what do you want me to do?
Do you want me to do this,
or maybe go talk to a human if needed?
And this is asynchronous;
it can take a while.
They can take on long-running tasks
and talk back and forth.
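As a toy model of that peer-to-peer idea (not Google's actual A2A wire protocol), two agents exchanging messages as equals over asynchronous queues might look like this:

```python
import asyncio

async def agent(name, inbox, outbox, opening=None):
    # A peer can both initiate and respond, unlike a tool that only answers.
    if opening is not None:
        await outbox.put((name, opening))
    sender, msg = await inbox.get()
    print(f"{name} got from {sender}: {msg}")
    await outbox.put((name, f"ack: {msg}"))   # reply back to the other peer

async def main():
    a_to_b, b_to_a = asyncio.Queue(), asyncio.Queue()
    await asyncio.gather(
        agent("agent-A", b_to_a, a_to_b,
              opening="found an error in the nightly job"),
        agent("agent-B", a_to_b, b_to_a),
    )

asyncio.run(main())
```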
I'm generally very pumped when we
get to a point where people start
coalescing around specific standards.
Uh, Google had a lot of different partners,
50-plus, already working on, uh, agent-to-agent.
Within IBM Consulting,
we obviously have a really
good agentic workflow.
We have our own IBM Consulting Advantage;
we already have MCP integrated into it.
Now we are working on agent-to-agent
within that space as well.
So we are getting really, really excited
about, uh, making sure that this is a very
open ecosystem and you can work sideways.
Uh, those were my highlights
from the Google event.
Just very pumped about the
clients talking about the
specifics of how they did it.
It's not just a 30-second video,
but a whole half-an-hour session:
let's deep dive, here are the
challenges, here's our journey of
which models we used, and so on and so forth.
So it's very good to work with the product
teams and the customers in these events.
That's great.
Yeah.
So, uh, I guess, Hillery, an avalanche
of announcements here from a
number of different directions.
Um, I'm curious, as you kind of
look at Google Cloud and what they're
announcing: trends, thoughts, hot takes, uh,
from this year's, uh, Google Cloud Next.
Yeah. One of the things that caught my eye that
Shobhit didn't have on his list, so I can
grab onto it and mention it... um, you missed one.
Yeah.
So, so they also talked about, uh, AI on
premises and, and offering those capabilities.
And I think that's also exciting to see,
in the sense that, again, it affirms kind
of what we've been thinking here: that
clients do need to be able to run AI
in an air-gapped environment. We keep saying that
AI is a platform conversation and that AI and
hybrid cloud are two sides of the same coin.
And really, that's a statement
going back to everything we were talking
about at the beginning: that
there is data in really important places
and that data needs to be secured.
Sometimes it needs to adhere to sovereignty
concerns and other things like that.
And so, bringing AI to the data.
Um, the fact that, you know, one of their
announcements this week affirms that is
something that they also see as important.
I think it is just a really good affirmation
of what we're also seeing in the enterprise
space: you gotta bring the AI to the data.
AI is a decision about how flexibly
you can deploy AI in all those locations
where you have data and customers.
Um, it's not just a decision about
only which model and only
which location it runs in.
Any final takes?
Kate, I dunno if you have any thoughts
from, uh, this year's Google Cloud
Next, uh, on any and all of this.
I mean, it's just, like, remarkable
every time Shobhit comes to a show
and is like, here's what's happening.
And it just feels like this, like,
voluminous list that I have trouble parsing,
but I know I need, uh, a whole day to decompose.
Uh, yeah.
All of these main tech
conferences going on, it's great.
No, I mean, obviously from, you know, my
perspective, I'm most interested in things
like the Gemini 2.5 Pro release,
which has been really
impressive, honestly; getting great,
great vibe checks from that model.
Um, really exciting to see them
kind of take center stage, uh,
and have a strong release there.
So, you know, more great
models out there only, uh, improves
what the field can accomplish.
So, uh, from that perspective,
really excited to see them
push the boundaries.
Yeah, I think, uh, just
one last parting thought.
I think Google is really flexing
its, uh, B2C learnings, right?
The fact that they can train their models on
so much content, and again, I'm not getting
into where the content is coming from,
and indemnification and stuff like that.
I'm just purely commenting on the fact that
they can train on so much more real-world
information from the B2C, uh, space, right?
There's nobody else who has
access to so much B2C data, right?
So the video generation, for example: the videos
that they are creating are very, very cinematic,
and it seems like they have really gone
out and looked at all of the YouTube videos
from really good creators and stuff like that.
So the quality is really good, and
it's translating into the voice experience.
And this is becoming more and more critical for
clients, and for vendors, to get voice right.
And I think they have an unfair advantage
in the space, where they can go and
provide some very nice audio experiences
as you're thinking through.
So one small example: if I have some
Google Docs and stuff like that, I can
ask an agent to, say, create, uh, a
particular workflow, do some research, and
then create a very long research paper.
So now it's created a three-page
paper on a particular topic, on why
your margins are dropping though your
revenues are going up, and it'll do
competitive analysis and all this stuff.
Create a three-page paper,
I can click a button, and create an
audio, uh, podcast out of it.
Right? And this is, like, corporate enterprise stuff
that's so difficult to consume, and now
you're plugging in a really nice audio,
uh, layer on top of it, and I can listen
to it on my drive to work, right?
I think the fact that they have an unfair
advantage on the audio and the experience side
starts to give them some advantages
on the enterprise side as well that some of
the other, uh, peers of theirs don't have.
With these podcasts going up on YouTube,
maybe, Kate, you'll get the digital twin
of Shobhit that you've been wishing for.
Exactly.
As long as he gets some royalties from it.
Yeah, that's right.
Exactly.
There's ad dollars there.
So, um, yeah, I mean, I think the future of, like,
educational entertainment here is really funny
and interesting to think about: like, convert
all my emails of the day into a Netflix series
I can just watch when I get home.
You know? I think we will start to enter
these, like, very strange worlds.
But here's the kicker,
man, and I'll absolutely close on this:
I wanna live in a world where
I can insert myself, Shobhit, inside of
a movie scene that I'm watching, right?
If Iron Man comes to a bar and orders a
drink, I wanna be the bartender, right?
If you have, like, if you have all the
celebrities on screen, I wanna be part of that.
I could be the driver. Like, I want
to immerse myself as part of the video,
and this was not possible till today.
So if you look at how far we have come with
video creation, I think we're at a point where we'll
have super-personalized movies that'll be
cracking jokes like the ones I make on a daily basis, too.
I'm gonna move us on to
our final topic of the day.
Uh, I'd be remiss not to mention this, even though
we just have a few minutes left on the episode today.
Um, I really encourage you, if you're
listening to the show, to check out this super
interesting report that came outta Pew Research.
Essentially it's a, uh, survey of
American perceptions around AI and how
people use AI in their everyday lives.
Um, and I think we only have enough time to
kind of do a few kind of hot takes here, but I
think one sort of really interesting takeaway
from this report was the degree to which
sort of experts in AI have views about AI
that are really, really divergent from, you
know, people who are just kind of like using
or experiencing AI in their like everyday
lives or even just having heard about
it and never used the technology at all.
Um, and I think maybe, Kate, I'll kick it to you.
I think, like, one of the really interesting
results was, you know, all these
kind of data points about experts
saying, uh, that, you know, jobs
won't be impacted by AI, but people really
feeling like jobs will be impacted by AI.
Um, experts generally being a lot more positive
on the technology than the general public is.
Do you feel like this kind of impacts the
kind of prospects for AI going forwards?
Um, just kind of curious about your
quick take in the minutes that we have.
Yeah.
You know, I think there's a lot of
interesting things from the, the Pew report.
Definitely not enough to get
fully into right now. But
I think it speaks to the optimism of the
researchers involved, which is great because
we need people optimistic about the impact of
technology and science on the world to be the
ones inventing and trying to push it forward.
But I also think what I saw speaks a bit
to some of the representation in technology,
in that we still have work to do to get
better representation, so the people building
this technology more reflect the world.
So if you also look at it, they broke down,
you know, men's versus women's perceptions
of the technology and how it will impact them,
and similarly, men matched the AI experts.
And it will be no surprise to anyone that
in most of the AI expert and research
field, the work is still predominantly
being done by, you know, uh, men.
So I think there's also, you know, it reflects
just some of the needed diversity and
different opinions and broader perspectives
that we still have room to grow and bring
into AI research as a discipline as a whole.
I think it's a great note to end on and
I think, um, hopefully it was a good sell
for you all to go check out the report.
I think there's a lot of data there
and I think it's worth really parsing
through and I, I agree with you.
I think it really points out the need
for greater efforts on diversity in the space.
As per usual, I say this every episode
now, I feel it's almost a tradition,
like saying "agent," uh, in every single episode:
we have had more things to cover
than we have had time to cover today.
Uh, but uh, Shobhit, Kate, Hillery,
thanks for ably guiding us
through, uh, our 50th episode.
And, uh, thanks for joining us.
Uh, if you enjoyed what you heard,
uh, you can get us on Apple Podcasts,
Spotify, and podcast platforms everywhere.
And we will see you next
week on Mixture of Experts.