# 2024 AI Recap and 2025 Outlook

**Source:** [https://www.youtube.com/watch?v=l8plyR8aqVQ](https://www.youtube.com/watch?v=l8plyR8aqVQ)
**Duration:** 01:01:27

## Summary

- The hosts crown Gemini Flash and the evolving Llama series as 2024’s standout AI models, signaling a shift from ever‑larger systems toward compact, high‑performance ones.
- They predict a major “agent boom” in 2025, envisioning “super agents” that will dominate applications across the tech landscape.
- While NVIDIA remains a key player, the panel expects new entrants and increased competition in AI hardware, challenging its longstanding dominance.
- The discussion stresses that AI progress should stay transparent and safe, emphasizing openness over “black‑curtain” development as the industry matures.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=0s) **2024 AI Highlights & 2025 Predictions** - The episode recaps 2024’s standout AI models and trends while speculating on 2025 developments like super agents, hardware shifts, and openness.
- [00:03:03](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=183s) **Scaling, Tool Augmentation, and Model Distillation** - The speakers outline the shift from ever‑larger proprietary AI models toward tool‑enhanced agentic flows, synthetic‑data‑driven teacher models, and the distillation of high‑performance, cost‑effective smaller models using curated enterprise data, heralding a new AI landscape beyond 2025.
- [00:06:08](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=368s) **Enterprise AI Gains Amid Competition** - The speakers discuss mixed demo results, emphasize a recent surge in enterprise adoption of generative AI that relies on proprietary data, and note the intensified rivalry among major AI players.
- [00:09:10](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=550s) **Rise of Small Local AI** - The speakers discuss how increasingly compact, on‑device AI models with persistent memory and tool access will enable privacy‑preserving XR/AR experiences and become crucial as regulation and personalization demand local computing.
- [00:12:15](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=735s) **Future AI Companion & Multimodality** - The speakers discuss shifting from competing with AI to treating it as a collaborative companion, evolving evaluation benchmarks, and anticipate multimodal capabilities becoming a major focus by 2025.
- [00:15:21](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=921s) **Multimodal Voice-to-Voice AI** - The speakers highlight how emerging any‑to‑any multimodal models that convert speech directly to speech outperform traditional speech‑to‑text pipelines, and note that 2024 has become the year of AI agents.
- [00:18:29](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=1109s) **Race to Define Agent Protocols** - The speakers examine Meta’s early Llama Stack push as a cue that the AI field is moving beyond OpenAI’s dominant API model toward competing standards for agent intercommunication, and they speculate on which firm will ultimately lead this emerging ecosystem.
- [00:21:33](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=1293s) **Defining Super Agents and Security Risks** - Panelists define a “super agent” as a next‑generation AI that combines advanced reasoning, inference compute, and tool access, and warn that its imminent widespread deployment will expose highly underrated security and data‑leakage challenges.
- [00:24:37](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=1477s) **Niche Translation and Agent Web** - The speaker highlights opportunities for small language models in under‑served translations and domain services, and predicts that by 2025 the web will need a new, agent‑friendly data format to replace HTML.
- [00:27:41](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=1661s) **Unified AI Agent Interfaces** - The speaker envisions AI agents serving as a natural‑language operating system that integrates fragmented business tools and eventually automates software development.
- [00:30:43](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=1843s) **AI Agents and Hardware Outlook** - The hosts highlight the emerging power of AI agents for democratizing full‑stack app development and segue into a conversation with AI hardware experts about the future of AI infrastructure.
- [00:33:46](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=2026s) **Emerging AI Chip Landscape** - The speaker outlines new AI hardware startups, wafer‑scale chips, and big players like Broadcom and Qualcomm entering the market, noting a shift from NVIDIA dominance toward inference‑driven opportunities.
- [00:36:49](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=2209s) **Nvidia’s Training Market Dominance** - The speaker predicts Nvidia will keep controlling AI training systems through its GPU and high‑performance networking suite (via Mellanox), with AMD and Intel unlikely to compete effectively until around 2026‑2027.
- [00:39:58](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=2398s) **Open Standards Disrupt Nvidia Edge AI** - The discussion highlights how emerging open AI frameworks and dedicated inference engines are reducing reliance on NVIDIA, enabling broader competition in edge inference hardware while noting Apple’s push into AI chips as a key upcoming trend.
- [00:43:04](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=2584s) **Underrated Trends in AI Hardware** - The speaker emphasizes overlooked developments like real‑time compute optimizations (e.g., test‑time compute) driving tighter hardware‑software co‑design, and the expanding accessibility of AI hardware ecosystems illustrated by models such as Llama 3 for both research and consumer applications.
- [00:46:10](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=2770s) **Granite 3.0 Open-Source AI Milestone** - The segment highlights IBM's release of the Granite 3.0 family—Apache‑2 licensed, transparently built language models with ethical data sourcing—as a defining product moment of 2024.
- [00:49:34](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=2974s) **AI Governance and Safety Priorities** - The speakers shift from early AI experiments to stressing the need for robust governance, copyright compliance, cost management, and safety guardrails—highlighting IBM's watsonx and recent AI safety summits as pivotal steps toward viable, responsible AI deployment.
- [00:52:39](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=3159s) **Inference Runtime Risks and Open‑Source Parity** - The speakers discuss how using inference runtime for model self‑reflection introduces new security vulnerabilities yet offers greater control, and they predict that by 2025 open‑source AI will reach or exceed closed‑source capabilities.
- [00:55:53](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=3353s) **Evolving AI Interfaces and Co‑Creation** - The speakers discuss the need for optimized inference stacks, the shift beyond chat‑based AI interfaces toward new interaction models, and the rise of collaborative co‑creative tools.
- [00:58:58](https://www.youtube.com/watch?v=l8plyR8aqVQ&t=3538s) **Modular Expert Architecture & Agent Middleware** - The speakers discuss the need for modular AI components and middleware to manage and orchestrate specialist experts and multi‑agent systems, highlighting emerging research and startups.

## Full Transcript
All right, looking back at 2024,
what was the best model of the year?
For me, it's going to be Gemini and Flash.
And I'm going to nominate a sequence, I think,
which is the sequence of the Llama models.
So is the bubble finally going
to burst on agents in 2025?
Agents are the world.
Agents are everything.
And in 2025, we're going to have super agents.
In 2025, is NVIDIA still going to be king?
Not only is NVIDIA here, but we also see new
entrants and other players in the market.
Are we going to end up
having openness and safety?
You can do this out in the open.
It does not need to be behind a black curtain,
so to speak.
All that and more on today's Mixture of Experts.
I am Tim Hwang and welcome
to Mixture of Experts.
Each week, MoE is dedicated to bringing
the gold standard banter you need
to make sense of the ever evolving
landscape of artificial intelligence.
Today, we're looking back at
the huge evolutions across 2024.
You know, just to take you back, in
January of 2024, we were all chattering about
the release of the GPT store, Claude 2.1's long context window,
and I think at that point, we were still
waiting for the release of Llama 3.
Uh, 2024 was incredible, obviously a dynamic
year in AI, and so what we've done is we've
gathered a bunch of our best panelists to
talk about what stood out to them, what
didn't go as well, and maybe what they
think, uh, will happen in 2025.
We're going to talk about agents, hardware,
product releases from the whole year.
But first, we're going to start with what
happened in the world of AI models in 2024.
And to help us unpack the journey we've
been on, we have with us Marina Danilevsky,
who's a senior research scientist, and
Shobhit Varshney, senior partner consulting
on AI for US, Canada, and Latin America.
And so I want to actually start with a
quick, uh, more recent story, right?
Even before we zoom back to the,
you know, dark ages of January 2024,
uh, let's talk about the release of o1.
Um, you know, obviously this was
a big announcement, one of the
biggest announcements of the year.
And I know, Shobhit, before the show, you and
I were talking, and you wanted to kind of
get in and actually just point out that
the release of o1 actually marks a pretty
big change in how these companies are thinking
about doing models and scaling these models.
And maybe we'll just start
there if you want to jump in.
Excellent.
It's such a great time to be alive.
Um, what we see all around us, like there's
no other time in your entire career or life
that you would rather be alive than today.
In the last year or so,
we saw the era of scaling laws.
We got to a point where We realized that adding
more compute, building larger models, and
driving higher performance got us incredible,
incredible performance from these models, right?
So we got to a point where we
have insanely large models, now
Llama at 405 billion parameters, an estimated 1.75 trillion for GPT-4.
You can see this huge set of big
models that are doing amazing work.
Now we are transitioning to a couple
of different shifts in the market.
One, we are seeing more of the shift
moving towards the inference phase of it.
Slow down, think about what you
want me to do and think through a
plan and come up with an answer.
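The "slow down and think through a plan" behavior described here is often framed as test-time compute: spending more inference cycles per query rather than more parameters. A minimal sketch of one such technique, self-consistency voting, where `sample_answer` is a hypothetical stand-in for a real model call:

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> str:
    # Hypothetical stub for one stochastic model sample (not a real
    # inference call): a noisy solver that is right about 70% of the time.
    return "42" if rng.random() < 0.7 else str(rng.randint(0, 99))

def self_consistency(question: str, n_samples: int = 25, seed: int = 0) -> str:
    # Spend more inference-time compute: sample several reasoning paths
    # and return the majority answer (self-consistency voting).
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

The intuition is that independent errors rarely agree, so the correct answer tends to win the vote even when any single sample is unreliable.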
We also started to give these models
more tools that they could use, just like
we learned to use tools as we grow up.
So we have these agentic flows that are
helping us increase the intelligence as well.
We also saw a big shift in the overall cost.
The cost of these proprietary models
plummeted in the last year or so.
But then smaller models got more
and more efficient and started
to perform much much better.
So we've seen this shift towards insanely
large models that can think a lot more.
We saw us run out of all the public
internet data and now we're focusing a lot
more on high quality enterprise data or
stuff that's built for specific models.
So we're now getting to a point where you have
a teacher model that's insanely large, really
thinking through the whole problem well, that
can create synthetic data and can help train a
smaller model, distilling a model that can
deliver high performance at a lower price point.
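The teacher-student setup described above is commonly trained with a soft-target objective: the student matches the teacher's temperature-softened output distribution. A minimal stdlib sketch of that loss, for illustration only, not any lab's actual training code:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened distribution over raw scores.
    z = [x / temperature for x in logits]
    m = max(z)  # subtract max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on the softened distributions: the classic
    # soft-target objective a student minimizes during distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
```

A temperature above 1 flattens the teacher's distribution, so the student also learns which wrong answers the teacher considers "almost right," which is much of what makes small distilled models punch above their size.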
So we've shifted this, shifted quite a bit
in how we think about AI models and how
we have been investing in building them.
2025 and beyond is going to be a
completely different ballgame in what
we see with what AI models would do.
Marina, what are your thoughts?
Yeah, I think you're right.
It's been a really interesting year in terms
of where we started, where we've ended up.
We've seen that, yes, we can go
bigger and bigger and bigger.
And now we're finally there.
We can say, great.
So how well can we still perform
now that we go so much smaller?
So that initial research push of how big can
we go, we've finally given ourselves the luxury
of, all right, now it's time for efficiency.
Now it's time for cutting costs.
Now it's maybe eventually time to talk about
environmental aspects and things of that nature.
Maybe next year.
Is that a prediction for 2025 or?
2025.
Um, So that, that part is very interesting.
It also means that the quality has gotten
to where we can start to, uh, build
enterprise grade solutions reliably.
And I'm, I'm excited for that.
I know we're not talking about
next year yet, but that's the
thing that I'm really excited for.
The quality is there, I think finally.
And we, we can start getting real
serious about enterprise solutions.
Yeah, I mean, I think that seemed
like a really big trend this year, you
know, was certainly someone who kind
of like does software engineering in
their free time, kind of as a hobby.
This is the year where I was like,
wow, I am finally able to do stuff
with these coding assistants that like
I would not otherwise be able to do.
It's like finally fit for purpose for
me to kind of use on a day to day basis.
And I think that was, that was a very big
jump, um, that, you know,
we noticed in the last 12 months.
I guess Marina, are there particular stories
that stand out to you from like, I don't
know, earlier in the spring or otherwise
where you're like, Oh, I'll, if when I look
back on 2024, I'll really remember it for X.
I mean, first of all, I'll remember it for just
the, uh, very, very high levels of competition.
It felt like every two weeks somebody was
coming out with something, and companies
that you maybe wouldn't even expect,
like, very recently, Amazon,
being like, oh, they're working on that.
So I think I'll remember it for a lot of
people trying to, uh, really one up each
other, uh, in a, in a good way, in a way that
actually really pushes the thing forward.
But I think that the number of players that
we have this year is, uh, what's really going
to make it stand out for me. And some of the,
You know, as we talked about in previous
episodes, some of the debuts were more
successful, some were less successful.
Sometimes people didn't quite,
you know, double check everything.
Maybe sometimes people thought that
the demos were a little bit overcooked.
Um, and so I, I think that that's the
thing that'll make me really remember the year
is the different ways of how do you join in the
competition and introduce your, your flavor.
Shobhit, how about you?
I think from an enterprise perspective,
uh, this is an amazing year.
We, we recently ran a survey for our AI report
and about 15 percent of our clients globally got
real tangible value by applying generative AI.
There's a lot of, uh, knowledge
that was locked in documents and
processes, things of that nature.
And we saw meaningful movement in
how clients are focusing on
a few small, complex workflows and
delivering exceptional value out of them.
I think we did not get enough value out
of the generic copilots or assistants.
That has shifted more towards, hey, this
really has to be grounded in my data and
my knowledge and things of that nature.
But overall, the last two weeks that we
just went through, I think that was the most
action we've ever seen in the last
two, three years of AI, with the competition
between OpenAI and Google and then Meta jumping
in. That has been a phenomenal, phenomenal
movement in the community together, and now we're
starting to see us move towards, hey, we have
exceptional models, how do we start to then
control them a little bit more, adapt them to
our enterprise workflows and our data sets, and
have them think and reason with tools and things
of that nature. And the big movements around o1,
I think it's going to go down in
history as a big, big point in time
when we started to realize that $200
a month is actually great value.
You start to get to a point where, if you're
spending 200 bucks a month, you're really
being very focused on which workflows truly
can see an uplift, and apply AI to them.
Now you're at a point where you're
really paying somebody to
augment every aspect of your daily life.
I think we've got great, great
momentum to start 2025.
Yeah, for sure.
And I guess, I don't know if like folks have
nominees on superlatives for this, or it's
like, is o1 the, the release of the year?
I mean, I think from a model standpoint, or
were there other ones that kind of stand out?
I mean, I guess we also had
like, Llama 3 this year, right?
It was also a huge, huge announcement.
For me, it's going to be Gemini Flash.
I think what they've just done with a small
model that does multimodal, that's going to
drive the next two, three years of computing.
And the reason I say that is
everything that you can now unlock.
If you guys followed the Android XR
announcements recently, you know
that multimodal models used to be
inherently insanely large.
They needed a lot of compute and
always ran on the servers.
Now with models like Google Flash, you're
getting to a point where a small model
can do multimodal really, really well.
And the thing that will blow you
away is how it starts to remember
things that you've just seen, right?
I think it's going to start augmenting
all parts of our, uh, of our day
to day workflows, including memory.
That's something that we have not seen so far.
Uh, we used to generally ask
questions in a very cold start.
Now we'll get to a point where these
models will have infinite memory,
can have access tools like we do.
I'm very excited about high performance
at a really small, uh, size.
So we can then eventually get to this compute
infrastructure where you can have XR/AR
experiences and you can bring compute closer
and closer to the devices. That will drive a
lot more privacy as well, because then the
data is locked into those devices that I'm
carrying with me versus somebody else's cloud.
Yeah, I want to agree with that.
Actually, the, the small model, the small models
thing, because I think that we're going to
start at least in the next year or two, seeing
a lot more, uh, formal regulation going on and
a lot more people waking up to what it
really means, as you were saying, Shobhit. But if the
models are starting to remember, starting to be
personalized, starting to be customized, that's
going to become extremely, extremely relevant.
So having something small, local that you can
actually have that guarantee technologically.
That's going to become very, very important.
I agree with you.
Yeah, for sure.
And how about you, Marina, I think
in terms of like, you know, I know
Shobhit was saying o1 was huge.
Like if you have like a, you know, best
model of the year kind of nomination.
That's a hard one.
I, I like seeing them in a holistic way.
And I feel like it's hard to tell at the moment
when something is actually going to, uh, you
know, turn in. I'm going to nominate a
sequence, I think, which is the sequence of
the Llama models, not the Llama models themselves,
but the sequence: we're going to have Llama
3, and then, so, we've seen what we can
do with pre-training, and then we're going
to see what we can do with post-training.
So we're going to get bigger, bigger,
bigger, bigger, and then we're
going to see how far down we can go.
I'd like to see a consistent perspective
of that as a sequence, where people try
to push the pre-training, push the
post-training, push the size, and do that
iteratively, iteratively, iteratively.
I'd like to see that continue to be a thing.
Yeah, I feel like that's like how we know
you're a connoisseur, Marina, is like, you
like, you, you like the curation of Llama.
It's not just like any given
model is the best model.
Marina, I think we'll get to a point
where the big research labs are going
to build even bigger, bigger models.
But they may not release them
to the public as a model.
And we'll use them more for creating
synthetic data, for distillation, serving
as a teacher model, and so forth.
But I'm really excited that we're finally
coming to a point where we've poked at this for
a while, and we said, oh, if I just ask this
model to think before it answers, well, this
is what elementary school teachers tell kids, right?
And now we're trying to relearn how we
teach young kids: how they, like,
try different things out, create
a plan, answer the question, go pick
up a calculator if you really need to,
and don't try to do this in your
head, things of that nature.
Like, I have little
kids and I've spoken about that quite
a bit, and I feel that
there are so many similarities between
how we are training these models and doing
reinforcement learning with our kids,
giving them rewards and mechanisms in place.
We are breaking problems into smaller chunks
and they go solve each one of them separately
and there's a whole positive reinforcement
around them and they get things right.
I think we're getting to a point where we're
getting to learn how these models learn and
that becomes a good symbiotic relationship.
I think we will stop
asking these models to do things that
humans do really well, and we'll have a
better mutual appreciation of which things
should be delegated down to these models.
And that also means that benchmarks
and how we evaluate these models
are going to change quite a bit.
But I think today we're starting to
get to know these models really well.
And in 2025 and 2026 we'll have a very
different relationship with these
models, with them becoming more of a companion.
Versus trying to figure out, hey,
can you do this as well as I do?
Yeah, absolutely.
Yeah, I think one of the funniest outcomes
of this year has been all the examples
of, like, could you just try harder?
And then, like, the model actually just
does better, which is, like, very funny.
I mean, computers did not use to do that.
So, um, so I think maybe a final question,
and then we can wrap up this segment, um,
is we haven't talked so much about, uh,
multimodality, but it really seems poised
to become a really big deal in 2025.
I'm curious, I guess maybe Marina, I'll
start with you, if, if you've got kind
of predictions for what's coming up
in the next year for, for multimodal.
Yeah, multimodal, uh, that's something
where we had those thoughts when foundation
model sort of first came on, cause we
were all very excited about the fact
of, oh, well, it's just tokens in order.
It doesn't have to be text.
It can be anything, but then I think the
reason we all went into text very early,
with code being part of it, I think,
is the amount of training data that we
had, the amount of examples that we had.
So especially now that we've gotten better
with synthetic data and with, like you
said, Shobhit, but you were referring to
teacher models, we're going to be able
to explore that space, uh, a lot more.
And so I, I think that they might
finally, uh, be at the point
where once again, they are useful.
There's huge interest in, uh, having the
multimodal models because now, you know
how with the text models, we had the
idea that when you have one model doing lots
of tasks, the tasks learn from each other.
Now it's going to be even more interesting
where if you have a multimodal model,
does that make it actually also better
at each of the individual modalities?
Again, I think the data is now finally
there, not just the compute, but the
data and the ability to create more data.
Um, and so I think that, yeah,
next year we should see more.
I think I was expecting to see
maybe a little bit more models that
aimed at the sciences this year.
Maybe now again, next year, uh, maybe
models that are going to be more
successful with video, not just Sora.
But something that is maybe a
little bit more useful lower down,
think like, uh, with robotics.
There's a lot of, uh, things to be mined there.
So that's, I guess where I, I see those
maybe, yeah, the flashy parts are fun, but
the real usefulness is somewhere a little
bit lower down with, um, with the hardware.
No, I think the multimodal space is going
to be amazing the next couple of years.
And I think it is important for it to
understand all aspects of what humans are
seeing, feeling, looking at, reading, and
listening before it comes and helps us.
Um, I think it's going to have a huge impact
on its understanding of the world around us.
So far, we have done things where, hey, I will
take a picture of something, translate
that into text, and ask a question of a chatbot.
That paradigm has not scaled.
As the, as the multimodal
models get better and smaller,
like Gemini 2.0 Flash Experimental,
those are the ones
that are going to drive richer and
richer experiences in our day to day lives.
And the competition is
going to be very, very high.
You will see these models come
out from any, from everywhere.
Uh, the Any2Any, from speech to speech
directly, those kind of models are
delivering exceptional customer experiences.
If you look at traditional ways
of doing AI, you would go speech to text.
You take that text, you
pass it to an AI model.
The AI model figures out what to respond
with, and you go back from text to speech.
A lot is lost in translation and transcription.
Now, when you start doing, um, from media
to media, you go from voice to voice.
It starts to understand the
nuances of how humans talk.
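The loss the speaker describes can be made concrete: once audio is collapsed to a transcript, prosody (tone, pacing, emphasis) is gone before the model ever sees it. A toy sketch using hypothetical stub functions, not real model APIs:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Speech:
    text: str
    prosody: Optional[str]  # tone, pacing, emphasis carried by the audio

# The three functions below are illustrative stubs, not real model calls.
def speech_to_text(s: Speech) -> str:
    return s.text  # transcription keeps the words, drops the prosody

def text_llm(prompt: str) -> str:
    return f"reply to: {prompt}"

def text_to_speech(text: str) -> Speech:
    return Speech(text=text, prosody=None)  # synthesized flat

def cascade(s: Speech) -> Speech:
    # Traditional pipeline: speech -> text -> LLM -> text -> speech.
    return text_to_speech(text_llm(speech_to_text(s)))

def speech_to_speech(s: Speech) -> Speech:
    # A direct any-to-any model conditions on the audio itself, so
    # paralinguistic cues can survive end to end.
    return Speech(text=f"reply to: {s.text}", prosody=s.prosody)
```

Both paths produce the same words here; the difference is that only the direct path still knows *how* the words were said.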
I'll, I'm very excited about
the next year of multimodal.
Small and then starting the full context.
That's awesome.
And that's all the time we have
for today to talk about AI models.
Shobhit, Marina, thanks for coming on.
Happy holidays, and we'll talk
next year about all this and more.
For our next segment, I want to talk about agents
in 2024, and to help me do that, I'm gonna bring
in Chris Hay, distinguished engineer and CTO of customer
transformation, and Maya Murad, who is the product
manager for AI incubation. Maya, Chris, welcome
back to the show. Well, so 2024, uh, it was
the year of the agents, agents, agents, agents.
I think it almost became a little bit of
an in-joke at MoE that if we had an episode
that did not include agents, uh, that was
a really big thing and an unusual thing.
Um, and so I guess, let's put it this way.
Chris, we'll throw it to you first: were
agents overhyped in 2024, or underhyped?
Underhyped, not hyped enough.
Agents are the world agents are everything, and
in 2025, wow, we're gonna have super agents.
That's what's coming in 25.
Okay, um, and I guess Maya, I mean, looking
back, um, I don't know if you'd agree with
Chris or if there's like particular stories
in 2024 that really stood out to you in
the development of agents, if they're
going to be as big as Chris says for 2025.
So I definitely agree 2024, I would say
it was a lot of talking about AI agents.
Um, I'm excited to see more execution, and
what I expect to see is more quality hurdles
once we see more agents being
pushed into production.
I think we're just scratching
the surface of what is needed.
A trend that I'm starting to see
right now this year is having more
protocols and standardization efforts.
So we saw that Meta is attempting to do
that with Llama Stack, and Anthropic
with their Model Context Protocol, MCP.
Um, so I think it's going to be this little
battle for how do we standardize how LLMs
interact with the external world, how
agents, I think in the future it's going
to be how agents interact with each other.
Um, and I think this is where the next
frontier is and where a lot of our
efforts are going to be heading.
Yeah, this felt like a big, like,
almost like a preparation year.
I was looking at all the news stories and
I was like, is the biggest agent story
of the year that Salesforce is hiring
a lot of sales agents to sell agents?
Like, it feels like, between that and the
technical standards, the moments where you
could say "oh yeah, this was the killer agent
release of the year" were few and far
between. Um, and actually, in fact, it was
a lot more prep.
I don't know if Maya, you'd agree with that.
It felt like it was the year of bracing
for what's to come and all the different
things we needed to consider and
then who wanted to own that category.
So it was really interesting that, for
example, Meta went out early. The first
iteration of Llama Stack was a little bit
rough, but what they were signaling was:
we're in this for the long term, and we
want to help define those agent
intercommunication protocols.
And I have faith if, if that's a direction
that Meta wants to take, I'm sure
they're going to do a good job at it.
But this is also signaling
something interesting.
Um, the last two years, it's, um, mainly been
the field reacting to what OpenAI put out. So
OpenAI put out their chat completions API,
and the whole ecosystem followed suit.
And if you didn't have that exact API, your
thing was much more difficult to consume.
And now we're seeing a lot more players
contending to be the one setting
those standards and protocols.
Yeah, for sure.
And maybe, I guess, Chris, to turn it
back to you, I mean, you're, I think
you just used the phrase, agents are
the world, which is a very bold claim.
But, I mean, 2025, I mean, you know, let's
say agents are a lot more popular, become a
lot more prominent as a part of the landscape.
You know, is it meta that's well positioned
to win here or do you, do you have any
predictions about what we're going to see
in terms of who's going to be leading in the
space versus maybe a little bit further behind?
So I really like what Maya had to say on
Anthropic and the model context protocol.
I actually think that is going to be one of
the biggest enablers for agents next year.
And I think the problem that they've solved
really well is allowing remote calling of tools.
That's probably the biggest thing
that they've solved there, right?
So yeah.
If we think about the enterprise for a second,
you're not going to have agents that are sitting
scouring the web, or they're going to be,
uh, sitting downloading documents, whatever.
It's going to be access
to your enterprise tools.
It's going to be things like accessing Slack,
it's going to be accessing your, uh, Dropbox, or
your box folders, or whatever, or your GitHub.
And a lot of that is being standardized.
But more importantly, you want to take your
own data, expose your own APIs, and expose
that in a way that agents can consume the
data in a standardized way.
And I think MCP has done a really
good job of allowing you to remote
call tools and then be able to chain
them together with multiple servers.
And I think that's going to be a big enabler.
Now what's interesting, and what they've
done there, is it is easy to hook up
different LLMs, for example.
So it's not tied to the Claude stack there.
You can hook up any other model that you want.
And it's all tied into function calling,
which, again, was a standard that
was created by OpenAI in that sense.
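A rough sketch of the remote tool calling idea Chris describes: a server registers tools behind one standardized call interface, so any model, not just one vendor's, can discover and invoke them. The class, method names, and JSON message shape here are illustrative assumptions, not the actual MCP wire format.

```python
# Minimal sketch of remote tool calling in the spirit of MCP: a "server"
# registers tools, and a model-side client discovers and invokes them via
# a standardized message shape. The message format here is illustrative,
# not the actual Model Context Protocol specification.
import json

class ToolServer:
    def __init__(self):
        self.tools = {}

    def register(self, name, description, fn):
        self.tools[name] = {"description": description, "fn": fn}

    def list_tools(self):
        # What a client would fetch to tell the LLM which tools exist.
        return [{"name": n, "description": t["description"]}
                for n, t in self.tools.items()]

    def call(self, request_json: str) -> str:
        # One standardized entry point, so any model can invoke any tool.
        req = json.loads(request_json)
        result = self.tools[req["tool"]]["fn"](**req["arguments"])
        return json.dumps({"result": result})

# A hypothetical enterprise tool, e.g. searching a Slack-like system.
server = ToolServer()
server.register("search_messages", "Search team chat messages",
                lambda query: [f"hit for {query!r}"])

print(server.list_tools())
print(server.call(json.dumps(
    {"tool": "search_messages", "arguments": {"query": "Q4 roadmap"}})))
```

Because the request and response are plain JSON behind one `call` entry point, chaining multiple servers together, the property Chris highlights, falls out naturally.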
So, I like what you said there, Maya,
about, you know, different providers
coming in, and coming in an ecosystem.
And I think that's what I'd like to
see happen is no one company winning.
And this is ecosystem of providers is going
to push everything forward, and we're going to
enter this world of the big agent marketplace.
And that's why I say super agents
are coming, because it's going to
be this really big ecosystem that's
going to start to emerge in 2025.
And when you say super agent,
what do you mean exactly?
I just made up the term Tim, so.
You heard it here first on MoE.
A really good agent. That's a super...
Coming from superintelligence, or is this
your definition? Or is it in the sense of,
like, a Hollywood super agent?
Actually, I... thanks for the save
there, Maya, right?
I'm going to define a super agent as
the combination of the reasoning models.
The inference time compute models are coming
out just now combined with tool access.
So therefore they're more powerful
than the agents that you have today.
So there you heard it first.
You're right, Tim.
That's what a super agent is.
Very nice.
Uh, Maya, you had a funny phrase when you
were kind of giving your reaction to my first
question, which is, you know, next year's agents
are going to be everywhere, but it's also going
to be the year we're going to discover, like,
where the, the barriers or the limitations are,
you know, basically the full force of agents
is going to come crashing into reality.
And I think we're going to learn a lot.
And, you know, I guess one question I've
been asking a lot of the panelists for this
episode is, you know, what's underrated?
What are people not thinking about that's
likely to be a big hurdle for
agents going forward?
Number one answer, security.
Super underrated.
I think it's already being reported
that a lot of the existing players in
the space are leaking sensitive data.
And I, I, see agents as a way of
exacerbating these inherent risks of LLMs.
And I think we're under appreciating
what it takes to get it right.
I think the other thing is how to
nail the right human interactions.
When you have this ability to
automate more complex tasks.
What are the things that you still
need to delegate to the human?
How do you need to have a human in the loop?
How do you avoid an overtrust issue?
My team has done a number of user
studies and when information is presented
neatly by an actor that looks and seems
intelligent, it's really easy to take
everything surface level for granted.
And I think there's a whole new paradigm of
human computer interaction or maybe human
agent interaction that will be unlocked.
And I'm, I'm really excited for
what's to come because I think this
is inherently a creative exercise.
How do we retain our creativity, retain
our ability to do critical thinking, and yet
automate certain parts of processes to AI?
Um, that will be a really
interesting paradigm to get right.
Yeah, I think that delegation problem is
going to end up being super, super hard.
Uh, I think, yeah, it's very easy to become
dependent on even people who sound smart
when they're not, actually.
It's no different, I guess,
for agents, uh, as well.
Um, well, put it this way: it sounds like
we're very interested, and I guess the big
prediction from both of you seems to be,
you know, agent marketplaces.
Right? That's going to be maybe the big
thing we're going to see, um, next year.
You know, I think one of the big questions
is also kind of like what's going to be the
first most popular agent use case in some ways.
Um, you know, you think
about the big marketplace.
There's a lot of things that agents could
do that may be fun to do, but, you know,
I think we're almost kind of looking for
what's going to be the email of the agent
world, right? Like, what's going to be
the Slack of the agent world?
Um, I'm curious, in both of your experiences,
you know, talking to customers, whether there
are particular things, like hopes and dreams,
that they really want to see out of agents.
And if there's anything recurring there,
that's worth it for our listeners to know.
I think from my perspective, Tim, and that
marketplace, I think there is some obvious ones.
Like translation. I think, if I'm truly
honest, language models today haven't really
nailed translation so well.
There are some models that do certain
languages really well, but then, um, if you
think of the more esoteric languages, for
example, um, the less popular ones, the
large models aren't getting those.
And then it's going to be
specialized models that have been
trained in that specific language.
So, um, I think that's probably a real
opportunity for some of these smaller
language models combined with an
agent to offer translation services.
And again, add that into domain services.
So things like legal, which is something
you know very well, Tim. I think that will
probably be a big piece of that marketplace,
but I'm hoping it won't just be about
these individual agents.
I think any piece of information, it
could be sports scores, it could be golf
scores, it could be information about
play, it could be absolutely anything.
And one of the things, and this is my
next prediction for 2025, is I think we're
going to get a shift in the world wide web.
So today, HTML, et cetera, is the
dominant markup language of the internet.
That's not really well designed for
LLMs and not well designed for agents.
So I wonder if, in order for agents to
exist, not just having the marketplaces
but having a way to expose that data (we
talked about MCP earlier), you're going to
start to see new types of pages appearing,
where the content is optimized for
consumption by agents and the resources they
expose, as opposed to necessarily humans.
So I'm kind of predicting we're going to
start to see this shift in the web to, dare
I say, a Web 4.0 (I'm trying to avoid the
term Web 3.0) where we have content that is
specifically designed for agent consumption.
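One way such agent-optimized pages might look, sketched under an assumption of mine (not something Chris specifies): the same resource serves HTML to humans and structured data to agents, selected by content negotiation. The `render` function, the data, and the header-based dispatch are all hypothetical.

```python
# Illustrative sketch of the "agent-readable web" idea: one resource
# served as HTML for humans and as structured data for agents, chosen by
# the client's Accept header. The handler and data are hypothetical.
import json

PAGE = {
    "title": "Club Championship",
    "scores": [{"player": "A. Example", "strokes": 71}],
}

def render(accept_header: str) -> str:
    if "application/json" in accept_header:
        # Agent-facing view: no layout markup, just the facts.
        return json.dumps(PAGE)
    # Human-facing view: the familiar HTML page.
    rows = "".join(f"<li>{s['player']}: {s['strokes']}</li>"
                   for s in PAGE["scores"])
    return f"<h1>{PAGE['title']}</h1><ul>{rows}</ul>"

print(render("text/html"))
print(render("application/json"))
```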
Yeah, it seems to be almost the prediction
that's kind of implicit in what both of
you are saying is that you know there'll be
so much interest in the promise of agents
that like almost we're going to be kind
of reconstructing the web to make it safe
for agents or make it work for agents.
And I guess a lot of the kind of stack
and a lot of the kind of interoperability
stuff that's being built is like
an attempt to do that in some ways.
Um, I don't know, Maya, do you agree with that?
You think that's kind of like going to
be the future is like we'll have a, you
know, agent markup language basically.
Uh, A.T.M.L.
I think a lot of the interesting use cases
will be unlocked when different agents
that were built by different providers
that are owned by different organizations
are able to interact with each other.
And like, how do you
establish a safety protocol?
How are you able to do that productively?
Like the promise here is like, how do we
break out of all these silos of different
systems and having to manually architect
how each one speaks to each other?
And can we get to, uh, a
universal interaction protocol?
This is really an interesting promise.
I don't know if we will fully unlock it
next year, but a lot of different actors
would like to go into this direction.
And there's simple things that
we should nail before that.
So I know there's a lot of investment going
into software engineering tasks. I still
think no one has nailed the average business
user. The average business user has to use,
I don't know, a dozen different tools on
their computer and their machine.
None of them speaks with the others.
Every one has its own onboarding experience.
So I see a lot of opportunity to flatten
out these complex experiences and make
them much more dynamic and integrated.
And this is the true promise of this technology.
And it's the ultimate dream, I guess.
I mean.
Because the world you're describing is
almost like the agent becomes your entire
interface for all these applications, like
they stay independent, but like, yeah, the
operating system in the future really is the
agent that's doing things on your behalf.
It's natural language.
I was like.
LLMs changed our perception of how
we interact with the digital world.
We expect everything to be in natural language,
or you could do a form and then there's an
option to do natural language interaction.
And I think that expectation is gonna widen.
Yeah, no, I think that makes a ton of sense.
I guess maybe the final turn that we
should talk a little bit about is like on
the engineering and coding side, right?
I was thinking this year that, like, the
coding assistants have gotten really, really good.
But the dream is that you eventually have
agents that are like, I'm really envisioning
a software code base that looks like this.
And it's able to kind of like build
and interoperate on all parts of
that, and all parts of your code base.
What do we think are the prospects for that
kind of automation and agentic behavior?
I'm going to kick off here, and I'm
going to be controversial as always.
And here is something for people to think
about, which is: programming languages
today are designed for human beings, right?
And if you think about things like
while loops, for loops, et cetera,
you have however many versions,
and the same with conditionals,
if statements, blah, blah, blah.
But you know what?
When you get down to the assembly
level, none of that exists, right?
It's all back to branches and, you
know, uh, jump statements, et cetera.
And therefore, in an agentic world,
we're getting them to program in a
language that is designed for humans.
And the big challenge, I would say,
that I think is going to happen over
the next few years is that you're going
to have a more agentic native language.
Something that is designed more for LLMs,
with less of the syntactic sugar that
you need to satisfy humans.
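Chris's assembly-level point can be made concrete: the two functions below compute the same sum, one using Python's human-friendly loop sugar and one lowered to an explicit program counter with conditional branches and jumps, which is roughly the shape that survives at the machine level.

```python
# Chris's point illustrated: a high-level for-loop and an explicit
# branch/jump version compute the same thing. At the assembly level only
# the second shape exists: compare, conditionally branch, jump back.

def sum_with_loop(n: int) -> int:
    total = 0
    for i in range(n):      # syntactic sugar for humans
        total += i
    return total

def sum_with_branches(n: int) -> int:
    # The same loop lowered to label/branch form, simulated with a
    # program counter over numbered "instructions".
    total, i, pc = 0, 0, 0
    while True:
        if pc == 0:                 # label: loop_start
            pc = 1 if i < n else 3  # branch-if-less, else jump to end
        elif pc == 1:
            total += i              # loop body
            pc = 2
        elif pc == 2:
            i += 1
            pc = 0                  # jump back to loop_start
        else:                       # label: end
            return total

assert sum_with_loop(10) == sum_with_branches(10) == 45
```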
So, I think there's going to be an
evolution in programming coming.
Um, and you can see it already today, right?
The LLMs are already generating, uh, you
know, here's another Fibonacci function.
I don't, I don't need another
Fibonacci function in my life, right?
We got those.
Exactly.
So I then think you'll get the equivalent of
NPM or something like that, where you have a
big, massive AI library from which you can
pull the functions that you need.
So, like your AI operating system, I think
we're going to get AI programming languages
and libraries that are going to be a
little bit more native, and then that's
going to help the development of coding.
So I think that's an interesting trend.
Will it be 2025?
Maybe, maybe it's going to be 26,
but I think that's where we're going.
With the current technology we have, I'm,
like, super impressed with what I've seen
with Replit, with the ability to stand
up, like, full-stack applications.
On the project I'm working on with Bee,
it's been such an interesting paradigm,
like chat-to-build applications.
Um, I really see the ability to create
digital interfaces and code bases being
democratized in a way that hasn't
been possible before.
Purely powered by the current
technology of agents that we have.
I just think there's this like last mile
problem to nail, and I think next year
this is going to blow up in a major way.
Nice.
Well, you heard it here first.
That's all the time that we have for agents.
Uh, that was a lot to cover
in a short period of time.
Chris, Maya, thanks for coming on
the show and we'll see you next year.
I want to move us on to talk about the
hardware that powered AI in 2024, and I
couldn't have picked a better duo of people
to help out in terms of explaining it than
the two that I have online with me today.
Khaoutar El Maghraoui is a Principal
Research Scientist, AI Engineering, AI
Hardware Center, and Volkmar Uhlig is Vice
President, AI Infrastructure Portfolio Lead.
Welcome to the show.
Volkmar, maybe I'll turn to you first.
So, you know, as we talk about hardware
on AI, it's almost become synonymous with
saying that we want to talk about NVIDIA.
Um, and, uh, I'm curious about what you thought
the biggest stories were this year from NVIDIA.
I mean, the one that strikes me is the
announcement of the upcoming GB200.
Uh, but curious if there's other things on
radar for you as we kind of think about,
you know, what were the big stories in 2024?
NVIDIA made a big splash with the GB200.
Um, and I think we are seeing a big shift
towards more integrated systems, in
particular on the training side.
Very large, like, rack-scale computers now.
Um, liquid cooling is coming.
So all the things we've seen over the years
about how to cram more compute into a
smaller form factor, you know, making it
faster, better networks behind it, et cetera.
And I think NVIDIA is really trying
to push hard on staying the leader.
Um, and then we are seeing upgrades, which
are kind of a reflection of how models now
look. So we have 70-billion-parameter models.
Um, and you know, 70 billion parameters is
70 gigabytes even if you quantize at 8-bit,
and it's 140 gigabytes at 16-bit.
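The arithmetic behind those numbers is just bytes per parameter. A quick sketch (decimal gigabytes, weights only, ignoring KV cache and activation memory):

```python
# Volkmar's arithmetic, spelled out: a 70B-parameter model needs one byte
# per parameter at 8-bit precision and two bytes at 16-bit, before any
# KV cache or activation memory is counted.

def model_weight_gb(params_billions: float, bits_per_param: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal gigabytes

print(model_weight_gb(70, 8))   # 70.0 GB at 8-bit
print(model_weight_gb(70, 16))  # 140.0 GB at 16-bit
```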
Uh, now you don't want to
have to buy full cards.
So we see an increase in memory capacity
across the board in all of the accelerators.
Uh, but it's not only NVIDIA here;
we also see new entrants and the
other players in the market.
AMD is announcing a pretty good roadmap for
their products, all with very, very large
memory capacities and memory bandwidth to
address those large language models and fit
more model into less space or less compute.
And, uh, Intel is playing in the market as well.
And then you have a handful of startups,
uh, where we also saw, you know, really
interesting technologies coming onto the market.
So if you look at, uh, Cerebras, that's
wafer-scale AI which, you know, a year ago
they were talking about, and now you can
actually use it as a cloud service.
You have Groq being a player; there are
other companies coming up; there's
d-Matrix, which will have an adapter
coming out at the beginning of next year.
Um, yeah, so I think there's a good
set of players in the market.
And then there are new entrants, right?
We just saw the, the Broadcom announcement,
um, pretty much, I think it was last week,
um, with very large, you know, revenue
targets, uh, and the relationship with Apple,
uh, and then Qualcomm is also in the game
and has a chip architecture coming, you
know, and being some of them are available
and there's a good roadmap for them.
So I think the market is not only NVIDIA
anymore, which is, I think, good for the
industry, and it's moving extremely fast.
So we see training systems there, but
there's an increasing focus on inferencing,
because from my perspective, that's kind
of where the money will be made.
Yeah, for sure.
And I guess, Khaoutar, I don't know if you
want to talk a little bit about that bit.
I wanted to make sure that we did talk a
little bit about kind of the big trends
in inferencing this year, because it feels
like that was actually a big, um, theme of
kind of how this market is developing out.
And, uh, if you want to speak a little bit to
that and where you think things went in 2024.
Yeah, so of course, there's a lot of, a lot
happening, especially around, um, inference
engines and optimizing inference engines.
Uh, a lot of hardware software co design is
also, uh, you know, playing a key role in that.
So we see technologies like vLLM,
for example.
Uh, and we also see all the stuff around
KV-cache optimizations and batching
for inference optimization.
So a lot of innovation is happening in
open source around building and scaling
inferencing, especially focusing
on large language models.
But a lot of these optimizations we see,
they're not only specific to LLM, they can
be also extended to other, to other models.
So, um, a lot of development is happening
around vLLM. There is work, you know, even
at IBM Research and others, contributing to
open source to bring in a lot of these
co-optimizations, um, in terms of
scheduling, in terms of batching, in terms
of figuring out how to best collocate all of
these inference requests and get the
hardware to run them efficiently.
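The KV-cache idea behind engines like vLLM can be sketched with a toy class: keys and values for already-processed tokens are stored once, so each decoding step encodes only the newest token instead of re-encoding the whole prefix. The `_encode` function below is a stand-in for real attention projections, not vLLM's actual implementation.

```python
# Toy sketch of the KV-cache idea behind inference engines like vLLM.
# With a cache, decoding n tokens costs n encode calls (linear); a
# cache-free decoder would re-encode the whole prefix at every step
# (quadratic). "encode" stands in for the real K/V projections.

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []
        self.encode_calls = 0

    def _encode(self, token: str):
        self.encode_calls += 1
        return ("k:" + token, "v:" + token)  # stand-in for K/V projections

    def step(self, token: str):
        # Encode only the newest token; reuse everything cached.
        k, v = self._encode(token)
        self.keys.append(k)
        self.values.append(v)
        return len(self.keys)  # attention would span this whole prefix

cache = KVCache()
for t in ["the", "quick", "brown", "fox"]:
    cache.step(t)

print(cache.encode_calls)  # 4 tokens, exactly 4 encode calls
```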
Yeah, absolutely.
Volkmar, do you want to give us
a little bit of a peek into 2025?
I mean, it kind of sounds like with this
market becoming increasingly crowded, I think
everybody's coming after NVIDIA's crown here.
You know, what do you expect to happen in 2025?
Does NVIDIA largely still stay in the lead?
Or do we end in December 2025 with, you know,
the market becoming a lot more divided and
diversified than it has been traditionally,
particularly on the training side?
So I think the training side will be,
that's my prediction, will be still
very strongly in the hands of NVIDIA.
Um, I think AMD and Intel will
try to break into that market.
Uh, but I think that will probably
be more in the 2026 27 timeframe.
Uh, the reason I'm saying this is, um, for
the architecture you need to build a really
successful training system, it's not the
GPU, it's a system.
So you need really good, low-latency
networking. You need to solve a
reliability problem.
There's a, like, a strong push to actually
move compute into the fabric, um, to
further cut down the latency and more
efficiently utilize, uh, the hardware.
And, uh, NVIDIA, with their acquisition
of Mellanox, effectively bought the number
one network vendor for high performance
computing, which, you know, training is.
And so there are, you know, a bunch of
consortiums coming up. There's Ultra
Ethernet, um, where, you know, they're
trying to get to similar capabilities to
what you have with InfiniBand.
And with InfiniBand, despite it being an
open standard, there's pretty much only one
vendor on the planet, which is Mellanox,
which is now owned by NVIDIA.
So I think NVIDIA has a good, uh, you know,
lock on that side of the market, and
therefore a lot of the investment where
other people are playing is more in the
inferencing market, which is much easier to
enter, you know, because you intrinsically
don't only have NVIDIA systems: you don't
have NVIDIA on cell phones, you don't have
NVIDIA on the edge. And the software
investment you need to make for inferencing
is much lower than what you have on the
training side. So I think training is, um,
in very safe hands for NVIDIA.
So unlocked, yeah.
But I think with Gaudi 3 coming online,
which has integrated Ethernet, uh, you know,
and what AMD is putting on the market,
there will be a slow creep into that market.
And I think, you know, in 2026 we will
probably see, um, a major break into that
market, and NVIDIA loses that very
unique position it has right now.
Yeah. It's going to be a big transition.
Khaoutar, do you agree with
that for the 2025 prediction?
Yeah, I agree
with that.
Of course, there's a rising competition
in AI hardware, like Volkmar mentioned,
companies like AMD, Intel, and startups
like Groq and Graphcore, they're
developing competitive hardware.
IBM also is developing, uh, competitive
hardware for training and inference.
The problem with the NVIDIA GPUs is
also the cost and the power efficiency.
The NVIDIA GPUs are very expensive and
they're power hungry, making them less
attractive, especially for the edge
AI and the cost sensitive deployments.
So the competitors, like AWS Inferentia and
Graphcore IPUs, offer specialized hardware
that's often cheaper and more
energy-efficient for certain applications.
And I think, you know, the open standards,
for example OpenAI's Triton, um, and ONNX,
these new frameworks are also working a lot
on reducing the reliance on NVIDIA's
proprietary ecosystem, which makes it really
easy for competitors to gain
some traction here.
And if we look at inference-specific
hardware, there is, you know, this rise of
dedicated inference engines, like I
mentioned vLLM before: vLLM, SGLang, Triton.
They highlight the potential of non-NVIDIA
hardware, so they're opening up the door for
competitors' easy entry, and they also allow
them to excel in inference scenarios,
especially for large language models.
So we'll see this widespread emergence of
edge-inference solutions powered by ASICs.
Uh, and, and I think this is
challenging NVIDIA's role in this
rapidly growing edge AI market.
Yeah, and I think the edge is, I think is the
last bit I wanted to make sure that we touch
on before we move on to the next segment.
Um, you know, Volkmar, it seems to me
that obviously one of the big stories
was Apple moving into Apple Intelligence
and making sure that all their devices
essentially have AI chips on them.
Um, I assume that's going to continue to
2025, but I'm curious for our listeners that
are less involved in watching the hardware
space day to day, if there's any trends that
you think are worth it for people to pay
attention to as we get into the next 12 months.
I think the Apple model is, uh, very
elegant, in particular when you are in a
power-constrained environment.
Um, so, you know, whatever you can do in
that power-constrained environment with
less accuracy, you do on device.
And then whenever you need
more, you go somewhere else.
Uh, I think also the Apple architecture,
where they are running this on the same
silicon as they are running, you know, on
their phone, and they run it in the cloud:
it's a very interesting architecture,
because it simplifies things for the
developer and it simplifies deployment.
And so I think that we will see more of that
type of separation, and I think we will see
more compute happening on edge devices. And
as silicon matures, you know, there are more
choices, you don't need a high-powered card
anymore, and the silicon gets more and more
specialized for that, you know, simple
matrix multiply. I think pretty much every
chip that leaves a factory will effectively
contain AI capabilities in one
form or another.
And then it's really this hybrid
architecture of on device and off device
processing, which allows to have, you know,
silicon live for a long period of time.
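The hybrid pattern Volkmar describes can be sketched as a confidence-based router: try the small on-device model first, and escalate off device only when needed. The models, threshold, and confidence heuristic below are all hypothetical stubs, not Apple's actual architecture.

```python
# Sketch of the hybrid on-device/off-device pattern described here: run
# the small local model first, and escalate to a larger remote model only
# when the local answer is low-confidence. Both model functions are
# hypothetical stubs, not a real on-device or cloud API.

CONFIDENCE_THRESHOLD = 0.8

def local_model(prompt: str):
    # Small on-device model: fast and private, but less capable.
    # Toy heuristic: short prompts are "easy", long ones are "hard".
    conf = 0.9 if len(prompt) < 20 else 0.4
    return f"local answer to {prompt!r}", conf

def remote_model(prompt: str):
    # Large cloud model: slower, but handles the hard cases.
    return f"cloud answer to {prompt!r}", 0.99

def answer(prompt: str) -> str:
    text, conf = local_model(prompt)
    if conf >= CONFIDENCE_THRESHOLD:
        return text                  # stay on device
    text, _ = remote_model(prompt)   # escalate off device
    return text

print(answer("set a timer"))                      # handled locally
print(answer("summarize this long legal brief"))  # escalated to cloud
```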
But if you're on the edge, and edge is not
only a phone, it could be an industrial
device, your life cycle is
five to ten years.
You don't want to go and every two years
have to swap out the chip just because
you want to train another network.
And so I think the architecture Apple put
out will be, uh, more solidified, and we
will see, you know, software ecosystems
being built around that.
Yeah, that's great.
Well, Khaoutar, I'll let
you have the last word here.
Um, I've been asking most panelists as they've
been coming on, what is the most underrated
thing, um, in this particular domain?
So for AI hardware, are there things
that people are not paying attention to?
Um, you know, there's a lot of
hype in the AI hardware space.
So I'm curious if there's any more
subtle trends that you think are
important to pay attention to?
Yeah, that's a, that's a great question.
So I think, um, there is a lot of work
around real time compute optimizations.
Um, technologies, for example, like
test-time compute, which allows AI models to
allocate additional computational
resources during inference.
This is something that we saw with
OpenAI's o1 model.
It really, I think, sets a precedent here,
and it allows the models to break down
complex problems effectively and mimic
kind of what we do in human reasoning.
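One concrete test-time-compute technique in this spirit is self-consistency: sample the model several times and take the majority answer, trading extra inference compute for reliability with no retraining. The stub below stands in for a stochastic model; it is an illustration of the technique, not how o1 works internally.

```python
# Self-consistency as a simple form of test-time compute: more samples
# at inference time buy a more reliable answer. The "model" here is a
# deterministic stub standing in for a stochastic LLM.
from collections import Counter

def sample_model(question: str, seed: int) -> str:
    # Hypothetical noisy model: right about two thirds of the time.
    return "42" if seed % 3 != 0 else "41"

def self_consistency(question: str, n_samples: int) -> str:
    # Draw n samples and return the majority-vote answer.
    votes = Counter(sample_model(question, s) for s in range(n_samples))
    return votes.most_common(1)[0][0]

# More samples = more inference-time compute = a more reliable answer.
print(self_consistency("what is 6 * 7?", 9))  # 42
```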
And it also has implications for the way we
design these models and the way the models
interact with the hardware.
So it's kind of pushing for more
hardware-software co-design, um, in the
context of processing during inference.
I think another trend I see is
hardware accessibility for all.
I think we see this with the Llama 3 series,
which illustrates that new hardware
ecosystems are evolving for both high-end
research models and consumer-grade
applications. So the Llama models, they
release, you know, multiple versions:
the 405B, the 8B, and so on.
So that's also an important
trend that we're seeing.
So we can kind of bridge the gap with the
high-end data centers, where you have access
to this high-end compute and infrastructure,
which is not accessible to everyone.
So pushing towards that would
be really important.
The other thing is the open
source and the enterprise synergy.
IBM released Granite 3, which I think
is a great step in the right direction,
which also highlights the importance of
open source AI and its ability to maximize
the performance for enterprise hardware.
But there are still hardware
design challenges.
For example, what we see with NVIDIA's
Blackwell GPUs and the issues they have
around thermal management and
server architectures.
So, um, these hardware systems need to scale
to meet the demands of next-gen AI, and
power efficiency is becoming critical.
So, um, if I were to sum up these trends: I
think the year 2024 showcased the importance
of hardware-software co-design and the
industry's pivot towards specialized AI
accelerators, open-source adoption, and
real-time compute. These innovations are
really very important and are setting the
stage for further breakthroughs.
Yeah, that's a great note to end on.
Well, that's all the time
that we have for hardware.
Uh, Khaoutar, Volkmar, thanks for joining
us, uh, and for all your help in 2024
explaining the world of hardware, and, uh,
we'll have to have you back on in 2025.
Finally, to round out our picture of
2024, we need to talk about the product
releases that stunned us, amazed us
and gave us something to think about.
To help me do that are Kate Soule, Director
Technical Product Management for Granite, and
Kush Varshney, IBM Fellow on AI Governance.
Kate, maybe I'll turn it to you first.
Obviously, you know, the schedule was crazy
this year in terms of product releases.
It felt like every other
week there was something.
But I guess looking back on the last 12
months, I'm kind of curious, like, what did
you think was the biggest things, right?
The stories that we'll look back on 2024
with and be like, yeah, this is the year
that, you know, that happened.
As the Director of Technical Product
Management for Granite, I feel like I have
to celebrate what our team at IBM
accomplished and released in launching the
Granite 3.0 model family: um, Apache
2.0-licensed models that are transparent,
uh, with kind of an ethical sourcing of the
data that went into them, uh, where we share
all the details online in our report.
So I'm really excited about being able to continue that commitment to open source AI, and to create state-of-the-art language models in the 2 to 8 billion parameter size that we can put out there under permissive terms for our customers and for the open source community to leverage more broadly. Looking outside of just IBM, I think the release of the GPT-4o family of models and products was really exciting.
I think it launched a new wave of interest in how we continue to improve performance without just spending more money on training compute. That really is ushering in the next wave we're going to see in 2025: how can we spend more at inference time, allowing models, and the products that use these models, to run more advanced computations and inference calls to improve performance, beyond just "let's throw more money at training, let's throw more data, let's scale, scale, scale."
So more broadly, that's something I was pretty excited to see.
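The "spend more at inference time" idea Kate describes has a very simple instance: sample several candidate answers and keep the best one under some scorer, so quality improves with extra inference calls rather than extra training. A toy sketch, where `generate` and `score` are hypothetical stand-ins for an LLM sampling call and a reward model or verifier:

```python
import random

def generate(prompt):
    # Stand-in for sampling an LLM; a real system would call a model here
    # with a nonzero temperature so each draft differs.
    return f"{prompt} -> draft #{random.randint(1, 1000)}"

def score(candidate):
    # Stand-in for a reward model or verifier scoring each candidate.
    return sum(ord(c) for c in candidate) % 97

def best_of_n(prompt, n=8):
    """Spend n inference calls instead of one, keep the highest-scoring draft."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

answer = best_of_n("Explain quantum tunneling", n=8)
```

The knob here is `n`: more inference-time compute buys more candidates to choose from, with no change to the underlying model.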
Yeah, we should definitely talk about both of those themes. I mean, on the first one, 2024 was really like the attack of the open source. It felt like for a moment there that all the closed source models would really be winning the day, and then the explosion of activity in open source has been really, really exciting to see.
And then the second one is kind of the play-smarter-not-harder world, where there's a bunch of new techniques that we're seeing play out in a lot of places.
Maybe, Kush, we'll start with that first theme. In the open source world, of course, this is also the year of Llama 3. There's just been a lot happening in open source land. As you look back on either of the themes that Kate pointed out, whether on the open source side or on the different methods for doing AI, are there things that you'd want our listeners to remember from 2024?
Yeah, I mean, I think your phrasing of it, open source returns, or the return of open source, whatever you want to call it, is the right way to frame it. We're realizing, when we talk to customers across the board, that in 2023 it was all about POCs and that sort of thing: getting people excited within their own companies that maybe generative AI has a role to play.
But then over time, they realized that actually we need to worry about the copyrighted data, other governance sorts of issues, the cost, and just how to make these operational. And I think watsonx, the IBM product, kind of shone there, and the Granite models obviously as well. So the science experiment that we had in 2023 was being used more this year, and now going into next year it's all about being as serious as possible, I would say.
Yeah, for sure. And now that you're on for this segment, I think it's a good time to ask: obviously you spend a lot of time thinking about AI governance, and there were a bunch of stories in that vein this year. I don't know if there are ones that you'd want to call out for 2024.
Yeah, I mean, just the fact that the whole AI safety world convened, right? We had the Korea summit, and we had the summit in San Francisco in November. This is now the topic; I think it's the thing that we need to overcome, because just having generative AI out there without the safety guardrails and without the governance is dangerous. The promise of the return on investment is only a promise until you can get over the hump of the governance issues.
Yeah, for sure. Do you have any predictions for where we go in 2025 with all that? I'm detecting a theme here, which is that 2024 almost set up a lot of stuff, and in 2025 we're going to see how it plays out, both in open source and in governance, it seems.
Yeah, I think the prediction is, well, the earlier segment of the show was about agentic AI, so I think that's going to really explode as well. And I think that's going to be what drives governance back down into other use cases too, because when you have autonomous agents, the governance, the trust, is extremely important. You have very little control over what these things might do.
The stuff that Kate was mentioning, the extra inference cycles that you're going to see, are going to be, I think, mainly for the purpose of governance. It's to make these things self-reflect a little bit, maybe think twice about what answers they're putting out there, and so forth. So you're going to have more tools for governing the agents as well.
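The "think twice" behavior Kush describes can be sketched as a generate-critique-revise loop. This is not any specific product's implementation; `draft`, `critique`, and `revise` are hypothetical stubs standing in for model calls:

```python
def draft(prompt):
    # Stand-in for an LLM call producing an initial answer.
    return "The capital of Australia is Sydney."

def critique(prompt, answer):
    # Stand-in for a guardrail/judge model call. Returns (ok, feedback).
    if "Sydney" in answer:
        return False, "Factual error: the capital of Australia is Canberra."
    return True, ""

def revise(prompt, answer, feedback):
    # Stand-in for an LLM revision call conditioned on the critique.
    return "The capital of Australia is Canberra."

def answer_with_reflection(prompt, max_rounds=3):
    """Generate, then spend extra inference cycles letting the system
    critique and revise its own answer before returning it."""
    answer = draft(prompt)
    for _ in range(max_rounds):
        ok, feedback = critique(prompt, answer)
        if ok:
            break
        answer = revise(prompt, answer, feedback)
    return answer

print(answer_with_reflection("What is the capital of Australia?"))
# prints "The capital of Australia is Canberra."
```

The extra loop is exactly the added inference-time cost, and the critique step is where a governance model like a guardian would plug in.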
The Granite Guardian 3.1 release that just happened actually has a function calling hallucination detector in there. That's one of the things that agents actually do, right? As part of the LLM's flow, they will call some other tools, some other agents, some other function, and if that call itself is hallucinated, the parameters, the types of the parameters, the function names, all of these things can go wrong. So we have ways of detecting issues there.
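To make the failure modes Kush lists concrete, here is a minimal sketch of validating a generated call against a registered tool schema. This is not the Granite Guardian detector, which is model-based; the `TOOLS` registry and checks are illustrative assumptions:

```python
# Hypothetical tool registry: function name -> expected parameter types.
TOOLS = {
    "get_weather": {"city": str, "units": str},
    "send_email": {"to": str, "subject": str, "body": str},
}

def check_function_call(name, arguments):
    """Flag the failure modes of a hallucinated call: an unknown function
    name, unexpected parameters, wrongly typed values, or missing parameters."""
    if name not in TOOLS:
        return [f"unknown function: {name}"]
    schema = TOOLS[name]
    issues = []
    for param, value in arguments.items():
        if param not in schema:
            issues.append(f"unexpected parameter: {param}")
        elif not isinstance(value, schema[param]):
            issues.append(f"wrong type for {param}: expected {schema[param].__name__}")
    for param in schema:
        if param not in arguments:
            issues.append(f"missing parameter: {param}")
    return issues

# A call with a type error and a hallucinated parameter:
print(check_function_call("get_weather", {"city": "Austin", "units": 7, "when": "now"}))
```

A schema check like this catches structural hallucinations; a model-based detector goes further and judges whether the call is semantically grounded in the conversation.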
Kush, I'm curious. You said the inference runtime is going to be used more for governance and self-reflection. But you had shared a paper recently about how that also opens a whole can of worms of other risks and potential security issues, right?
When the models are running all these loops offline, people aren't naturally able to observe what's going on in the...
Yeah, I mean, this whole self-reflection, you can call it metacognition, you can call it wisdom; I think these are going to be part of what happens. But anytime you have extra stuff happening, more loops, there are more opportunities, more surface area for attacks, right? So I think that is certainly going to be part of it. But I have hope that, just like in other sorts of systems, you can have better control when you have more opportunity to affect what happens.
Yeah, and I think that ends up being critical, and it's also a pivot I was going to mention, to throw it back to you, Kate. With open source coming up so quickly in 2024, it feels like 2025 might finally be the year where we're at parity, or even where open source goes past closed source in some sense. And this is happening not just because the technology is getting better, but also, like Kush is saying, because our ability to have components that ensure safety in deploying open source models is getting better too, right? In the past it was, well, we have to rely on closed source because they really understand how to do alignment and security and safety.
There's a lot of scare tactics out there.
That's right.
Yeah, exactly.
Only the big model providers have the budget or the expertise to be able to look at how to do this safely.
That's right.
Yeah, I think we're finally chipping away at that enough. We're seeing Meta, for example, doing a phenomenal job releasing very large models with excellent safety alignment, and showing that you can do this out in the open. It does not need to happen behind a black curtain, so to speak.
Yeah, for sure. Is that a prediction for 2025? That we can have our cake and eat it too, that it can be open and it can also be safe?
Absolutely.
Yeah.
That's exciting.
Do you have any open source predictions going into the next 12 months? Where do we go from here?
You know, I guess more Granite, even better Granite. I think the next year is really going to be focused a little higher up the stack, on top of the models: co-optimizing models and the developer frameworks in which they're executed. We saw the release of Llama Stack in 2024, and I think we're going to see that evolve wildly as it starts to mature, along with other similar types of capabilities and stacks being developed.
I think we've all also kind of accepted the OpenAI endpoint way of working with models as the incumbent way to operate. But there are probably other ways we can continue to innovate and improve now that we've been around the block a few times. So I think we're going to start to see a lot of open source innovation a little higher up the stack, particularly from model providers who are looking at how to further improve performance. It goes hand in hand: if you're trying to optimize and innovate at inference time, you need a stack that can handle that.
And so that's where I think a lot of the development is going to happen.
Yeah, for sure. I feel like there's so much that we've just taken as a given in some ways, just because that's where stuff got started. It's easy to forget, given there's so much news, that this is all very fresh; just a few years ago it was basically non-existent.
Yeah, one that I'd put before this group, particularly because we're talking about product releases: this year on Mixture of Experts we've talked a lot about how chat is just an interface that we started with because ChatGPT was so successful. But there's kind of no reason why that has to be the way we interact with these systems going forward.
I'm curious if either of you have predictions on even the interface. Do we start interacting with these systems in a way that's pretty different from what we've gotten used to?
Yeah, I mean, I think co-creativity, co-creation, is going to become a bigger thing: having multiple people. I know there have been some canvas sorts of things that have come out this year as well, but I think it's just going to grow. And let me give a brief shout-out to my brother. He has a startup called KOCREE, K O C R E E, and I just had to get that in.
Exactly, exactly.
And it's all about co-creating music for people with AI, but also helping people, and society, with their well-being and so forth, because when you create with others, it's actually a positive experience as well. So I think just a shift in focus a little bit, maybe, toward human flourishing, human well-being, and how to get people to really work together, to have open-endedness and so forth, might be something that emerges.
Maybe with the few minutes we've got left on this segment: is there anything that folks aren't talking about? Particularly in AI, everyone is always excited about the latest model release, the latest thing, and I'm always trying to see around corners. Since you're both experts who think about this so deeply, what's under-hyped, maybe underrated at the moment, that really deserves more attention going into the next year?
I think there's going to be a tremendous opportunity, and I really hope this takes off, around modular components for building with LLMs. For example, there's work going on toward a point where you could fine-tune a LoRA adapter, kind of a bucket of weights that you fine-tune for your task, that sits on top of the model. Right now they have to be tailored to the exact model you're going to deploy, and when a new version comes out you have to retune your adapter. But how do we create versions of these, and there's interesting research here, that are universal and can be applied anywhere?
And then that creates some really nice modular components that you could ship, or have a catalog of to choose from, provision live, and swap in and out at inference time.
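The modular-adapter idea Kate describes can be illustrated in a few lines: a LoRA adapter is just a low-rank pair of matrices added on top of a frozen base weight, which is what makes swapping cheap, since each task ships a small (B, A) pair rather than a full model copy. A toy numpy sketch of the idea (not a real serving stack; the shapes and adapter names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # hidden size of the frozen base layer
r = 2           # LoRA rank, much smaller than d

W_base = rng.normal(size=(d, d))   # frozen base weight, shared by all tasks

# Each task ships only its small (B, A) pair: 2*d*r numbers instead of d*d.
adapters = {
    "summarize": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
    "sql":       (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
}

def forward(x, task=None, alpha=1.0):
    """Base forward pass, with an optional task adapter swapped in."""
    y = x @ W_base.T
    if task is not None:
        B, A = adapters[task]
        y = y + alpha * (x @ (B @ A).T)   # low-rank update: W + alpha * B @ A
    return y

x = rng.normal(size=(d,))
y_sum = forward(x, task="summarize")
y_sql = forward(x, task="sql")   # swapping tasks never touches W_base
```

The catch Kate raises is visible here too: the (B, A) pairs are shaped and trained against this particular `W_base`, so a new base model today means retuning every adapter, which is what the "universal adapter" research aims to fix.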
There are also aspects like the now-seminal mixture of experts architecture we've all heard of, right? There, I think, we're going to see an increasing look at whether we can make modular components on the architecture side, where you have modular experts that get swapped in and out. So I would love to see, and I think there's some really interesting research going on at the ground level that could support, a focus on how we make building and specializing models more modular in 2025.
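At its core, the "modular experts swapped in and out" idea is a learned router sending each input to a few experts and mixing their outputs, so most experts sit idle on any given token. A toy top-k router in numpy, purely illustrative of the mechanism:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_experts, k = 8, 4, 2

# Each "expert" is a small independent weight matrix (a modular component).
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
W_gate = rng.normal(size=(n_experts, d))   # router (gating) weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x):
    """Route x to its top-k experts and mix their outputs by gate weight;
    the other experts are never evaluated for this input."""
    gate = softmax(W_gate @ x)               # router scores over experts
    top = np.argsort(gate)[-k:]              # indices of the top-k experts
    weights = gate[top] / gate[top].sum()    # renormalize over chosen experts
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=(d,)))
```

Swapping an expert in or out is just editing the `experts` list, which is the architectural analogue of the adapter catalog discussed above.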
Yeah, that's super cool, and I think it doesn't get enough attention. I mean, everybody's always like, the AI, it just...
One big model that does everything.
Yeah, why do I have to choose? And once we have the big model, all our problems will be solved, right?
Bigger is better, right?
Yeah, for sure. How about you, Kush? Anything underrated you'd point out to our listeners before we close this segment?
Yeah, I think the middleware for agents would be one thing as well, building on what Kate just said about the modularity. So even having different agents in a multi-agent system: how you register them, orchestrate them, and so forth. From IBM Research, we have the Bee framework, Bee as in the thing buzzing around in my ear, and that's out there. There are other startups as well; some former IBM researchers have a company called Emergence AI, and they have one, and there are others out there too.
So I think that's going to pick up, and again, it relates to what Kate was saying: connecting more between the development environment and the models, linking them much closer. Once we're at a point where the models are all kind of good enough, then it's a question of how we use them, how we make productive use of them, and how we develop them better.
Yeah, for sure. Definitely keep an eye on that space. Well, Kate, Kush, thanks for joining us on this segment. Appreciate you helping us navigate 2024 in product releases, but also 2025 in product releases. And we will see you in the new year.
Well, that's everything we have
time for on our episode today.
So much happened in 2024.
And there's basically no way
we could fit it into one show.
But I want to thank all of our panelists for helping us try, and all the panelists that we've been lucky enough to have on Mixture of Experts in 2024.
Each week, we get to nerd out with some
of the smartest people in the business.
And it's a pleasure to be able to talk
with them to better understand this
crazy world of artificial intelligence.
And thanks to you for joining us.
If you enjoyed what you heard, you
can get us on Apple Podcasts, Spotify,
and podcast platforms everywhere.
Here's to what was a great 2024, and here's
looking forward to an incredible 2025.