Year-End AI Model Launches
Key Points
- Mistral 3 is a straightforward dense‑attention transformer without exotic attention tricks, yet it delivers strong performance, showing that scaling plain‑vanilla models can still be effective.
- At Amazon’s Re:Invent conference the company launched three autonomous AI agents capable of handling coding, security, and operations tasks for extended periods without human intervention.
- IBM reported its AI coding assistant “Bob” boosted developer productivity by 45%, and Salesforce data revealed AI‑driven agents generated roughly $14.2 billion in Black Friday sales.
- This week saw a rapid succession of major model releases—Claude Opus 4.5, Mistral 3, and DeepSeek 3.2—highlighting intensified competition among AI labs as the year winds down.
- IBM staff nickname the end‑of‑year slowdown “funsember,” using the quieter period to experiment with and evaluate new AI technologies.
Sections
- Mistral Model Review on MoE - The panel examines Mistral's plain‑vanilla transformer design, then previews the week’s AI headlines—including Amazon’s new autonomous agents and IBM’s coding assistant—on the Mixture of Experts show.
- Exploring Emerging Efficient AI Models - The speaker highlights the surge of new AI models and observes how labs such as DeepSeek and Mistral are advancing efficiency with novel attention mechanisms, despite the challenge of fine‑tuning amid the abundant options.
- Open‑Source Labs vs Commoditized AI - The speakers debate whether open‑source AI projects can stay distinct amid inevitable model commoditization, noting that openness itself and strategic focus—rather than massive compute—allow them to achieve state‑of‑the‑art performance.
- Model Ensemble Optimization Strategy - The speakers discuss using routers to dynamically select among multiple AI models—like Opus, Claude, and Mistral—to optimize performance across varied enterprise and vision tasks.
- Using AI as Collaborative Code Partner - The speaker describes treating AI models like a responsive, corrective sounding board for technical tasks, valuing their ability to refine suggestions and collaborate rather than engage in casual conversation.
- Scaling Laws and Compute Access - The participants debate whether AI scaling laws remain applicable for inference, arguing that only entities with massive compute resources can leverage them while others encounter diminishing returns.
- Beyond Scaling: Quality‑Driven AI Progress - The speaker argues that recent AI advances arise mainly from training and algorithmic refinements—a “quality improvement law”—rather than larger model sizes, emphasizing how costly, months‑long iteration cycles make pure scaling impractical.
- Amazon Blocks ChatGPT Shopping Agent - Amazon announced it will prevent ChatGPT’s shopping‑research feature from accessing its product listings and pricing, limiting the AI’s ability to browse and recommend items on the platform.
- Monetization Threatens Open AI Agents - The speakers warn that emerging paywalls, such as a proposed Cloudflare “toll booth,” could restrict free web access and jeopardize the practical development of AI agents by shifting the focus from open experimentation to revenue-driven incentives.
- Personal AI Agents Replace Apps - The speaker outlines how conventional apps are being supplanted by personal AI agents that delegate tasks to approved third‑party agents, highlighting the platform‑versus‑AI competition and the emerging shift of SEO toward AI‑driven assistance.
**Source:** [https://www.youtube.com/watch?v=_lZgapJzFho](https://www.youtube.com/watch?v=_lZgapJzFho) **Duration:** 00:35:39
Timestamps
- [00:00:00](https://www.youtube.com/watch?v=_lZgapJzFho&t=0s) Mistral Model Review on MoE
- [00:03:11](https://www.youtube.com/watch?v=_lZgapJzFho&t=191s) Exploring Emerging Efficient AI Models
- [00:06:40](https://www.youtube.com/watch?v=_lZgapJzFho&t=400s) Open-Source Labs vs Commoditized AI
- [00:10:20](https://www.youtube.com/watch?v=_lZgapJzFho&t=620s) Model Ensemble Optimization Strategy
- [00:14:02](https://www.youtube.com/watch?v=_lZgapJzFho&t=842s) Using AI as Collaborative Code Partner
- [00:18:31](https://www.youtube.com/watch?v=_lZgapJzFho&t=1111s) Scaling Laws and Compute Access
- [00:23:21](https://www.youtube.com/watch?v=_lZgapJzFho&t=1401s) Beyond Scaling: Quality-Driven AI Progress
- [00:26:31](https://www.youtube.com/watch?v=_lZgapJzFho&t=1591s) Amazon Blocks ChatGPT Shopping Agent
- [00:29:41](https://www.youtube.com/watch?v=_lZgapJzFho&t=1781s) Monetization Threatens Open AI Agents
- [00:33:16](https://www.youtube.com/watch?v=_lZgapJzFho&t=1996s) Personal AI Agents Replace Apps
Full Transcript
On the Mistral side, uh, you know, I was
actually kind of surprised, but I had
mixed feelings about it's kind of a
plain vanilla transformer, right? Like
there's a few little tweaks in there,
but there's no fancy attention
mechanisms. There's no attempt at linear
attention scaling. It's just a big old
dense attention model. Um, and
it's really good. All that and more on
today's Mixture of Experts.
I'm Tim Hwang, and welcome to Mixture of
Experts. Each week, Moe brings together
a panel of the smartest minds in
technology to distill down what's
important in the latest news in
artificial intelligence. Joining us
today are three incredible panelists.
We've got Aaron Baughman, IBM Fellow and
Master Inventor; Abraham Daniels, Senior
Technical Product Manager for Granite;
and Gabe Goodhart, Chief Architect, AI
Open Innovation. Uh, welcome back to the
show, all three of you. Lots to
talk about today. We've got a welter of
new model releases. We'll talk a little
bit about the future of the scaling laws
and a dust-up that's happening between
Amazon and ChatGPT. But first, we've
got Aili with the news.
[music]
Hey everyone, I'm Aili McConnon, a tech
news writer for IBM Think. Here
are this week's AI headlines. At
Amazon's annual Re:Invent conference, the tech
giant launched three new agents
that can handle coding, security, and
operations independently for hours or
days at a time. IBM has shared early
results that its AI coding assistant
named Bob has helped IBM developers
improve their productivity by 45%.
The numbers are in. Globally,
AI and agents influenced $14.2 billion
in sales on Black Friday, according to
software firm Salesforce. AI for dinner.
Posha is a new private robot chef that
can prepare complex multi-step dishes.
[music]
Hello, Smart Kitchen. For more,
subscribe to the Think newsletter linked
in the show notes. [music] And now back
to the episode.
I was looking at the calendar and
there's really not that many days left
before the end of the year. Uh but
basically the AI news hits keep coming.
Um and literally in the last few weeks
we have had not one not two but three
fairly major model launches um by uh
competing labs in the space. So uh
Claude Opus 4.5 is out, Mistral 3 is out, and
also DeepSeek 3.2 is out. Um, and so, Gabe,
I want to kick it over to you first.
Obviously with Mistral 3 and DeepSeek 3.2,
open is a big theme of this round of
launches, and so I kind of want you to just
talk a little bit about what you're
seeing with Mistral 3 and DeepSeek 3.2, um, and
I guess what listeners should take away
from it.
>> Yeah. Uh awesome. So, as you said, it's
the end of the year. Everyone's cramming
to have that last bit of [laughter] news
for people to go.
>> Uh well, I don't know about you guys,
but I've always referred to it as
funsember here at at IBM because it's
when things start to slow down and you
actually get some time to play with
stuff. It's actually a great time to
launch an experimental new model for
people to then play with over the
holidays and a little bit of downtime.
So, um I think it's it's, you know, it's
kind of fun and clever. People are going
to have lots of new toys uh on their
either laptops or servers depending on
which size of model you pick and choose.
So, um, you know, when I was playing
with these, um, I think in some ways the
fact that they're all hitting thick and
fast right now, um, speaks to the fact
that we are really hitting this just
wealth of riches in the model space.
Like, they're all such great models. Um,
you know, same with the ones we talked
about the last time we were on
[laughter] talking Gemini. Yeah, there
are so many good models out there, which
is awesome. And it makes it a little
harder to have sort of a fine-tuned
point on any of these. But what I
definitely noticed that I thought was
interesting here um is that I think
these respective labs at least on the
open side are really leaning into their
strengths. So I noticed that you know
with the Deep Seek uh release they have
yet another novel attention mechanism
that aims to further uh dig into and you
know improve on I've got a giant model
but I can still run it really
efficiently. Right? That was one of the
major breakthroughs with uh you know R1
and the V3 series. Um and you're seeing
them pushing yet another one here with
the their sparse attention uh mechanism
here with uh 3.2. So, um, really cool to
see that they're continuing to iterate
in that space. Um, and, um, on the Mistral
side, uh, you know, I was actually kind
of surprised, but I had mixed feelings
about it's kind of a plain vanilla
transformer, right? Like there's a few
little tweaks in there, but there's no
fancy attention mechanisms. There's no
attempt at linear attention scaling.
It's just a a big old dense attention
model. Um, and it's really good. Uh I
think what their innovation was is that
every one of their models up and down
the line had vision capabilities which
is pretty cool to just see that not as
like an extra but just as bread and butter.
Um these are multimodal models out of
the box. They work great on text, and
the vision is just more rather than a
lot of times you think about especially
at the smaller end of the scale you get
models where you have to kind of make a
trade-off between quality in
multimodality and and like pure quality
in text. And it seems like they've
figured out a pretty clever way to
actually just boost it really well. So,
you know, I did a little dabbling with
all of these yesterday. I tried to
experiment with a pretty hard uh coding
problem I'm trying to tackle right now
around Metal kernel optimization for
llama.cpp, which frankly I don't know
much about. [laughter] And I threw it at
>> That's very specific.
>> Yeah, it is very specific. I threw it at
all three of the big size models. Um,
and you know, each one of them gave me
some really, you know, interesting tips
uh on optimizations. So again, like I I
don't have any clear way of saying, "Ah,
this one blew me away and this one, you
know, really fell flat." They all did
great. They're all great models at the
top size. What I did think was really
cool is that I was able to pull Ministral
3B on my, you know, my dev box and just
crank away on multimodality workflows
through open web UI, and that was really
fun, too. So, um, I love seeing both
ends of the spectrum here, especially
from the Mistral release. you know, I I
kind of wish I could get my hands more
tightly around the Deep Seek release so
I could play with it more. Um, but uh,
you know, I'm I'm a love to tinker with
it myself kind of person and uh, at the
top end of the scale, they're still all
delivering great quality. So,
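The sparse attention Gabe credits to the DeepSeek line can be sketched in miniature. This is a toy, one-dimensional illustration of the general idea only: each position attends to a small local window of neighbors instead of every token, cutting cost from O(n²) toward O(n·w). The fixed-window pattern and scalar "embeddings" are assumptions for clarity; DeepSeek's actual sparse attention mechanism is considerably more sophisticated.

```python
import math

def sparse_attention(q, k, v, window=2):
    """Toy local-window sparse attention over scalar q/k/v lists.

    Each position i attends only to positions within `window` of i,
    with softmax-normalized dot-product scores.
    """
    out = []
    n = len(q)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = [q[i] * k[j] for j in range(lo, hi)]
        m = max(scores)  # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * v[j] for w, j in zip(weights, range(lo, hi))) / z)
    return out

# Uniform q/k means uniform weights, so each output is the mean of
# the values inside its window:
print(sparse_attention([1.0] * 5, [1.0] * 5, [0.0, 1.0, 2.0, 3.0, 4.0]))
# -> [1.0, 1.5, 2.0, 2.5, 3.0]
```

With full dense attention every position would mix all five values; here each output only sees its local neighborhood, which is the efficiency trade the panel is describing.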
>> yeah, for sure. And Abraham, maybe I'll
kick it over to you. I mean, I want to
build on that comment that Gabe had,
which is uh, there's so many models.
They're all great. Uh, and as a result,
it's almost hard to make distinctions
between them because it's it's all
really good basically. Um, and there's a
there's a a famous kind of memo from the
early days of this competition, and by
that I mean, I don't know, a few years
ago that leaked out of Google that said,
look, at the end of the day, there is no
moat to these models cuz everything's
going to be commodified. Everything's
kind of going to be great and all free
and available in a certain sense. Do you
have a sense of how open-source
projects, labs need to be able to stay
differentiated in this market or is this
just kind of inevitable that like
everything will be great and then
everybody will kind of be very similar
in the models that they launch?
>> It's a good question. So I think open
source labs differentiate from closed
source by you know being open source to
be honest. I think that really is the
moat there. Um, in terms of, uh, like, you
know performance and capabilities I
think with the first Deep Seek R1 you
really saw that you don't necessarily
need to be one of these you know highly
funded uh you know hundreds of thousands
of GPUs trained um you know
organizations to be able to to to really
release something that's
state-of-the-art. Um, I think what
you're starting to what I saw with Deep
Seek and with Claude is kind of leaning
into what you do really well kind of as
Gabe mentioned, um, in terms of, you
know, Claude Opus really doubling down
on, you know, software engineering as
their, you know, primary target and and
really ensuring that they maintain kind
of like a chokehold on that
particular use case or those particular
capabilities. Uh what I saw what I kind
of thought was cool for Deepseek was um
their kind of reasoning first for agents
for tool calling. Um so really kind of I
think you're starting to see a lot of
model releases uh really focus on being
hyper performant when it comes to tool
calling or any sort of agentic workflows
as they start to see like that's the
next frontier in terms of how LLMs are
going to be used in place of standalone.
Um I think what I really liked about
this, like, you know, last number of model
releases is Mistral going back into
open source and not with their you know
bespoke research licenses really going
back to the Apache 2.0 roots of the
early open model kind of efforts. Um,
but yeah, so in terms of differentiating
between open source and closed source, I
think, you know, right now they're
they're they're kind of neck and neck in
most cases. And I think that, you know,
there's a there's a there's a comment
about like, you know, what is good
enough in terms of what's out there. And
I think you, you know, it really depends
on um, you know, what your business case
is and what you're actually trying to
solve.
>> Abe, I think you put an
interesting point on that one actually
with the Deepseek reasoning and tool
calling is that I I really think each
lab is going to choose their specialty,
right? Like Deepseek is clearly trying
to be the reasoning lab. So they're
going to try and make their reasoning
the best no matter what else you're
using it for. Um, you know, uh,
Anthropic is clearly trying to be the
best developer model shop. Uh, and
they're really going to lean in heavily
on that one. You know, I so I think that
is what you'll probably end up seeing is
that individual labs start to
differentiate by the
I don't want to even call it task
because task is hard to classify in an
LLM space. Like you could call tool
calling a task, but it's going to be
much more around like domain. Like what
domain is the model really optimized to
work well in? Um and of course you're
still going to have the frontier labs
that are trying to just be the best at
all domains, right? Um, and you can do
that by just pushing the parameter count
up. Uh, and the the training data
improvements. We'll get to that in the
next segment, obviously. Uh, but I
think, you know, especially at the
smaller model sizes and the open source
shops, you're going to probably see the
ones that succeed um, choosing their
specialties and trying to find a niche
market in a specific domain.
>> Yeah, I think that differentiation is
going to be really interesting to see
because it's like almost in this game of
musical chairs, it's like how many
chairs are there for for you to
specialize in this space? Um Aaron, I
want to bring you in. Um you know,
obviously with Gabe and Abe on the on
the line, uh they're very uh they're
very open open biased, but do you want
to talk a little bit? Have you played
with Opus 4.5? Curious about your
impressions on on Claude and what
Anthropic is doing with the new model?
>> Yeah, I mean just just to just to sort
of piggyback off, you know, your comment
about the musical chairs. I mean, the
good news is is that m the music never
stops here, you know, because we have
lots of models coming in very
performant, right? And what I what I
what I personally like to do is to
ensemble different models together, you
know, because because it's almost like
this optimization problem, right? Where
where you have lots of different models,
right? And you need to um optimize
against an objective function that
you're trying to achieve, right? But
that object objective function also is
baked in there. What models do you
select? But you can have a router that
in turn looks at the ideal use cases for
each of these models, you know, such as
the Mistral 3. it seemed to be you know
you know really good at these enterprise
rag systems right and uh wanting to um
have these different chat bots or
co-pilots and um some some of it for a
vision whereas if you look at DeepS you
know 3.2 it's very great you know at
math and and and different types of you
know codegen uh but I also noticed that
the opus uh 4.5 um it seemed to extend
this notion of a digital worker right
where you could maybe even replace or
augment you know humans uh where you
know you will actually have an engineer
you know a virtual engineer that would
read the entire 200-page spec of a
language which we generally wouldn't do
right so I mean that that's that's quite
quite helpful there you know um and and
then um as as you begin to create a
router that then in turn uh pushes out
you know your uh prompts right um and
then it pulls in the data which each of
the models have have access to you then
in turn have to consider the different
topologies and I uh Gabe was a was a
mentioning right whether it's going to
be you know Mistral's um open weight
mixture of experts with these different
types of attention transformers that
they have um DeepSeek's sparse attention
um and then um then Claude where
it has you know these agentic
enhancements and memory state layers but
but what I think is going to happen
right is with you know as these models
come in we're going to have more hybrid
architectures right so I think the day
of just having a transformer is going to
be no longer where you're going to see
state space models being put in and mixed
together. Uh which which is going to
make it very exciting, right? Um just
just that the different emergent
behaviors that are going to happen
especially when, socially, these
models begin to interact and, you know,
play together.
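Aaron's router idea, dispatching each prompt to whichever model suits its use case, can be sketched as a simple function. The model names echo the episode's examples, but the keyword rules and routing logic below are purely illustrative assumptions, not any real product's API; a production router would score prompts with a classifier rather than keywords.

```python
def route(prompt: str) -> str:
    """Pick a model for a prompt using naive keyword heuristics.

    Hypothetical routing rules matching the panel's characterizations:
    Mistral 3 for vision/multimodal, DeepSeek 3.2 for math/reasoning,
    Claude Opus 4.5 for software engineering.
    """
    p = prompt.lower()
    if any(k in p for k in ("image", "diagram", "screenshot")):
        return "mistral-3"        # multimodal across the model line
    if any(k in p for k in ("prove", "integral", "theorem")):
        return "deepseek-3.2"     # reasoning-first focus
    if any(k in p for k in ("refactor", "bug", "unit test")):
        return "claude-opus-4.5"  # developer-model focus
    return "mistral-3"            # arbitrary default

print(route("Find the bug in this unit test"))  # -> claude-opus-4.5
print(route("Describe this screenshot"))        # -> mistral-3
```

The interesting design question Aaron raises is exactly what the objective function of such a router should be: the routing rule itself encodes an assumption about which model wins in each domain.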
>> Maybe a final question here and then
I'll move us on to our next topic. Um
you know at least on the topic of Opus
4.5 which is the one out of these three
models I've played the most with. um
they're really getting quite good at
personality it feels like. Uh my friend
had a really funny incident where he was
having a conversation with Opus 4.5 and
then you know they're talking about one
topic and they moved on to another topic
and then like many turns later Opus says
like oh you know I've been thinking
about that thing that we're talking
about earlier and I think you're right
on that issue. It was like kind of this
like very weird moment where it's like
the model kind of does this call back in
a very kind of like natural way that you
might have you know in having a
conversation with someone. Um, I'm kind
of curious if you any of you have kind
of like uh your sort of like vibe and
flavor check across these models either
with the 4.5 or otherwise. Um, it does
feel like the 4.5 kind of nailed
something with the voice. Um, but uh
curious about what you think about uh
that if any of you have kind of played
with it and what do you think?
>> It's really interesting to hear you say
that cuz like frankly that's exactly the
opposite of how I use models, right?
Like I I don't think I've ever gone four
or five. It's exactly I I use them as a
functional sounding board. the way I
might have a colleague where I used to
stand up and walk over to their desk and
be like, "Hey, let me vomit words at
you." And then like you'll you'll ping a
few back at me and all of a sudden like
the the right answer will emerge.
>> Be like, "Okay, thank you. Goodbye.
>> Thank [laughter] you. Goodbye." Exactly.
Uh like wipe your wipe your memory,
start over again. Um no. Uh so my vibe
check is much more about sort of the
responsiveness and the collaborativeness
in those functional experiences. Um, you
know, models that are open to, when I
point out something that they got wrong,
refining their suggestion and or taking
that like that is my vibe check right
now because a lot of I use them almost
exclusively for technical related topics
whether it's checking my own code,
helping to understand a programming
language or an accelerator paradigm that
I'm not familiar with. Um, but I do have
deep expertise in some of the the
aspects around what I'm doing. And so a
lot of times I'll catch something that
they did wrong, but they'll have some
insight that I don't. And the ability to
collaborate back and forth is what
really like resonates as quality for me.
Um, I'm not so much in the chatty mode
usually. Um, but yeah, I think everyone
probably has their own vibe check
depending on how they like to use
models.
>> Yeah, totally. That goes to the musical chairs
question, right? Is like these use cases
are really so varied that like that's
going to be the specialization. So very
interesting.
>> [music]
>> I'm going to move us on to our next
topic which is related to some of the
stuff that we're talking about here. Um,
one of the folks connected to flagged
this interesting blog post uh from a VC
from Theory Ventures called Tomasz
Tunguz. Um, and he kind of pulls
together a couple different threads, but
sort of the core of his article, uh,
this blog post that he did was sort of
arguing that Gemini 3, uh, kind of
demonstrates that maybe the scaling laws
are still pretty good, like that
essentially with like a ton of compute
using the methods we know, we can still
see some like major capability
improvements. And, you know, I know last
week we we did talk a little bit about
Gemini 3, but it's sort of interesting
kind of putting it in that context. We
talked a little bit about sort of what
you can use Gemini 3 for. This is maybe
a little bit of like how it is maybe
informative about how the meta
competition around these models is
evolving. Abraham, maybe I'll kick it
over to you. Is is do you buy the
thesis? Like scaling laws maybe still
better off than we thought. Um we just
got to throw more compute at it.
[laughter]
>> Yeah, I'm sure Nvidia would love that.
Um
so I I think that that Google's a little
bit different in this case. Um
specifically because they've got full
integration of hardware and software as
it uses its TPUs,
>> right? They have like the most computers
that ever computered kind of.
>> Exactly. You know, so it's much
different than a uh a shop using GPUs
that might not have the same integration
um across the whole stack. So I I think
the you know TPUs and Google have a
little bit of unparalleled advantage
when it comes to being able to squeeze
as much out of the com you know
processing units. So I I I think if we
were to see this type of you know
behavior if you will with a different
model um for instance you know if the
new Claude Opus 4.5 or DeepSeek were to be
able to showcase you know some of the
you know contradictions to the scaling
laws I think we just need a different um
or I guess another proof point if you
will. Uh and in in reading some of the
material from the Gemini, they did some
other things or at least you know the
they alluded to some other concepts or
or updates in their strategy in terms of
how they build their model that could
lean into it. I I read a comment about
you know context engineering in place of
prompt engineering where you know the
thought process was maybe behind the
model generation it grabs a bunch of
large relevant context in the background
so it can you know provide a little bit
more of a thought experiment before it
generates the results. Um, so yeah, I
guess to to kind of close the button on
on Gemini, I'd love to see maybe a diff
another proof point that is a little bit
more focused on GPU use as opposed to
TPU. And then in terms of the other part
of that um article that said, you know,
subsequent to Google's uh comment, you
know, Nvidia's growth is still, you
know, their last Q3 release showed that
they're still
uh have a massive output of GPU. Most
GPU sales aren't for pre-training.
They're for inference. So, I don't
necessarily think that's like the right.
>> Yeah, that one I felt was like a little
bit of a weak argument is basically like
just buying compute doesn't mean that
the scaling law still applies.
>> That's that's exactly it.
>> Yeah, for sure. Um, William Gibson has
this famous quote which is like the
future is here, it's just not widely
distributed. And I guess this kind of
makes me think about like the scaling
laws are here, they're just not widely
distributed in the sense that basically
and Aaron I'm curious about your comment
on this is like we may live in a future
where sure I guess maybe the scaling
laws kind of still exist but the amount
of infrastructure you need to pull it
off is basically only available to who
maybe only Google like it's just like
the scale is just so huge for everyone
else they're kind of will will not be a
scaling law do you think that's kind of
the case is that basically like it will
because we're sitting this kind of
period of diminishing returns you can
only keep it going if you're like at
like the 99.999th
percentile of access to compute and for
everybody else we're just we're just in
a world where scaling laws kind of don't
exist anymore.
>> Yeah. I mean I mean I think that the
scaling curve you know it's going to be
like this little stepwise you know
function you know so you have all these
little S-curves that happen you know as
we get new technology breakthroughs
right so it's multi-dimensional right um
and I think there's going to be new
dimensions as we progress through a time
right so like as new topologies come or
new algorithms are designed I think
that's going to help the smaller players
to be more competitive you know you
don't you potentially don't need these
you know huge data centers with you
these large amounts of GPUs
uh because they might be able to improve
performance without needing to change
you know you know any of that. Um but
but I also wanted to to make the point
too is that um you you one of my litmus
tests around the scaling law is I look
at you know us humans right and you know
you know we have this biological scaling
law you know our brains they haven't
changed much you know over the course of
centuries right and but our tools and
our technology have so the more data and
knowledge um you know that we get we
still have that same topology right uh
we're just able to specialize and to
have different types of training right
which is less
um um consuming, right? Um I mean I mean
I know it's not a perfect analogy
because you know AI gets better maybe
with the more GPUs you add but humans
get worse with the more coffee you add,
right? But [laughter]
but I mean even even so it kind of
helps.
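Aaron's picture of the scaling curve as a stack of little S-curves, each unlocked by a new breakthrough, can be sketched numerically. Everything here (the logistic form, the midpoints, the three hypothetical "waves") is an invented illustration of the shape he describes, not a fitted law.

```python
import math

def s_curve(x, midpoint, ceiling=1.0):
    """One logistic S-curve: flat, then rapid gains, then saturation."""
    return ceiling / (1 + math.exp(-(x - midpoint)))

def capability(log_compute):
    """Total capability as a sum of successive S-curves, one per
    hypothetical breakthrough (new topology, new algorithm, etc.)."""
    midpoints = [2.0, 4.0, 6.0]  # made-up positions of each wave
    return sum(s_curve(log_compute, m) for m in midpoints)

# The aggregate curve keeps rising, but in steps: each wave saturates
# (diminishing returns) until the next breakthrough starts a new climb.
for x in (1, 3, 5, 7):
    print(f"log-compute {x}: capability {capability(x):.2f}")
```

On this view a "scaling law" is really the envelope of many overlapping curves, which is why a smaller player can stay competitive by starting the next S-curve rather than climbing the saturated one.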
>> Yeah. Totally. Yeah. I think I mean the
the biological metaphor is good too
because it's like it's also after all
this evolution we don't have infinitely
large brains, right? And so there's
almost kind of a view which is yeah we
actually found an equilibrium like
actually you only need so much
intelligence to get through most of the
problems you're going to confront. Yeah,
it's actually I think it's a really
interesting metaphor,
>> right? And and I mean I mean if you look
at Gemini 3, you know, it kept roughly
the same number of parameters, you know,
at roughly one trillion, you know, with
respect to Gemini 1.5, right? So it's
almost like the human brain where it's,
you know, the same sort of topology. I
mean, I'm sure that they've, you know,
changed different, you know, different
types of activation functions and so on,
you know, but size-wise, right? It's
it's somewhat static, right? So, so it
it is a bit a bit interesting, but keep
in mind that step-wise function that's
going to, you know, happen much like
what's happened, you know, with compute,
you know, with um quantum, right? Um so,
uh I I do think that, you know, circling
back to your original question that the
small players will still, you know, you
know, have a big say.
>> Gabe, I think that
there's a final question here which is
the obvious comparison to these scaling
laws is is Moors law, right, which used
to be this industry organizing law that
said processing power is just going to
increase. And uh you know I think the
right observation about Moors law is not
like it's some you know mystical law of
nature. It's because the whole industry
was like we got to keep Mors law going
and like it was only kept going because
every few months we were able to get
another innovation to keep it going
again. Um, and so, you know, there's a
way of looking at the scaling laws,
which is it's not necessarily again some
like inevitable law of nature like
gravity or something like that. It's
more of kind of like a shelling point
that causes the industry to kind of
focus on certain types of things. And
so, yeah, curious about what you what
you make of that. Is that the right
interpretation of what we're seeing
here? Yeah, I think it is a a very
interesting, you know, it's kind of
progress for the sake of progress, but
there's probably some genuine utility
coming out of that progress, but it's
definitely sort of the competition is
feeding the progress more so than the
actual need for the progress. Um, but
you know, Moors law is an interesting
one because Moors law is actually like
pretty explicitly measuring like
floatingoint operations per second,
right? like it's it's a very specific
linear well uh unid-dimensional metric,
right? Um and there are all sorts of
clever ways to you know improve that by
not just making one faster chip. You can
do a bunch of things in parallel and do
some clever metric, you know, whatever.
The thing about this hypothetical
scaling law in AI is it just even
framing it as a scaling law to me seems
like the wrong point. Like Erin, you
pointed out that Gemini didn't change
the number of parameters. So, we're not
scaling size.
They don't really tell us whether they scaled the data inputs, but they probably scaled up to more data.
Do you consider algorithmic improvements scaling? Maybe. My guess, based on that tiny tweet, is that most of the improvements were actually in how they trained, not just bigger, right? So I think the framing of this as a scaling law is a bit of a misnomer; I think it's a quality-improvement law. And in some ways it's a no-brainer that we are nowhere close to the wall on that, because when you have an iteration cycle that costs millions of dollars and takes months, it's going to be really hard to actually move that ship. As a developer, I want something that takes fractions of a second so I can say, "Oh, that didn't work, try something else. That didn't work, try something else." As a model developer, you have to press go and wait for months, burning millions of dollars in the process. Those experiments are expensive. Iterating in the algorithmic space is really hard for these models.
So in some ways it's not at all surprising that there's probably still a lot of low-hanging fruit to be grabbed by doing better things in that training space: curating your data better, figuring out your actual training loops, figuring out your mixture of synthetic data, all of the above. There are so many tricks you can play to actually steer the training process of these models. And I would bet that if we somehow did some back-of-the-napkin math on how many theoretically possible ways you could tweak and tune this hyperparameter space for training, we've explored a tiny fraction of it for large language models, purely based on the speed at which they can operate. So the one thing that is interesting in this scaling-law discussion is that as hardware actually gets faster at doing this, the ability to experiment and try new algorithms gets greater. And that may actually be the real point of scaling: we can start exploring that hyperparameter space in training faster, and therefore get to better-quality outputs more quickly.
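Gabe's back-of-the-napkin argument is easy to sketch in code. The sweep dimensions and value counts below are illustrative assumptions, not numbers from the episode; the point is only how fast even a coarse grid outruns the handful of multi-month runs a lab can afford.

```python
# Back-of-the-napkin math on the training hyperparameter space.
# Every knob and count below is a made-up illustration.

# Hypothetical knobs a pretraining team might sweep.
search_space = {
    "learning_rate": 6,       # e.g. a log-spaced grid of 6 values
    "batch_size": 5,
    "warmup_schedule": 4,
    "data_mixture": 10,       # ratios of web / code / synthetic data
    "synthetic_fraction": 5,
    "optimizer_variant": 3,
}

# Total grid points is the product of the per-knob counts.
total_configs = 1
for count in search_space.values():
    total_configs *= count
print(f"grid points: {total_configs:,}")  # grid points: 18,000

# If one full run takes ~2 months and 4 runs go in parallel,
# a lab completes about 24 experiments per year.
runs_per_year = (12 // 2) * 4
print(f"explored per year: {runs_per_year / total_configs:.3%}")  # 0.133%
```

Even with generous assumptions about parallelism, a year of experiments touches well under one percent of this tiny made-up grid, which is the sense in which the space remains almost entirely unexplored.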
>> Yeah, that's a cool interpretation. It's basically saying that it's not actually about the hardware, in a certain sense.
>> Yeah, the hardware probably has a role in speeding up that iteration cycle, but it's not just more hardware, more hardware, more hardware.
>> Yeah, that's right. It's really a scaling-of-experimentation law: it's not just add more compute, it's add more compute so a bunch of folks can experiment, which causes algorithmic improvements, which is really where the results are coming from. That's really interesting.
[music]
>> All right, I'm going to move us on to our last topic of the day, a business story, and a very interesting one. Throughout the entire year we've been talking about agents. ChatGPT launched one of the most obvious agentic features, something called shopping research. The idea is that you're going to use ChatGPT to do your holiday shopping: ChatGPT will go out to the world and find products that match the kinds of queries you're looking for. And when you think shopping on the internet, you of course think Amazon, the place from which you buy a lot of things. The news came out, I think just earlier this week, that Amazon will be blocking the ChatGPT shopping research agent from looking at product details, customer data, and deals on Amazon. A super interesting development, because that instantly makes a feature like shopping research maybe a lot less effective, insofar as Amazon really is the infrastructure for all sorts of shopping online. Aaron, I'll kick it over to you for maybe the obvious question, but I think it's worth making it explicit: why is Amazon blocking ChatGPT?
>> Yeah, so just to start out with: it's as if Amazon told the ChatGPT shopping bot to go window shopping, but the door's locked. It can't go in, and it can't look at the data and understand the product listings or prices or any of that. And it seems as though Amazon might have done this for a couple of reasons. One could be to protect their e-commerce data, so that third-party tools can't just directly access it and they keep control of their shopping funnel. That locked door means nobody else can go in, and it protects their business model: ads, commissions, first-party traffic, and so on. But a subtlety, or maybe not so subtle, is that Amazon is also working on their own AI-driven shopping services. They have Alexa Plus coming, and they have Rufus. So with those two elements, I think they're trying to keep their own ecosystem. But what this ultimately means is that we have these turf wars starting: open shopping AI versus closed retail empires. And what this could mean, and it'd be pretty neat if it happened, is that smaller retailers could band together to collectively compete against Amazon. There are thousands and thousands of mom-and-pop shops that could now compete against the big elephant, so we'll just have to see. But I am curious whether Amazon is now being pushed, through competition, to double down on Rufus and really invest more in Alexa Plus, and what that is going to mean for us as we begin to shop for the holidays. I, for one, hope I can save some money with these tools and find the best products. But yeah, we'll have to wait and see. Exciting news overall, I think.
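It isn't public exactly how Amazon implements the block, but the standard, discoverable mechanism for telling a crawler to stay out is robots.txt. The sketch below runs Python's standard-library parser against a hypothetical robots.txt; `GPTBot` is OpenAI's documented crawler token, while the rules and URLs are made up for illustration.

```python
# How an agent is supposed to discover a block like Amazon's:
# fetch and parse the site's robots.txt. The rules below are an
# illustrative example, NOT Amazon's actual file.
from urllib.robotparser import RobotFileParser

EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

# The named crawler is shut out; everyone else may browse.
print(parser.can_fetch("GPTBot", "https://example.com/product/123"))   # False
print(parser.can_fetch("PriceBot", "https://example.com/product/123")) # True
```

Of course robots.txt is only a polite signal; a retailer that wants a hard wall will also enforce the block server-side, which is why "the door's locked" is the better metaphor.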
>> Yeah, the parallel story I was thinking a little bit about: a few months back Cloudflare said, "We're going to start blocking AI agents; by default you can't get through." And I think what they said was, "Look, we're standing up for all these websites, and eventually we're going to create a little bit of a toll booth. If you want to access the data on a website, you'll have to come through Cloudflare, and we're going to make sure that they pay." Gabe, there's been a lot of talk on whether or not agents are technically possible; it kind of feels like there's a big question on whether they're even possible as a matter of business incentive. The whole idea relies on an internet where you can just access information and access different platforms, but it feels like the walls are going up everywhere. It feels like that could really stifle the whole dream of agents even being a practicality.
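Cloudflare's "toll booth" idea maps naturally onto HTTP's long-reserved 402 Payment Required status code. The function below is a toy stand-in for such a gate, not Cloudflare's actual system; the bot tokens and the payment check are illustrative assumptions.

```python
# Toy "toll booth" for AI crawlers: bot traffic without a payment
# token gets HTTP 402 Payment Required. A stand-in sketch, not
# Cloudflare's real system; the bot tokens are examples.

BOT_MARKERS = ("GPTBot", "ChatGPT-User", "ClaudeBot")

def toll_booth(user_agent: str, paid: bool) -> int:
    """Return the HTTP status a gated origin might send back."""
    is_bot = any(marker in user_agent for marker in BOT_MARKERS)
    if is_bot and not paid:
        return 402  # Payment Required: pay the toll first
    return 200      # regular traffic (or a paid-up bot) passes

print(toll_booth("Mozilla/5.0 (Windows NT 10.0)", paid=False))  # 200
print(toll_booth("GPTBot/1.2", paid=False))                     # 402
print(toll_booth("GPTBot/1.2", paid=True))                      # 200
```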
>> Yeah, I think you're definitely right that we're hitting a point in this AI timeline where all of a sudden making money is going to really matter. And that kind of stinks as a technologist, right? It's been really fun to have this ride where all these big shops are just putting out great technology and thus far haven't really been tying that to revenue goals specifically. One of the things I turn to ChatGPT for is questions that I don't want to later show up in my Google news feed or my Google ads. I love that in many cases these AI technologies have not yet been linked into the money machine, and it was just a matter of time before that happened. Clearly, this is one very big step in that direction. And the thing I think is interesting is to speculate a little further and think about this almost like where the browser wars hit, and the inevitable antitrust lawsuits that came up around browser defaults and browser walls. I suspect that at some point agents will become the new browser; I don't think I'm alone in that speculation. A lot of people will go to the internet through their agent, at which point the idea of a tight, vertically integrated ecosystem that precludes some agents from accessing some content is probably going to be challenged in court, frankly, because it's going to have monopolistic tendencies. We've seen this with browsers, we've seen this with search engines; I suspect we'll see it with agents eventually. But right now agents are still in that middle ground: they're just coming out of the "oh, this is awesome technology and we're figuring out what to do with it" phase, and just entering the "hey, we can make a ton of money with these things, so we'd better protect our moat" phase. I suspect we'll come to the commoditized and/or legally regulated phase in a little while.
>> Yeah. Abraham, I think I'll give you the last word on today's episode. What do we do about this dynamic Gabe is talking about? The whole point of the agent is that there's one agent you can use to do everything you want to do across the internet. Where this is going seems to be: okay, you've got the ChatGPT agent, but if you want to buy something on Amazon, you've got to use the Amazon agent, which kind of destroys the whole original value prop for agents, in a certain sense. It certainly makes it all a lot more annoying to manage. Is there a way to get back to the world of free-flowing agents that operate on your behalf generally? Or are we, by dint of the business incentives here, being pulled toward a world where it's just app world again? Everything that used to have an app now has an agent that replaces it, but it'll be effectively the same world.
>> Yeah, that's an interesting question. I mean, off the top...
>> I'm just giving you a small one to wrap up the episode.
>> [laughter] You know what, thinking out loud: maybe it's less of a third-party agent that controls your entire flow into the internet, and more of a personal agent that calls particular third-party agents as they are approved or required. So from the end user's experience, I have an agent, and there's a multi-agent system for Abraham Daniels that only calls the required agents, or the agents tailored for the specific tasks I want to carry out. Beyond that, I know this is a very nebulous, big hairy problem of a question that I don't have the full answer to. But I do think this Amazon article really showcases platform competition versus AI competition. And what I thought was kind of neat about it was the opportunity to transition SEO to AI assistants, to Gabe's point about trying to find dollars and cents after all the billions of dollars that have been invested into AI. I said this on a previous Mixture of Experts: I think OpenAI is now very focused on how to eventually turn all these agents and capabilities they've built into some dollars and cents. So when I saw this, I really saw it as a way to take some SEO capabilities away from the Googles of the world and see if you can start to utilize them as part of your shopping experience with agents.
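Abraham's "personal agent that calls approved third-party agents" can be sketched as an allowlist-based dispatcher. Everything here is hypothetical: the class, the task categories, and the stand-in retailer agent are illustration only.

```python
# Sketch of a personal agent that only dispatches to third-party
# agents the user has explicitly approved. All names and routing
# rules here are hypothetical.
from typing import Callable, Dict

class PersonalAgent:
    def __init__(self) -> None:
        # task category -> handler standing in for an approved agent
        self.approved: Dict[str, Callable[[str], str]] = {}

    def approve(self, category: str, handler: Callable[[str], str]) -> None:
        """The user grants one third-party agent one capability."""
        self.approved[category] = handler

    def run(self, category: str, task: str) -> str:
        # Anything not explicitly approved is refused, so the user,
        # not the platform, decides which agents see which tasks.
        if category not in self.approved:
            return f"refused: no approved agent for '{category}'"
        return self.approved[category](task)

me = PersonalAgent()
me.approve("shopping", lambda task: f"retailer-agent handling: {task}")

print(me.run("shopping", "find a gift under $50"))
# retailer-agent handling: find a gift under $50
print(me.run("banking", "pay my bill"))
# refused: no approved agent for 'banking'
```

The design choice this illustrates is that the control flow stays with the user's agent, which is the opposite of the "every platform ships its own agent" world the hosts are worried about.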
>> Yeah, it's a good note to end on. Obviously this will not be the last time we talk about this issue, but we are out of time for today. So, Aaron, Gabe, Abe, awesome to have you on the show, and we hope to have you back soon. And thanks to all you listeners: if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. We'll see you next week on Mixture of Experts.
[music]