Deep Learning Hitting a Wall?
Key Points
- The panel opened with a heated debate on whether deep learning is “hitting a wall,” with Chris Hay claiming models are getting worse, Kush Varshney acknowledging challenges but seeing them as surmountable, and Kate Soule asserting that new applications keep the field advancing.
- Host Tim Hwang opened the episode of “Mixture of Experts,” framing the discussion around the release of DeepSeek‑V3 and a public showdown between AI optimists and skeptics.
- OpenAI’s latest model, o3, was highlighted for dramatically out‑performing traditional benchmarks such as frontier math, reigniting confidence that deep learning progress has not stalled.
- The strong o3 results are presented as a narrative reset after recent speculation that pre‑training breakthroughs were fading and that deep learning was entering a slowdown cycle.
- Chris Hay, despite his earlier pessimism, expressed enthusiasm for o3’s inference‑time efficiency and praised the model’s practical performance, suggesting that substantial gains are still possible.
Sections
- [00:00:00](https://www.youtube.com/watch?v=QzERUfTbKQw&t=0s) **Panel Debate: Deep Learning Hitting a Wall** - In a lively “Mixture of Experts” session, a panel of AI leaders shares divergent views on whether deep learning has stalled—ranging from outright pessimism to cautious optimism about overcoming challenges and unlocking new applications.
- [00:03:09](https://www.youtube.com/watch?v=QzERUfTbKQw&t=189s) **Benchmarks, Speed, and Coding Models** - The speaker criticizes benchmarks, praises the reasoning strength of o1/o1 Pro while noting its slow response time, and explains switching between models to balance speed and depth, especially for coding tasks.
- [00:06:16](https://www.youtube.com/watch?v=QzERUfTbKQw&t=376s) **Flexible Compute Trade‑offs with o3** - The speaker explains how the o3 model enables dynamic balancing of inference time, latency, and cost by allowing users to choose between low‑resource quick responses and high‑resource, higher‑quality outputs.
- [00:09:18](https://www.youtube.com/watch?v=QzERUfTbKQw&t=558s) **Aligning Models with Safety Policies** - Discussion of using regulatory text and synthetic data to train models, emphasizing inference‑time safety checks, governance, and recent alignment research.
- [00:12:22](https://www.youtube.com/watch?v=QzERUfTbKQw&t=742s) **User-Defined Safety Trade‑offs** - The speakers explore letting users allocate AI effort between safety and problem‑solving, advocate for deliberative, democratic input into safety policies, and wrestle with the tension between rigorous safety and delivering faster, more entertaining models.
- [00:15:47](https://www.youtube.com/watch?v=QzERUfTbKQw&t=947s) **Cost Curves, Fine‑Tuning, and Agents** - The speakers argue that recent training tricks dramatically lower pre‑training expenses, but the future focus should shift to inference efficiency, fine‑tuning, and deploying models as agents.
- [00:18:50](https://www.youtube.com/watch?v=QzERUfTbKQw&t=1130s) **The AI Training Pendulum Through 2025** - The participants discuss how the back‑and‑forth cycle between pre‑training massive models and deploying smarter inference‑time models will dominate AI development up to 2025, while also questioning the hidden data costs behind open‑source projects.
- [00:22:10](https://www.youtube.com/watch?v=QzERUfTbKQw&t=1330s) **Global AI Governance and Standards** - The speakers discuss the need for worldwide, technically‑driven AI governance—likening it to ICANN’s voluntary standards and noting an upcoming Paris meeting of safety institutes to create codes of practice—while also touching on switching between open‑source and closed‑source model modes.
- [00:25:21](https://www.youtube.com/watch?v=QzERUfTbKQw&t=1521s) **Governance Challenges of Tiny Autonomous Agents** - The speakers contend that by 2025, the hardest AI governance problems will stem from the rapid, unregulated misuse and trust deficits of autonomous agents built on very small models, rather than from controlling the largest, most prominent models.
- [00:28:26](https://www.youtube.com/watch?v=QzERUfTbKQw&t=1706s) **Public Bet on AI's Future** - The speaker outlines a public wager between an AI skeptic and Miles Brundage that lists ten possible AI milestones—like producing world‑class creative works—to force concrete definitions of “truly powerful” models, and asks whether this approach meaningfully gauges AI progress or merely adds to Twitter noise.
- [00:31:46](https://www.youtube.com/watch?v=QzERUfTbKQw&t=1906s) **LLMs Aren't Authors, Tradition Over Authorship** - The speaker argues that large language models function as collaborative tools within a long‑standing literary tradition, lacking genuine authorship, and that framing AI ethics around authorial credit is a misguided question.
- [00:34:52](https://www.youtube.com/watch?v=QzERUfTbKQw&t=2092s) **Bridging AI Misconceptions for the Public** - The speaker argues that AI hype outpaces everyday understanding, urging relatable explanations of generative AI for non‑technical audiences.
- [00:37:54](https://www.youtube.com/watch?v=QzERUfTbKQw&t=2274s) **AI Models Mirror Organizational Structure** - The speakers discuss how rapid innovation, Conway's Law, and corporate team dynamics influence the behavior and characteristics of AI models, using examples from pre‑training teams and a humorous reference to Anthropic's Claude.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=QzERUfTbKQw](https://www.youtube.com/watch?v=QzERUfTbKQw)
**Duration:** 00:39:16
Frequently asked question, is
deep learning hitting a wall?
Chris Hay is a distinguished engineer
and the CTO of Customer Transformation.
Chris, what do you think?
Oh yeah, totally, Tim.
In fact, I think it's going backwards.
I think the models are getting
worse and worse and worse.
This is the worst it's ever been.
It's totally hit a wall, Tim.
Happy 2025, Chris.
Uh, Kush Varshney, an IBM fellow
working on issues of AI governance.
Kush, welcome back.
Uh, what do you think?
I think there is a wall, but
it's not an insurmountable one.
I think we're making progress.
We're, uh, changing it up instead
of just taking some steps.
We're, uh, doing some rock climbing.
A little bit more of a serious answer.
And Kate Soule is Director of Technical
Product Management for Granite.
Kate, happy 2025.
Uh, what's your take?
No, I don't think deep
learning is hitting a wall.
I think we're finding new ways to
apply it in 2025 that's going to
have some interesting benefits.
All right.
All that and more on today's mixture of experts.
I'm Tim Hwang. Happy 2025, and
welcome to Mixture of Experts.
Each week, MoE offers a world class
panel of product leaders, researchers,
and engineers to analyze the biggest
breaking news in artificial intelligence.
Today we're going to be talking about the
release of DeepSeek-V3 and a very public wager
between an AI booster and an AI skeptic.
But first, let's talk about OpenAI's o3.
Um, this was, uh, the last announcement of
OpenAI's 12 Days of OpenAI, uh, marketing
event that they did at the end of last year.
Uh, and it was arguably
the biggest announcement.
Uh, they basically have touted a new
model, which is now getting sort of limited
trial access for safety purposes, um,
that blows out of the water a lot of the
benchmarks that people have traditionally
used to measure, or argue for measuring,
whether or not we're getting close to AGI.
So, uh, on a benchmark that we've talked about
on the show in the past, uh, FrontierMath, um,
OpenAI's, uh, o3 is doing incredibly well.
Um, and I think one of the reasons I wanted to
kind of bring this up is that it really does
seem like, you know, after, I think what was a
news cycle late last year of people saying deep
learning is slowing down, the old methods don't
work anymore, pre-training is over, and a lot of
general hand-wringing, um, this really kind of
reset the narrative, at least in the
circles that I run in, to say, actually,
you know, there's maybe
a lot more room to run on all this.
Um, Chris, maybe I'll turn to you first.
You sort of outright made fun of me on the opening
question. Um, what's your take on the o3 model?
Like how important is it?
Does it really kind of indicate that
there's still a lot more progress to run?
How do you read it, basically?
I think it's a great thing, actually.
So I've been playing a lot with the o1
and the o1 Pro models, and I've
been having the best time with them.
So inference time compute is really working.
So I'm excited about o3.
I'm just kind of annoyed
that we don't have it though.
That's the real thing.
And so it's yet another, you know,
"this is coming soon," and, and that's
sort of annoying me, especially being in
Europe, because in Europe we don't get anything
these days. We didn't get Sora, we didn't
get half of the models that are coming
through on, uh, the 12 days of Christmas.
So, um, I'm excited about o3, as for
the benchmark thing, two things in
my mind about that. One, you know, in my
opinion, benchmarks are stupid, so I'm
not really going to read into that.
And then probably the second thing is even
if we take the opinion that benchmarks aren't
stupid, then it took an awful lot of time to
come back with those answers, and it was a little
bit kind of monkeys and typewriters, right?
Which is if you type long enough, then
you're eventually going to get the answer.
But, with that aside, actually, I'm
so impressed by o1 and o1 Pro that actually
I'm super excited about o3 and I think it's
going to be a great model and it's really
proving sort of inference time compute.
Yeah. One follow up there is, uh, I know you're
saying you think all benchmarks are
stupid, but you think this model is better.
So what use case do you have in mind
where you're like, oh, actually,
o1's noticeably better
than what we've had before.
Yeah, there's probably a few ones.
The main one for me is coding, right?
I mean, it is completely in a different level.
Even Claude 3.5 Sonnet, um, GPT-4o,
the early versions of o1.
Honestly, o1 Pro is on a different level.
Now, probably the big thing that I've found,
working with the models, is Pro just takes
quite a long time to come back with an answer.
So, I end up switching
between models all the time.
It's like, okay, I want a fast answer on this.
I think it can handle this.
Oh, no, it can't handle it.
I'm going to switch from o1 to o1 Pro.
So.
Um, so that sort of changing models just to
get fast answers back and how much reasoning
I want from the models is a sort of technique.
But for me, coding is
definitely the biggest thing.
I don't really care about the
math stuff because, like, I'll
just use a calculator, right?
But definitely for coding, I see a marked difference.
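The switching workflow Chris describes, try the fast model first and escalate to the slower reasoning model only when the quick answer falls short, can be sketched as a tiny router. Everything below is a hypothetical stand-in (the model names, the `call_model` stub, and the quality check), not a real API:

```python
# Hypothetical sketch of the model-switching technique described above:
# try a fast, cheap model first, and only escalate to a slower reasoning
# model (think o1 -> o1 Pro) when the quick answer fails a sanity check.
# `call_model`, the model names, and the check are stand-ins, not a real API.

def call_model(model: str, prompt: str) -> str:
    # Placeholder for a real API call; returns a canned answer per model.
    canned = {
        "fast-model": "quick but shallow answer",
        "reasoning-model": "slow, carefully reasoned answer",
    }
    return canned[model]

def looks_good_enough(answer: str) -> bool:
    # Stand-in quality check; a real one might run the generated code's tests.
    return "reasoned" in answer

def answer_with_escalation(prompt: str) -> tuple[str, str]:
    """Return (model_used, answer), escalating only when needed."""
    model, answer = "fast-model", ""
    for model in ("fast-model", "reasoning-model"):
        answer = call_model(model, prompt)
        if looks_good_enough(answer):
            break
    return model, answer

model, answer = answer_with_escalation("Refactor this function")
```

For coding tasks, a more realistic check would compile the generated code or run its tests before deciding to escalate.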
Got it.
Okay, maybe I'll turn to you.
So I think, you know, if you're not watching this
space super closely, it's easy, I think, to
just get, like, bewildered by the number
of models and fine variations between
all these models kind of coming out.
Um, you know, I think famously, or, like, it was
kind of talked about, that the reason they jumped
from o1 to o3 was that o2 was, I think,
already used by the UK telecom company O2.
So it was like a trademark thing
that got us o3.
Um, but I guess Kate, question for you
is if you can help our listeners kind of
understand a little bit of, like,
what's new with what they're trying
with o3, like kind of looking under the
hood, you know, these models seem to be
a lot more performant, but there also
seem to be like a lot of new things that
they're trying underneath the surface.
And I think it's worth it for our
listeners to know a little bit of the flavor
of that, if you want to speak to that at all.
Absolutely.
So I think the most important thing
for our listeners to understand when
looking at the new o3 model, and the
o-series models in general from OpenAI,
is that we're transitioning from spending
and innovating at the training time of the
model and instead saying, okay, let's take
a model that's been trained and let's run it
multiple times and spend more compute at the
actual inference time when it's being
deployed out in the world, and it seems like
with the o3 models, they're continuing to
innovate on what can be done at inference time,
having the models essentially think longer, to
risk anthropomorphizing these models, think
longer through different tasks, search through
many different potential options and solutions
before picking the best one, which then leads to
improved performance, but it also takes longer.
Uh, to Chris's point, you have
to wait longer for a response.
One of the things that I
think is really important and really
exciting about the o3 model and
this broader investment and pivot to more
inference time compute is that it actually
can give you some really nice trade offs.
And I think this is where we're heading.
And o3 is, uh, you know, foreshadowed
a little bit that you can run these models
in a more efficient mode, or if you need
the maximum performance, you can run
them in kind of a compute intensive mode.
And I think that's going to be really cool
because it gives people the ability to
set their compute budget, set their time
constraints, you know, for latency, if
they need an answer, a response quickly.
And I think we're going to see a lot more of
that in 2025 of people playing along that kind
of cost performance trade off, even within a
single model saying, okay, I want my model only
to think about this for, you know, a minute
versus I want my model to give a response
immediately, versus my model can think about
this for five minutes and then give me a
response back, depending all on how much I'm
willing to pay and how important it is that
the model gives a really strong response back.
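The cost/latency/quality trade-off Kate describes can be sketched as picking the best "thinking tier" that fits a caller's latency limit and budget. The tiers, numbers, and function names here are invented for illustration; this is not OpenAI's actual pricing or API:

```python
# Hypothetical sketch of the compute-budget trade-off described above:
# the caller picks the highest-quality "thinking tier" that still fits
# a latency limit and a cost budget. All numbers are made up.

TIERS = {
    # tier name: (max_thinking_seconds, relative_cost)
    "instant": (1, 1.0),    # answer immediately
    "standard": (60, 5.0),  # think for up to a minute
    "deep": (300, 20.0),    # think for up to five minutes
}

def pick_tier(max_latency_s: float, budget: float) -> str:
    """Choose the highest-quality tier that fits both constraints.

    Tiers are listed cheapest-to-best, so the last one that fits wins.
    """
    best = "instant"  # fall back to the cheapest tier
    for tier, (latency, cost) in TIERS.items():
        if latency <= max_latency_s and cost <= budget:
            best = tier
    return best

tier = pick_tier(max_latency_s=90, budget=10.0)
```

The same per-request choice could sit behind an API parameter, with the provider mapping each tier onto an inference-time compute budget.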
Yeah, definitely.
Yeah. Some people were joking online.
I saw that this is kind of, it's the return of
the old turbo mode on computers where you're
like, we want the computer to work harder.
Um, but it actually, uh, it's a really
interesting question about, like,
almost like, I don't know, what do users want
the computer to think hardest about?
Which I think is kind of a counterintuitive
question about, like, what types of queries and
what types of tasks, you know, demand that.
It'll be really interesting to see.
Kush, it's ideal to have you on the line as well,
because I think one of the most interesting parts of the launch,
you know, I think Chris was frustrated by it.
He was like, come on, just
give me access to the model.
But in traditional open AI style, they've
said, well, no, we're being careful with the
launch and you can get access to the model
if you're a safety or security researcher.
Um, and they're allowing people to
have kind of like requested access
to go and red team the model.
I'm curious about how you read that as someone
who thinks about AI governance, you know,
is that kind of going to be the paradigm for
how companies release models going forwards?
Or, you know, is OpenAI kind of, almost
like, do you see this as marketing, right?
They're using the safety thing to
be like, give us just a few more
months to iron out the loose ends.
I think it's a combination of both
actually, because, um, there's this
concept of the gradient of release.
Um, and, uh, Irene Solaiman from Hugging
Face came up with this and it's kind
of like, uh, maybe take your time.
Um, the more powerful the model
is, maybe the more, um, kind of
the slower you need to roll it out.
But, um, I think it's a combination.
Um, so OpenAI, uh, gave their models
to, um, the UK AI Safety Institute
for testing, uh, in advance as well.
And, um, uh, some of this I think is, uh,
just to be able to say that they did it.
Some of it is to actually have some, uh, some
better, uh, safety alignment and so forth.
So, yeah, I think it's, uh, it's here and there.
And, um, uh, one other thing
that's in this o3 release.
Um, that they talk about is a new
way of doing safety alignment.
They call it deliberative alignment.
And, um, uh, I think it's,
uh, it's kind of interesting.
Um, they're, uh, saying that they are
very much looking at an actual safety
regulation, um, taking the text from that
and training the model with respect to it.
Um, doing some synthetic data generation
that, uh, follows along
with what the policy says.
And, um, so it's something we've been doing for a
while as well. Um, last year, um, we published
a couple of papers, um, one we call Alignment
Studio and one on alignment
from unstructured text, and so forth.
And I think those sorts of ideas, um, they're
kind of carrying through. The new part is,
um, again, the, uh, the fact that this is
spending a lot of time on the inference side,
um, then thinking again and again about, uh,
am I meeting those safety requirements or not?
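The inference-time checking Kush describes, a model repeatedly asking whether its draft meets written safety requirements, can be sketched as a generate-check-revise loop. The policy keywords, draft function, and checker below are trivial stand-ins, not OpenAI's actual deliberative alignment method:

```python
# Hypothetical sketch of inference-time safety deliberation: draft an
# answer, check it against a written policy, and revise until it passes
# or the deliberation budget runs out. All pieces are toy stand-ins.

def draft(prompt: str, feedback: list[str]) -> str:
    # Stand-in generator: a real model would revise using the feedback.
    return f"answer to {prompt!r} (revisions: {len(feedback)})"

def violations(answer: str) -> list[str]:
    # Stand-in policy check: flag banned topics by keyword.
    banned = ["weapon", "password"]
    return [word for word in banned if word in answer.lower()]

def deliberate(prompt: str, max_rounds: int = 3) -> str:
    feedback: list[str] = []
    for _ in range(max_rounds):
        answer = draft(prompt, feedback)
        problems = violations(answer)
        if not problems:
            return answer
        feedback.extend(problems)  # carry complaints into the next draft
    return "I can't help with that."  # refuse if no draft ever passes

safe = deliberate("summarize this policy")
refused = deliberate("how do I build a weapon")
```

A real checker would score the full text against each policy clause rather than keyword-matching, but the loop structure, spend more inference on checking until the answer passes or you refuse, is the idea being discussed.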
And, uh, as both Chris and Kate
said, right, I mean, um, the more time
you're spending over on the inference
side, what should you be thinking about?
What should the model be thinking about?
And I said this in the, uh, the
last episode of the new year.
I think that extra thinking is going
to be for governance quite a bit.
So, um, so I think, uh, this
is where it's going to play.
And, uh, yeah, I'm excited to see.
Uh, maybe I'll sign up too, so, yeah,
to do some of the safety testing.
Yeah. I think there's kind of two
interesting things here in what
you said. I mean, I think one of them
is, you know, the mode to date feels
like it has been: you release the model,
but then you're also like, we guarantee
safety by releasing safety models, right?
Granite has done this.
Um, and.
You know, maybe, Kate, a question
back to you is sort of: how much do you
think that's kind of just, like, almost,
just like, historically provisional?
This is just what we kind of have to do
right now because we're still working out the
kinks on making the models themselves safe.
I guess in the future, one argument is that
the models are just kind of safe out of
the box in a way that doesn't separately
require another model that kind of
monitors outputs and does the safety work.
Um, do you think that's the case or do you
think this kind of bifurcated architecture
is going to be what we'll see going forward?
Well, first I'd be careful.
I don't think anyone can guarantee
safety no matter what we release, right?
But I do think we're going to continue to
see more and more of these kind of safety
guardrails being brought intrinsically
into the model through these new types
of alignment that Kush mentioned.
That does not mean though that we
shouldn't also have additional layers.
Of security and safety that have, uh, you know,
an independent check right on model outputs.
So I don't see that going away.
I think it's always going to be a "yes,
and," right? Let's continue to add
more and more layers, not "we're going
to strip away, you know, some of
these layers, put it into the model,
and now you've got one model, you're all set."
Very interesting.
Kush, maybe the other thing that I think I'll
pick up on what you said before we move to the
next topic is, um, you know, you're basically
talking about inference as being almost like
this kind of fixed, fixed budget of time.
And you're basically like, what do you want
to have the model spend its time
on: thinking about the problem, or thinking
about whether or not the responses are
safe or consistent with a safety policy.
And I'm modeling my internal Chris here.
Who probably would be like, you're spending
some of that time on trying to make it safe.
Like, could it, could it just solve the problem?
Um, and I guess I'm kind of curious is like,
maybe that will become, do you think that
will become a lever over time where you can
almost like, the user will specify, I need
10 percent of your time spent on safety, 90
percent of the time on solving the problem or
otherwise, or, you know, That actually kind
of opens up a whole nother world in some ways.
Yeah, it does open up a whole new world.
I mean, um, I wouldn't say that, uh, I would
want to spend a lot of time on the, this sort
of safety deliberation either, but, um, I
think, uh, the, the fact that they're calling it
deliberative, um, it kind of speaks to something
that, uh, I mean, deliberation is meant to
be like a discussion among lots of different
viewpoints and, and this sort of thing.
I don't know if that's actually what will
happen, but that's something I would want
to happen so that, uh, different viewpoints,
different sort of perspectives, um, can be
brought into, to these different policies as
well, because, um, uh, I mean, in democratic
sort of, uh, settings, you do want deliberation.
You do want, uh, kind of minority voices to be
heard as well. But, um, uh, I'm not sure
exactly that's what they mean by deliberative.
Absolutely.
Um, Chris, I appreciate you're shaking
your head, so I want to make sure
I'm not putting words in your mouth.
I, I honestly think safety is super
important, but I want the models quicker.
So, you know, do what you need to
do, and I want the models to be fun, so
don't lobotomize them, you know what I mean?
Um, but, you know, we don't
want to do harmful stuff, but at the
same time, come on, you know, it's
like, I want to play with the models.
Chris basically wants everything.
Tim, we are also kind of assuming
OpenAI is going to give us the choice,
right, of how we want the model to
spend that inference time compute.
And I don't think that's the
direction that they're headed.
I think they've got some
clear regulatory guidelines.
They're trying to meet performance issues,
uh, that they want to make sure are addressed.
I don't see them handing over kind of the
keys to the kingdom, so to speak, to let
us take these models for our own joy rides.
Yeah, no, I think that's for, for sure.
Right.
Um, and yeah, I think there's a bunch
of interesting questions that are
sort of empirical questions, right.
It's just like, how much, you know, how
much does safety
inference lead to better outcomes, right?
Like, how much of this is, like, a
mutually exclusive pie versus one
where you can get a little bit of both?
How much is going to be
defined by the regulator?
How much is going to be defined by the user?
Um, a lot of things to pay attention
to, I think, going into 2025.
So I'm going to move us on to our next topic,
which is the release of DeepSeek-V3.
Um, this is sort of an interesting announcement
because I think we were, uh, me and the
production team were kind of wrapping up at the
end of the year and we're like, nothing's going
to happen in the last few weeks of the year.
And of course there was the o3
announcement, which was huge.
And then also similarly big was
the announcement of DeepSeek-V3.
Um, and so this is an open-source
model coming out of China that
shows incredibly good performance
on a lot of the benchmarks, um, uh,
that most models are evaluated against.
And I think there's a lot of interesting
things to talk about here, but I think maybe
the first one, which I'll throw to Chris, is
this kind of, uh, claim that the DeepSeek team
is making that they were able to basically
build this incredibly performant model for
way lower cost than you would expect. And I
think a lot of the commentary online, and I
think one of the things that it made me think
about, is that there's so much that's built
on the economy of AI that is sort of based on
the idea that it's just really expensive to
get, you know, really high-performance models.
Um, but this almost seems like the cost curve
might be collapsing faster than we think.
I don't know, Chris, maybe that's
a little bit too optimistic, but
yeah, I'll maybe throw it to you.
I think it's kind of
interesting what they've done.
So they have put a lot of cool techniques
within the pre training side of things.
And, um, I mean, even things like multi-token
prediction, and then they were better at kind of
load balancing of tokens, et cetera, and how they route.
So there's a lot of things they did in
training that brought the cost down,
and I think they were doing kind of
mixed precision, uh, as well, so there were
a lot of good things that they did there.
I think what I would say though is that back to
the earlier point about inference time compute
and kind of pre-train, I wonder at what point
we maybe stop obsessing over the pre-training
side of things for models and actually, you
know, be able to kind of fine-tune those models
and have that, uh, community of fine-tuning
existing.
And I think that's going to be more
interesting, especially in the world of agents.
Happy New Year.
I'm the first person to
say agents on the podcast.
So thanks, Chris.
So I think that's more interesting.
And, and as we move more towards
inference-time compute, I think that
will become important there. But it is
really impressive what they did,
actually, for the cost of the model and
how long it took them to kind of train
it. Honestly, they did a great job.
So, yeah, there's going to be
more innovation in that space.
I still think pre-training, though, is hugely
inefficient, because you're really just saying:
here's the entire text of the internet,
go learn from it.
And I, I honestly think that's
probably an innovation that I would
hope that would change in 2025.
And the way I think about it is if I
think about the kind of internet, it
almost has a knowledge graph anyway.
And I wonder if actually during that
training process, if we brought a little
bit of the structure in the knowledge graphs
into the pre-training process, then a lot
of those, uh, training elements may come
out, uh, a little bit quicker and better.
I don't know.
I mean, I'm just sort of,
uh, sort of guessing here.
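Chris's knowledge-graph guess could look something like serializing graph triples into sentences that get mixed into the pre-training corpus. The triples and templates below are made up purely for illustration:

```python
# Hypothetical sketch of the knowledge-graph idea above: serialize
# (subject, relation, object) triples into natural-language sentences
# that could be mixed into a pre-training corpus. All data is invented.

TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("water", "boils_at", "100 C"),
]

TEMPLATES = {
    "capital_of": "{s} is the capital of {o}.",
    "boils_at": "{s} boils at {o}.",
}

def triples_to_corpus(triples: list[tuple[str, str, str]]) -> list[str]:
    """Turn graph triples into training sentences, one per fact."""
    return [TEMPLATES[rel].format(s=subj, o=obj) for subj, rel, obj in triples]

corpus = triples_to_corpus(TRIPLES)
```

The appeal of the idea is density: each generated sentence states one clean fact, instead of the model having to infer structure from noisy web text.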
But I think, I think there's a lot
more innovation to do in pre-train.
Um, so hopefully with inference-time compute,
we're all going to be running around doing that.
But I'm hoping that that focus
on pre-train doesn't go away.
So really good job to the DeepSeek
team to continue to innovate.
Yeah, I remember, um, you know, when I worked
a lot more closely with pre-training teams,
what I thought was very interesting, at
least among, you know, the nerds, at
least among the engineers, right,
was that, you know, pre-training was like the
high-prestige part of the organization.
Right, like, you're running the rocket
launch of AI, and then fine-tuning is
something that we do afterwards.
But, like, I think all the inference stuff and
all the stuff that we're seeing kind of points
to this, like, shift in the kind of cultural
capital within these companies, where it's
like, oh, all the action right now is
really happening after the pre-training step.
And I guess, Chris, almost what
you're proposing is maybe like at some
point, like the pendulum swings back.
Because it's like, okay, there's all of
this kind of innovation still to be done
on the pre training side, but we're just
not there because of the hype cycle.
It's going to swing back and forward,
back and forward, back and forward.
And, and, and you're going to see that, right?
Because you're going to get to the point
where you go, um, you know, the, the pre-train
isn't good enough to do what we need to do.
So therefore we're going to use the
smarter inference-time models to get better
data, to train the pre-trained models.
That's going to become more efficient.
And then we're going to do the same on fine
tune and that pendulum is going to swing and
swing because you're going to keep hitting
kind of limits in one area and you're going
to go back to the earlier, like the pre train
to try and fix that and you're just going
to go back and forward, back and forward.
So that pendulum is going to swing
all the way through 2025, buddy.
Yeah, definitely. Kate, any thoughts on this? I mean, as someone who works with a team on open source AI, I assume something like DeepSeek is a big, big deal, a big marker in some ways, a big way to start 2025.
Yeah, and I agree with Chris, the team did an incredible job. But in terms of the cost, I don't know the full details of what data was or was not used in the model.
My hypothesis is that they are using data that was available online that cost a lot more than $5.5 million to generate.
Right. So I don't know that that total cost estimate actually reflects the fully burdened cost of the model. I suspect that they, like many model providers, are leveraging all of the data that's now been posted and shared online.
That actually is only possible because others have invested so much money in creating larger models that can be used to then generate that data, which, to Chris's point, can be taken back into training.
So aside from that, what I'm really interested in with the DeepSeek model is that it's a mixture-of-experts architecture, which is really interesting. It's 600-plus billion parameters, but at inference time only about 40 billion of those are active, meaning it can run much more efficiently than even a Llama 405-billion-parameter model. I think that's where we're going to see a lot more innovation happening in 2025: really digging into how we make these architectures more efficient, how we activate the right parameters, fewer parameters, at inference time to still drive performance without having to pay for the entire cost of running 600-plus billion parameters.
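The sparse activation Kate describes can be sketched in a few lines. The following is a minimal, hypothetical top-k mixture-of-experts layer, not DeepSeek's actual design; the expert count, sizes, and k are illustrative. A router scores every expert for a token, but only the top k experts actually run, so most parameters stay idle at inference time.

```python
import numpy as np

def moe_layer(x, experts, router_w, k=2):
    """Minimal top-k mixture-of-experts routing (illustrative only).

    x        : (d,) input token embedding
    experts  : list of (d, d) weight matrices, one per expert
    router_w : (num_experts, d) router weights
    k        : number of experts activated per token
    """
    scores = router_w @ x                 # one score per expert
    top = np.argsort(scores)[-k:]         # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over only the chosen experts
    # Only k experts run; the remaining experts' parameters stay idle.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
x = rng.standard_normal(d)

y = moe_layer(x, experts, router_w, k=2)
print(y.shape)  # (8,)
# With k=2 of 16 experts, only ~1/8 of the expert parameters are touched per token.
```

In a real model the routing is learned jointly with the experts and balanced across tokens, but the compute saving is the same idea: total parameters can be very large while per-token active parameters stay small.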
Yeah, that's really interesting. Kush, from a governance standpoint, this is an interesting story as well, right? You know, I think there's certainly a vision among some folks, which is, well, we just pass the laws in the U.S., and all the big AI companies are in the U.S., of course, and that's how we govern AI.
But this is really a different world, right? Like, a law passed in the U.S. is not going to change what the DeepSeek team is doing. Is governance even possible in this world? Because it sure seems like you're seeing so much AI progress everywhere that governance becomes a real question.
Yeah. I mean, we talked about this before the show started: there are these core socialist values that are required of any generative AI in China. It's a law that's been around for more than a couple of years now, and DeepSeek has to satisfy those. So those are things that are going to be around. And I think the fact that all of these different AI safety institutes from different countries are forming a network, convening, and figuring things out together is a great sign. I think AI governance needs to be a worldwide activity. There's no special thing because of one country or another country. And the more we can bring everything into harmonization, the better it will be.
Yeah, I think that'll be one really interesting bit. You know, I think there was a line of thinking, maybe a few years back, that we're going to use law and regulation to do this. Kush, I guess what you're suggesting is a world that leans a little more on technical experts, where it looks a little bit like ICANN, right? Like how we govern the web, where technical experts meet and establish standards, and it's a voluntary protocol more than anything else. Do you think that's how things are going to go?
Yeah, I think that's how it's going to go. In February, there's a meeting in Paris where all of these AI safety institutes are coming together. So I think they'll come up with a plan, they'll figure out some codes of practice and all these things. That's where I think things are headed.
Chris, you started this episode by talking a little bit about how you switch between different modes of OpenAI, right? Where you're like, okay, we're going to use o1 for this, we're going to use o1 Pro for this. Do you do that kind of switching across open source and closed source at all?
Yeah.
You do? Okay.
Yeah, I do that a lot with different models. The Llama models, for example, have got such personality, so if I'm doing any kind of writing new stuff, I tend to lean on the Llama models. The Granite models I use quite a lot as well, a lot for RAG-type scenarios, because they're really good at that, and also if I'm pulling factual information, because then I really want to be sure where the data's been coming from, so I tend to lean on Granite in those cases. For coding, I tend to lean on o1. And I have a lot of fun, actually, with some of the Chinese models, the Qwen models, at the moment. They're doing some great stuff, in the same way as the DeepSeek folks.
So I think you're just going to use different models for different cases, right? Because some models are good at certain language translations, some models are good at writing tasks, some are really good at code.
And then the smaller models, for example, especially for low latency, especially for agents... I said agents again.
exactly.
If you've got different agents, you want to run them on the smallest possible model that's going to perform the task that you need. So I think we're in this world where we are just going to use a lot of models.
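The routing pattern Chris describes, sending each task to the smallest model that can handle it, can be sketched as a simple registry lookup. Everything here is hypothetical: the model names, sizes, and skill tags are illustrative placeholders, not real endpoints.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    params_b: float      # size in billions of parameters
    skills: frozenset    # tasks this model handles well

# Illustrative registry: a mix of small specialists and a large generalist.
REGISTRY = [
    Model("small-8b", 8, frozenset({"translation", "summarization"})),
    Model("rag-20b", 20, frozenset({"rag", "factual-qa"})),
    Model("coder-70b", 70, frozenset({"code", "unit-tests"})),
    Model("frontier-600b", 600, frozenset({"code", "translation", "rag",
                                           "factual-qa", "unit-tests"})),
]

def pick_model(task: str) -> Model:
    """Return the smallest registered model that covers the task."""
    candidates = [m for m in REGISTRY if task in m.skills]
    if not candidates:
        raise ValueError(f"no model registered for task: {task}")
    return min(candidates, key=lambda m: m.params_b)

print(pick_model("translation").name)  # small-8b
print(pick_model("unit-tests").name)   # coder-70b
```

The design choice is the one Chris names: the big generalist is only picked when no smaller specialist covers the task, which keeps latency and cost down for agent workloads.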
And if I again talk about 2025, I hate to say this, but I think we're going to stop talking about models so much towards the end of the year, maybe more, because you're going to be caring about the tasks they're doing. Here's a language translation agent. Here's an agent that is going to write me unit tests. I do care about the model, but I'm going to care more about the tasks that it's performing.
And coming back to the security and governance things for a second, I think that's where governance starts to become really hard, right? Because if you've got a very small model, like an 8-billion-parameter model, and it's got access to tools, and you've got it being orchestrated over the top, you know what, you can get into a lot of trouble very, very quickly with a tiny model, and do some really interesting things. And I'm just not sure, governance-wise, that you're going to be able to do a lot about that.
So as much as we talk about the large models and governance, in 2025 I actually think we're going to start to hit the challenges of people doing interesting things with agents on the really tiny models.
Yeah, you're saying almost that we'll be able to govern the biggest companies and the biggest models, but that might not matter. Is that right?
I think so, yeah.
I guess, Kush, do you want to respond to that, as someone who focuses his time on thinking about AI governance? Chris is effectively saying maybe it's just not sustainable over time.
Yeah, so I agree. And I'll say that's "agents" number three for the episode.
This is really bad. This is becoming a meme, because people are going to just start throwing it out for no reason.
But yeah, I think when there is tool use, when there's autonomy, that's where governance really becomes interesting.
We've talked a lot over the years about trustworthy AI, and it wasn't really like trust was a part of it. But trust is really needed when something is going to be acting autonomously, because you don't have the ability to control it or monitor it and those sorts of things. And the more volatile, uncertain, and complex the environments these things happen to be running in, that's exactly where governance is the hardest, and I think where a lot of the innovation is going to happen.
Before we move on to the final topic, Kate, maybe I'll turn it to you. You know, I thought it was very interesting; I had never really thought about that switch. I've heard about, oh, I use o1 Pro versus o1 Preview, but the switch between open source and closed source I think is pretty interesting. Maybe a final question before we move on to the last topic: do you think right now open source has any specific capability advantages over closed source? Or is that not even the right distinction here?
You know, I thought it was very interesting that Chris said, actually, some of these open source models just have way better personality, right? That's kind of an interesting outcome in some ways.
Yeah, I don't see it so much as an open versus closed source question. I think different models are going to have inherently different strengths and weaknesses. And so if you only limit yourself to closed source, or closed source from one provider, you're going to miss out on that whole suite and on being able to pick and choose the best model for the best task. Ultimately, the dream in the future will be that someone sends me an AI-generated email and I can say, yeah, you're probably relying on Granite, I know what this sounds like.
So the last segment we want to focus on today is a sort of interesting smaller bit of news that popped up at the end of last year, but I think it's a fun one, particularly as we get into 2025. If you don't know him, Gary Marcus is a longtime skeptic of all things AI. I think for every successive wave, Gary Marcus is there saying it's never going to work, and the current revolution in AI is no exception. He's been a very big skeptic about the degree to which LLMs can get us to, quote, "true intelligence," and I'm going to talk about what that means.
But interestingly, he set up a kind of official public bet with a gentleman by the name of Miles Brundage, who used to do policy at OpenAI; he's independent now. And basically the bet asks where AI is going to be a few years from now, and sets out a list of, I believe, 10 different tasks that AI could or could not take on. There's a lot of variation here, but a lot of them pertain to, can the model produce world-class versions of XYZ?
So, you know, I think one criterion is, will an AI produce a world-class movie script or other kind of creative work? And I think these bets are useful because they force folks to put their money where their mouth is, and also to specify what they mean when they say that a model is going to be truly powerful in its capabilities going forward.
And I guess I wanted to get the view of this group. You've seen the Twitter-slash-X posts announcing this. Kate, maybe I'll turn to you. Is this a useful way of thinking about where AI is going, or do you think it's just more Twitter noise?
I thought it was interesting to think through when I was looking at the different questions. Ultimately, if I look at the different items in that bet, the ones that stood out to me the most were the assertions that hallucinations would basically be solved by, you know, this year. And I think that's one of the biggest reasons why, personally, I actually wouldn't take that bet. I don't think hallucinations are going to be solved.
If you look at the model architecture, even with o1 and reasoning, my hypothesis is it's still a transformer model trained on a vast amount of internet data that's being called many times in many different ways with reasoning and search. But I think there are still some fundamental problems around hallucinations: unless we really change the type of data we train on, the volume of data we train on, or the architecture of these models, it's not going to go away overnight, or be something we can necessarily just incrementally cure ourselves of. So I personally wouldn't take the bet, but I thought it was a useful framing to think through.
Yeah, for sure. Kush, how about you? Would you have taken the bet on either side, I guess?
Yeah, I think the authorship question is an interesting one. I mean, that's what they're kind of going for: can this be an Oscar-winning screenwriter, a Pulitzer Prize-winning author, and that sort of stuff.
And I'm going to take us in a little bit of a different direction. The fact of it is that people have been coming up with all these analogies for LLMs, like a stochastic parrot or a DJ or a mirror of our society, and these sorts of things, but I think that's the wrong way to look at it. About 65 years ago, there was this book that came out called The Singer of Tales, by Albert Lord, and it was all about oral narrative poetry: these bards who sing about heroes and that sort of thing, and who compose the language as they're singing it. It's not like they write it beforehand, and they use formulas and all sorts of tricks to be able to do this.
And I think that's exactly what these LLMs are. In that sort of construct, there is no sense of authorship; they're just part of a tradition. And so you would never think that Homer deserves a Pulitzer Prize for the Odyssey, or Ved Vyasa deserves a Pulitzer for the Mahabharata. This is just a tradition that's going on, and that, I think, is the right way to think about LLMs. So the question is not the right question.
And even if you think about it, again going very historical and philosophical, you had this guy, Michel Foucault, who asked, what is an author? And the answer, the discussion that he had, is that the only reason we even thought of authors is because lawyers needed someone to blame when there were some bad ideas out there. So I think that's the same thing. An LLM is not an author, and we shouldn't really be asking for that sort of thing. I think it's the wrong question.
And I think that actually touches on what Kate said as well, which is basically: do these criteria for the bet assume a certain direction for AI that might not actually be the most important thing about AI, or even an important aspect of, quote, really powerful AI systems? It may not turn out in the end that we really need to solve hallucination. Or it may not turn out in the end that the big impact of AI is that you have, you know, the Pulitzer Prize-winning AI that generates a novel completely from scratch. That's kind of interesting.
Yeah. I don't know, Chris, you haven't had a chance to jump in just yet. Curious what you think about all this.
Oh, I think the test is totally stupid, in my opinion. And the reason is, I looked down the list of 10 items, and I don't think I'm capable of doing any of those 10 items. So if I'm not capable of doing the 10 items, is it unfair to think AI is going to be able to do that within a year?
I mean, 10 items.
How are you doing on your Pulitzer Prize-winning novel? Is it going well? Or your Oscar-winning one, Chrissy?
Has any programmer on the planet been able to hit 10,000 lines of code bug-free on the first pass? Come on, I think you're asking a lot.
The only one I think I could maybe do is the video game one. And I don't know when to laugh at the right moment in movies. You just need to ask my wife that. It's like, why are you laughing? And I'm like, oh, that thing over there, right? And am I able to say who the characters are without hallucinating? No, we all hallucinate. We make up little subplots that go on in our heads in these movies.
So I don't think it's a bad thing, but I think you're asking a lot of LLMs, even putting that as a test for 2025. And, you know, maybe AI will be able to achieve three or four of these things. I just don't think it's the right time to be asking those questions.
Well, I don't know. We just came back from holiday breaks, where at least I got to take a step outside of the Cambridge tech bubble, where everyone is really deep into this technology, and got to hear regular folks talk about AI. You know, I have a family member who calls it "the AI machines." There are a lot of misconceptions, I think, about what AI can do and what it's going to be useful for. And so putting it in terms that everyday folks can understand, people who watch movies and read books and aren't necessarily living and breathing the technology, and helping show that, no, X, Y, and Z are not going to be possible, you're thinking about this the wrong way, I think it is helpful to have that type of discussion and discourse.
I think we take for granted a lot that not everyone is living and breathing generative AI the same way that this excellent panel is.
Yeah, I'll guarantee you that the average person is not waking up wondering, should I use o3 or o1? Those distinctions are not anything a normal person is thinking about. But yeah, I think that's a good point. I mean, part of it is just that there's a dream that all this AI becomes kind of superhuman at some point. And I think, Chris, maybe to respond to your comment, there's kind of an effort to ask, what would that look like? And I guess, yeah, maybe that does really miss the point in some ways.
I also think it's a really good indication of how quickly our expectations have adjusted around the technology, right? Had you asked me four years ago, could it do all these things, could it just write an email, I'd have said that's ridiculous. And now the expectation is basically world-class, Pulitzer Prize-winning work, because the baseline just feels very normal to us now. So it's, I guess, an indication of the rising expectations around all of this stuff.
Just coming back to DeepSeek for a second, I think one thing that we didn't talk about is the culture at DeepSeek. There was an interview with their CEO that was making the rounds after DeepSeek came out, but the interview was from November, and I think the cultural aspect of how they developed this thing is really interesting. They really followed this sort of geek way.
So Andrew McAfee had this book, The Geek Way, and it's been very popular within IBM circles, actually. Our CEO has been reading it and telling everyone to read it. And it's about really doing things fast, being open, letting everyone contribute, being very scientific about things, trying to prove them out, not having hierarchies, and all of that stuff. And that's exactly how DeepSeek is doing it. I think we can learn a lot from it; we're a little bit too encumbered, even though we want to be doing things the same way. So how do other companies innovate in a rapid fashion in the same way? I think that's maybe something to learn as well.
Yeah. One of the debates I have with a friend of mine is about, what is it, Conway's Law. The idea is that you ship your org chart, and that has kind of interesting implications in the world of AI, where it's like, well, are all of these AIs going to basically, in some ways, reflect the companies that create them? And the reason certain models are more chatty is that this is in part a reflection of all the people in that organization.
Interesting implications, if you think of Chris's point about pre-training and how pre-training has been the focus and the most prestigious team to join, right?
That's right.
Yeah, yeah. There's a joke, because we have a mutual friend who works at Anthropic, and we're like, it's cool, it's Claude, he's Claude. It's very funny to see that play out in practice.
Well, that's great.
So let's leave it there.
Chris, great thought to end the
episode on and for us to start 2025.
Kush, Kate, Chris, as always,
incredible to have you on the show.
And thanks to you all for joining us.
If you enjoyed what you heard, you can find us on Apple Podcasts and podcast platforms everywhere. And we will be here next week with another episode of Mixture of Experts.