# OpenAI's Open-Source Shift Debate

**Source:** [https://www.youtube.com/watch?v=Dtr0scHQVXc](https://www.youtube.com/watch?v=Dtr0scHQVXc)
**Duration:** 00:43:31

## Summary

- The Mixture of Experts podcast introduced its latest episode, featuring experts Chris Hay, Kaoutar El Maghraoui, and newcomer Bruno Aziza to discuss rapid AI developments.
- The panel highlighted several breaking stories, including Genie 3, Claude Code rate limiting, Mark Zuckerberg’s “superintelligence train,” and the headline news of OpenAI’s release of two open‑source models (120B and 20B parameters).
- Kaoutar noted that OpenAI is balancing competitive pressure to open up against the ethical responsibility to keep powerful capabilities contained, suggesting a cautious but possible shift toward openness.
- Bruno emphasized that the open‑source move reflects a broader industry trend aimed at expanding enterprise engagement, though he stopped short of predicting full openness by 2030.
- Chris expressed a dissenting view, indicating that not all experts agree that OpenAI will transition to an open‑source model in the near future.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=0s) **AI News Roundup: GPT‑OSS & More** - Tim Hwang’s Mixture of Experts podcast previews the week’s biggest AI stories—GPT‑OSS, Genie 3, Claude Code rate limits, and Mark Zuckerberg’s superintelligence push—joined by a panel of leading tech experts.
- [00:03:10](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=190s) **OpenAI's Open‑Source Dilemma** - The speakers debate whether OpenAI will fully open‑source its models by 2030, weighing profitability and competitive advantages against the risk of losing market share to emerging open‑source AI alternatives.
- [00:06:14](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=374s) **Defensive Hybrid Strategy & Branding** - The speaker urges firms to adopt hybrid, open‑source AI models and diversify beyond consumer‑only offerings as a defensive move against competition, while highlighting the importance of branding in a market trending toward vertically integrated, Apple‑like AI ecosystems.
- [00:09:19](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=559s) **Edge AI vs. Backend Giants** - The speaker contrasts lightweight, consumer‑grade models that run quickly on edge devices with massive multimodal back‑end systems, noting the former’s speed and platform flexibility but limited competitiveness, while highlighting brand benefits and the continued need for large backend models.
- [00:12:23](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=743s) **DeepMind Unveils Genie 3** - The speakers discuss DeepMind’s new Genie 3 model, which generates immersive 3‑D worlds from textual descriptions, and debate whether it represents a groundbreaking research advance or merely an impressive demo.
- [00:15:28](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=928s) **Consumer vs Enterprise Video Generation** - The speaker debates whether AI‑driven video and 3D world generation will stay a professional, enterprise‑only capability or evolve into an everyday consumer tool, referencing recent product launches and seeking expert perspective.
- [00:18:39](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=1119s) **Immersive AI Worlds for Enterprise** - The speaker envisions AI‑driven, infinite 3‑D environments as the next evolution for corporate training, onboarding, and sales communication, while cautioning that the required compute could make the solution very expensive.
- [00:21:47](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=1307s) **AI, Quantum Hype and Claude Code Limits** - The speaker compares AI to a “machine God,” speculates about quantum breakthroughs for gaming, then critiques Anthropic’s new rate‑limit policy on Claude Code for $200‑per‑month pro users.
- [00:24:52](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=1492s) **Optimizing AI Model Costs** - The speakers discuss leveraging hardware improvements, adaptive token caching, and continuous software optimization to lower the high subscription fees of AI models, framing it as a race to keep pricing sustainably low.
- [00:27:57](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=1677s) **Costly AI Coding Agent Overload** - The speakers discuss how running many AI coding agents across Claude and ChatGPT leads to soaring expenses, constant cloud rate‑limit hits, and strategic decisions about model usage and subscription tiers.
- [00:31:11](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=1871s) **Balancing AI Performance and Cost** - The speaker outlines the technical hurdles of delivering fast, affordable generative AI at scale—such as batching, compiled execution, and tiered routing—while emphasizing the steep expense of AI‑driven search versus traditional methods and urging careful assessment of use‑case value.
- [00:34:16](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=2056s) **Comparing Corporate Superintelligence Visions** - The conversation highlights differing approaches to superintelligence among OpenAI, Meta, and Anthropic, with Bruno noting varied interfaces like Meta’s glasses, privacy concerns, and the expansive data collection underlying each strategy.
- [00:37:23](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=2243s) **Multi‑Device Future and Subscription Robots** - The speaker argues that brain implants, glasses, neck wearables and phones will coexist, with smart glasses taking a leading role, while subscription‑based robots costing around $2,000 a month will handle errands in a multi‑device economy.
- [00:40:31](https://www.youtube.com/watch?v=Dtr0scHQVXc&t=2431s) **Skepticism Over Meta's AGI Hype** - The speaker voices cautious doubt that current AGI announcements are more marketing than breakthrough, while acknowledging Meta's hardware push—especially AR glasses—as a potential platform for a personal AI operating system.

## Full Transcript
It's going to be crazy time.
That's the world I live in already.
So I'm now just jealous that I couldn't,
I could have been running agents in the background 24/7.
The window of opportunity where people were literally
just taking money out of, you know, Dario's pocket.
Basically, that could have been me.
It could have been me.
You could have been a big star, Chris.
All that and more on today's Mixture of Experts.
I'm Tim Hwang and welcome to Mixture of Experts.
Each week, MoE brings together a panel of the smartest
and wittiest voices in technology to explain, debate, and analyze our way
through the truly overwhelming wave of news
each week in artificial intelligence. Today,
I'm joined by a stellar crew: Chris Hay, Distinguished Engineer
and CTO of Customer Transformation;
Kaoutar El Maghraoui, Principal Research Scientist and Manager
for Hybrid Cloud Platform;
and joining us for the very first time, Bruno Aziza,
Vice President, Data, AI and Analytics Strategy.
We have a packed episode today,
and in fact, we're going to be publishing
early, given all the news that's just come in.
I think in like literally the last 72 hours,
we're going to talk about Genie 3,
Claude Code rate limiting and Zuck getting on the superintelligence train.
But first, let's talk about the big news of the week, which is gpt-oss.
So let's get into gpt-oss.
I really want to bring up this topic
because I think this is like one of the things
a little bit like GPT-5, where I think it's been rumored
that OpenAI has been working on this for months and months and months,
and it is finally here.
The quick recap of the news is that they've released two open source models,
a 120B model and a 20B model.
Um, and I think just to do our usual quick round the horn question,
you know, I think the one that I want to get from all the panelists
is basically how big of a trend is this?
And the question is, in the next five years,
will OpenAI have transitioned fully
into being an open source play versus a proprietary model play?
Um, Kaoutar, what do you think?
Maybe.
I think because OpenAI is walking this tightrope between, uh,
competitive pressure to open up
and also, ethically, the responsibility to keep, you know,
these dangerous capabilities contained.
It's a good thought. Uh,
Bruno, predictions for, uh, 2030.
Is OpenAI fully open source at that point?
Well, first, thanks for having me.
I'm really excited to be part of this crew.
Uh, second, it's really hard to predict the future.
Uh, what? So I'm not going to take a chance on that.
But what I will say, though, is I think it's indicative of a trend
here in this market where OpenAI,
I think, is seeing the opportunity
to do something a little different and probably get access
to enterprise issues that we deal with.
And so I think overall for the industry, it's a great move.
Great. And Chris, finally, what do you think?
No.
Okay.
So we've got a real difference of opinion here.
Uh, I mean, Chris, as usual being a little spoiler on it.
Uh, Chris, do you want to put forward the argument like this open source thing?
I don't know, some people have been saying like, this is just marketing for them.
I don't know if you agree with that take.
I don't think it's just marketing.
I think the open-weight models are really, really important
and I love gpt-oss.
They have done a fantastic job, so I expect them to continue on that trend.
I hope they continue on that trend.
But will they go fully open source by 2030?
I doubt it because they're going to want to keep
the big models to themselves.
Um, and they want to keep their competitive advantage
and they want to be able to make money.
But I, I applaud the move. Kaoutar,
this is a little bit of a dangerous move, right?
I think to Chris's point, right.
Like, ultimately OpenAI needs to make money. Um,
and it sure feels like releasing
these very, very performant oss models.
It does really kind of compete with their core products, right? Like,
aren't some of their customers going to just adopt OpenAI open
source rather than having to pay them?
How sustainable do you think this sort of thing is?
I think maybe they were pressured to do these things, uh,
because, uh,
if OpenAI stops short here, with no access to training data,
architecture design, or ecosystem-level tooling, um,
they may win, you know, short-term market share,
but I think they lose long-term influence in this global
AI world, you know,
where these competing
open models and governance-backed open
initiatives are already filling that void.
So it is, you know, I think an issue with the competition
because we've already seen, like, Mistral
and, uh, you know, Meta and, you know,
their models are also doing very well.
So if they keep everything closed, uh,
they might lose that competitive edge.
Uh, but again, here, they don't give access to the training data, the ecosystem level.
And I think providing the open weights is, is very important,
you know, because they can also get,
you know, that researchers, you know, to play with their models and fine tune.
And so that is I think an important play for them uh, to have.
They will continue on that.
Well, whether they will become fully open source eventually.
I kind of agree with Chris.
Maybe not fully open source.
Uh, they might maybe adopt a hybrid strategy
because of this competitive pressure.
Uh, because if the US firms don't open up completely,
DeepSeek and Chinese alternatives
can dominate the open AI ecosystem,
not just technologically but also culturally.
So there is, you know, this legitimacy pressure,
uh, that is kind of pushing them to open source, some of their models.
And the open weight is a great initiative, I feel.
Yeah. Bruno, I see you nodding. Yeah, yeah.
I'm going to provide a little bit of uh, maybe a different perspective on this
because, you know, I spend a lot of time with customers
and I think the future is hybrid, right?
There's not going to be one model to rule them all.
There's not going to be one deployment model.
And often we have to make these choices, one or the other.
I think the enterprise is going to get both.
And I think the reason for why it's in a way it could be interpreted
as a good move from OpenAI, at least for the customers.
One is they have to do this offensively, right, I think.
You've got a lot of consumers that are familiar
with the model, and they're using it all day long.
Kind of like when we started using the iPhone.
And now they have the opportunity of well, now you could be in the enterprise.
You can use that model for yourself.
And so I think it's a great way
for them to lean on the familiarity of consumers
using their model. So I think that's one.
The second one, I think, as you were saying, is defensive:
I think if they don't do this,
there's a lot of competition in this space
and they might lose the
what could be, in fact a very profitable market
in the enterprise today.
I mean, doing this for the consumers,
I think, as everybody knows,
is a fairly expensive game.
And so I think they ought to do it for their business
to diversify their approach and really focus on
what we know the future is, which is not just cloud,
it is not just closed.
It is hybrid, and it's hybrid forever. Yeah. For sure.
I mean, that defensive point I think is worth pushing on.
I mean, you know, Chris, I think it's a really interesting development
I did want to talk a little bit about kind of like the branding
in some sense of these models. Um,
like, I mean, what I expect, what I hear is that, you know,
like you're going to be able to get like gpt-oss
on watsonx pretty soon, right?
Which is like pretty different from like,
I think the way I was thinking about this, this market evolving,
which is we're going to see a little bit
like more of what happened in mobile, right,
where, you know, OpenAI was going to be the Apple.
They even hired Jony Ives, right.
Like the Apple of AI,
where everything is going to be vertically integrated,
you can only touch their models through their infrastructure.
It seems like, I mean, that wall has fallen.
It seems like we're not moving towards Apple world in the market for AI.
Is that a good way of thinking about it? I don't know.
I mean, the way I like to think about this one is
these models are very specific to consumer grade hardware, right?
They have been specifically designed for that.
So if we take the 20 billion parameter model, it's
designed to run on a machine
with 16 GB of memory and a single GPU.
And then, you know, and they've even quantized it down to that level.
So and then for even the 120 billion parameter model
that's designed to run on a single A100 card,
so you can go and fine tune that and not have it split across multiple cards.
And it's actually even designed to run on a high end MacBook Pro.
I'm running that on my machine at home,
not my IBM issued machine, but my personal one.
Um, you know, to be clear, take that how you want, IBM.
Um, so, um, so it is designed specifically
and they've had to make trade offs as well.
So if we if we look at the model there,
you can see it's a text only model.
It's not, um, a multimodal model in that sense.
It is specific to the English language
and it is focused on code
and it's focused on agents, as you can imagine.
So it's hugely sort of, uh,
designed one to be fast,
but it's also designed for tool calling.
And in that sense it gives away a lot.
It's a reasoning model.
Um, which again, is a great move
because it gives away a lot of their architecture
that they're running in the back end there.
Um, so I think it is very, very
And, and actually, the fact is it's a mixture of experts model
where we get to see the number of experts that they have.
And the number of active experts is actually tiny.
The number of parameters active at any
point is really tiny, which gives you such fast speed.
So this is a model that's specifically designed
for consumer grade hardware and edge devices etc..
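[Editor's note: the point about a tiny active-parameter count driving speed can be made concrete with a back-of-the-envelope sketch. The figures below are illustrative placeholders, not OpenAI's published specs, and the 2-FLOPs-per-active-parameter rule is a common rough approximation for decoding.]

```python
# Back-of-the-envelope: why a mixture-of-experts (MoE) model with few
# active parameters decodes faster than a dense model of the same total
# size. Numbers are hypothetical, chosen only to illustrate the ratio.

def decode_flops_per_token(active_params: float) -> float:
    """Rough rule of thumb: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params

total_params = 120e9    # hypothetical 120B-parameter model
active_params = 5e9     # hypothetical ~5B parameters routed per token

dense = decode_flops_per_token(total_params)   # dense: every weight used
moe = decode_flops_per_token(active_params)    # MoE: only routed experts used

speedup = dense / moe
print(f"MoE: {moe:.1e} FLOPs/token vs dense: {dense:.1e} -> ~{speedup:.0f}x less compute")
```

Under these assumed numbers, each generated token touches roughly 24x less compute, which is the mechanism behind the "such fast speed" on consumer hardware described above.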
You know, so it's designed for our usage.
And now if we compare that to the back
end models that they're running, they're multimodal.
They're going to be much, much larger.
They're going to span across multiple H100s.
You know, they're going to be multi-language,
and they're going to have a lot more data put in there as well.
So I don't think these models that run on our machine
are going to be competitive with their back end systems. Now,
it's great that they have equivalence to the kind of the
the zero, three and zero for minis, which is great.
but again, I think they're not giving away a lot in that sense.
So we get to feel good.
We're going to be able to go and build agentic systems.
You're going to build up great brand affinity with OpenAI,
but ultimately you're probably going to
still be using some of the large models on the back
end system.
So I think it's a great move,
and I absolutely applaud that you can run it on other people's platforms as well.
So, um, and I think that's going to come through.
So yeah, I'm excited.
And if you look at the enterprise usage patterns,
you're now going to get a lot of people that have been experimenting
with the models,
the OpenAI models, that can now take this home or into their own environment
and actually optimize for, for costs in their own environment.
So I actually think for customers, there's a lot of upside to this.
Definitely I agree.
Yes. Basically I think it is a very strategic move.
I agree with what Bruno and uh, Chris said.
Uh, because they're facing, you know, this increasing competition
from the powerful open source models.
So when they release these things, they can first recapture
goodwill from the open source community.
Second, compete directly with these other open source players.
And third, drive adoption for their technology
on a wider scale, like Chris and also Bruno mentioned,
especially in enterprises that have these strict data privacy and security requirements
and need on premise solutions.
And it's still going to be on OpenAI rails.
So it's really important for them to,
you know, continue that adoption.
And, and I think it's also around
framing the narrative around democratic
AI, especially reinforcing US leadership in the field,
which is a politically savvy move here, because we don't want
all the open source models to come from other countries.
So I think they're trying also to reinforce the US leadership here.
So this open-weight distinction, I think, is crucial,
because OpenAI is still trying to maintain
a level of control and competitive advantage
with their proprietary models like GPT-4,
while still trying to reap the benefits of a more open approach.
So it's like a have-your-cake-and-eat-it-too strategy.
Yeah, we're gonna have to see whether or not they can walk this tightrope.
It's going to be very, very interesting to see.
I'm going to move us on to our next topic. Um,
really wanted to cover this very,
very interesting thing that just popped up,
I think, just earlier this week. So DeepMind
published a blog post describing its latest edition
of an open-world generative model they call Genie 3.
And I super encourage you to go online and look it up.
I could describe it,
but words are not going to do a good job describing it.
I'm going to attempt to anyway, to set up the discussion. Um,
the Genie kind of generation of models that DeepMind has been working on. Uh,
I think for me is like a truly magical demo
where the idea is you sort of describe what you want,
and then it basically creates like an immersive 3D world that you can sort of
walk around in and navigate, uh,
on demand basically,
which is, I mean, as someone who played a lot of video
games growing up is a truly wild idea
that, like, you can basically just say, I would like this kind of virtual environment,
and that virtual environment just appears out of the other side.
Um, Bruno, maybe I'll toss it to you.
I mean, this is like a very impressive demo.
Why is this important from like a research standpoint?
Is this kind of just like a toy,
or should we actually be more focused on this for more reasons than that?
It's a really cool demo. It's a big deal.
I'll admit I'm a little biased, of course,
because I just came from Google, so I,
I do experiment with a lot of the Google technology,
and I think this move to immersive,
generative models
for video like this, I think, is a big deal.
It's a big deal on a few dimensions.
I think one is in the way that we experience information.
I don't know about you all, but I use NotebookLM
to prepare for some of my conversations.
And so NotebookLM is going to get a video mode, uh,
for when I present information.
And I just did a presentation this weekend for one of my kids;
I used a video model from Google inside, uh, Slides in order to communicate.
So the way we get influence
and how we consume and communicate, I think is is huge.
Now, this model, I guess, is not available
yet, which is a little bit different, right? Um,
when it's available, I guess we'll all get to
play with it, but it's not typical of them, right?
So it's going to have an impact
not just on how we communicate and consume,
but how you think about the experience in movies and games.
And you can change on the spot what the experience is going to be.
And so I can't wait to see, uh,
what people are going to do with it,
because I think beyond just the consumer aspect of it,
I also see our ability to communicate
more effectively and experience different things,
and I think it's not going to take very long
before it changes the way we think about information.
Bruno, if you are using PowerPoint to communicate with your kids,
then IBM is the right company for you.
Well, welcome aboard.
Um, I try to influence them any way I can.
You know, so when words are limited
and language isn't efficient, I got to use images and videos.
That's right. I mean, just imagine
it's like puppy play. Say it after me.
Puppy. Next slide.
Dog.
As you see in the next slide.
Well, I think this is like a question that I had is
and this is, I think a discussion that's been playing out in the video
gen space as well, which is like
is there a consumer market for video gen. Right.
Like, ultimately, I could see, right, Bruno,
You know, game designers using this
and, you know, VR designers using this.
But it's like on a day to day basis,
do we think consumers are going to want to be able
to just generate 3D worlds,
you know, on the fly in the same way that do
they really want to generate video kind of on the fly?
Seems to be a really big question to me.
And I know, you know, for example, Grok just announced their video generation feature
and they're selling it as, oh, this is the new Vine, right?
Like if you like video, short form video, social media, this is it.
But on demand I guess.
Chris question to you is just like, do you think ultimately
this kind of tech is like it's like an enterprise thing.
It's like for professionals that are designing movie and game experiences,
or do you really envision a world where this is like consumer
like you log on to your computer and like, I'm going to just type
in the game that I want to play and the computer generates it.
I think this is so transformational
that Google should change their company name immediately to, uh, to jump on this trend.
That's what I think should happen.
Maybe some kind of metaverse. Meta.
I don't know. I don't know. I don't know.
I actually do think this is really important. Right.
Which is that I think 3D is the natural next space
for, uh, AI models,
because you're going to want to interact with things.
And as you're imagining new things, you can start to say,
okay, I'm going to want my code to run over here.
Or maybe you're an architect, you want to design your building, you want to see how it looks.
Maybe you're designing your kitchen.
You want to bring those models straight in and imagine how it's going to look.
And here's the placement there. So I think there is an enterprise case for this.
I think there's a consumer case actually just hanging out.
I want some chill vibes, etc. I want to customize this space for me,
play some music that's generated.
Of course.
Um, and then I think, just immerse yourself in these spaces.
So I really do believe that reality is here.
And again, even simple things like this podcast: the
four of us are, you know, in
2D spaces; we could be in the same space together,
interacting with each other and throwing things and all that sort of thing.
So I, I do think 3D is the space
that is going to become super important. I guess, uh,
with Veo 3, the demos looked incredible.
Um, what wasn't clear to me was how real time that was.
I don't know if that video was sped up or not.
So that would be that would be one thing.
The probably the other thing is,
I imagine that the amount of compute used to generate
those scenes is probably,
you know, heating up small countries as we speak.
So I, I think that's probably going to be the kind of blocker there.
But you know, it's very early in this technology and I imagine that it's, uh,
you know, it's going to improve in time and it's going to get faster
and it's going to be cheaper to run, in the same way as
LLMs have over time.
So I'm excited about this.
This is this is really where
I think the world is going to go.
And, and I think it's going to lead to the personification of our
AI helpers and, and all that sort of thing.
This is where I want to be. You know, think about today
in the enterprise, the world of training,
you know, how many employees are happy
about the training programs they have to attend in videos
or the onboarding experience or even internal communication? Right.
If you're in sales and you want to communicate
and pump up your sales team,
this is going to open a whole type of new world,
I think, that we haven't seen just yet.
So it's really a big deal, I think, in how we consume,
but how we influence and get people an experience
that is very different from what they've gotten
in the 2D kind of model that we're in today.
I did want to kind of pick up on this question of cost, you know.
So I saw the demo and immediately was messaging my friend
being like, oh, imagine this future world where you like, subscribe.
And it's a massively multiplayer online world, but it's infinite.
You can go in any direction, right?
Because the computer just keeps generating more world for you to explore.
And we're like, oh, that'd be incredible.
And then my friend was like, well, you know,
the problem is it's going to cost you $1,000 a month.
It's going to cost you $2,000 a month, because like the amount of compute
you need to generate this at any level of,
you know, eye popping detail is like still very, very expensive.
And so I guess I have a question for you on just like
how quickly you think the costs will come down.
It's kind of relevant to whether or not this will become a consumer thing
or really even becomes a thing where, you know,
Chris, you're almost like casually like, oh,
I just want a virtual environment for a meeting that we're going to have.
Um, you know, that kind of implies
a sort of cost of producing these things,
which is way cheaper than where we are right now.
Do you think those costs will come down quickly, or is it,
you know, actually a pretty hard problem from here
to mass distribution at a pretty inexpensive cost?
Yeah, I think, I mean, just with regular generative AI, we're already, right now,
you know, kind of struggling with the cost of inference and inference scaling.
So there's a lot of effort, you know,
to reduce that cost for generative AI,
because, you know, even inferencing right now, with the,
you know, the token generation and so on, the sequential nature of that, is expensive.
So it is a tough problem to solve.
Um, and, you know, how long is it going to take for that cost
to go down?
Um, I think it might take some time.
Uh, I don't know if quantum, you know,
technology can help, you know, with some of these things,
uh, to accelerate some of these simulations and, uh,
you know, quantum machine learning, that would be really cool.
You know, once we get to the quantum advantage and useful quantum computing.
Um, so, um, yeah, I don't know exactly how long.
Uh, you know, I think there might be maybe some, uh,
breakthroughs in hardware and memory technologies and so on.
And the bandwidth that we're facing right now with generative AI.
But for sure, this is very exciting.
And I think it's a glimpse of the future of content generation,
where anyone can become a game designer,
a world builder, with a simple text prompt.
And of course, if done right, done cheaply,
I can see this becoming both huge in
both the consumer space and the enterprise space.
Yeah, I love this comment about like,
well, we may need quantum to get this to work.
You know, it's almost like with AI, it's like we've created this like machine God.
And we're like, well, we really need you to generate recipes
for, you know, cooking, you know, dinner.
And it's a little bit like, well, in order
to get these video games to work, we really need quantum,
these massive, massive technological leaps, to achieve
something very everyday, which I think is very important
in its own way.
Let's move to the next topic.
This is actually following in some ways on the theme
that we've been talking a little bit about.
I think the way to introduce this is to say I love Claude Code.
One of my favorite sort of technologies of this
era has been Claude Code.
And Claude Code, it turns out, has these power users
who are running Claude Code agents
24/7, 365, many instances all at the same time.
And this really kind of fascinating policy change took place
where Anthropic basically came out and said, look,
if you're on Pro, if you're on our Max plan,
our $200-a-month plan,
what we're going to do is we're going to implement some rate limits.
You can only actually get a certain amount of access to our models,
a certain amount of access to Claude Code, certain
amount of access to our base models.
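Anthropic hasn't published the mechanics of these limits, but usage caps like this are commonly enforced with something like a token bucket: a burst allowance that refills at a steady rate. A minimal sketch, with purely illustrative numbers:

```python
import time

class TokenBucket:
    """Toy usage limiter: allows bursts up to `capacity`, then refills steadily."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity          # start with a full bucket
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# e.g. a burst of 100 requests, refilling at 1 request per second
limiter = TokenBucket(capacity=100, refill_per_sec=1.0)
```

Under a scheme like this, a power user running agents around the clock drains the bucket and gets throttled, while a casual user never notices, which matches the "only affects a small share of users" framing.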
And this also obviously created a little bit of a clamor.
But I think the main thing I really wanted to talk about,
which I think is like a really fun discussion, is,
you know, I know when $200-a-month plans
hit the market, people were like, this is crazy.
Who's going to spend this kind of money on not one
but multiple services at this rate?
You know, people then said,
okay, well, the reason you do that is because you have to pay
for the cost of all this infrastructure.
It actually turns out to be really expensive if it's not VC subsidized.
How I kind of read this rate limiting is
even once you raise the price to $200,
it's still really hard to make this sustainable
because of how much people use AI.
Is this sustainable?
Like, what's the real cost that we eventually will need to pay
for the dollars and cents to work out
on these proprietary models? I think there are a few things here.
I think, first of all, we have all used Claude Code, uh, from Anthropic.
It's terrific.
And I think it's kind of a victim
of its own success to some extent. Right?
I mean, if you look at it, there are two stats here I think they shared:
you know, they've had seven outages.
So I think clearly this is kind of being a victim of your own success.
And this new rate-limiting model here, I think,
is only going to affect 5% of people.
So in fact most people won't even see it.
Now, if I look at my own usage and the usage of my customers,
it is a challenge because, you know,
it's so good. You just use it all the time.
And I think the cost per token is what, $9?
And so if you now do the math,
uh, it's really going to be difficult,
um, for these models, the better they get at,
you know, giving you the answer,
how they're going to be able to monetize this.
And so I think we're getting to that level.
It's like they're trying to figure out where the monetization is.
There's clearly pressure for getting there.
I mean I like what you said earlier is like,
you know, will you be able to figure out through optimization, through hardware?
I think, you know, we've seen this before, you know,
in the analytics world. I did a startup
that had something called adaptive caching.
I'm waiting to see when we're going to have, uh, token caching.
And we're going to start seeing ways that we can use software to optimize
the interface between the query, the request and the infrastructure cost of it.
And so I guess that's what's probably going to happen next for these models.
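As a concrete illustration of the caching idea Bruno raises: the simplest software-side optimization is reusing responses for repeated requests instead of paying for inference twice. A minimal LRU response cache, a sketch rather than any vendor's actual design:

```python
import hashlib
from collections import OrderedDict

class ResponseCache:
    """Tiny LRU cache keyed on the exact prompt text (illustrative only)."""

    def __init__(self, max_entries: int = 1000):
        self.store: OrderedDict[str, str] = OrderedDict()
        self.max_entries = max_entries

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        key = self._key(prompt)
        if key in self.store:
            self.store.move_to_end(key)     # mark as recently used
            return self.store[key]
        return None                         # cache miss: call the model

    def put(self, prompt: str, response: str) -> None:
        key = self._key(prompt)
        self.store[key] = response
        self.store.move_to_end(key)
        if len(self.store) > self.max_entries:
            self.store.popitem(last=False)  # evict least recently used
```

Production systems cache at much finer granularity (shared prompt prefixes, KV cache entries inside the serving stack), but the cost logic is the same: every hit is inference you didn't pay for.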
Yeah, totally. I mean, what you're describing is, I think, really super interesting.
Kaoutar, curious about any thoughts on this.
It's almost like a race against time, right?
What I mean by that is, um, either
all of us on this call
are paying $4,000 a month for a subscription,
or they figure out a way to optimize it
to keep the cost sustainably lower. Right.
And by sustainably lower, we might mean $200 plus still. Um,
Kaoutar, is that kind of the world?
Like, who's going to win that race?
Are we going to end up in a world where it's like, yeah,
you just got to pay $4,000 a month for this? No,
I think it's going to be an ongoing optimization thing
that they have to do, you know. Of course, you know,
these companies have all this massive infrastructure,
so they're continuously trying also to reduce their costs.
And of course they will reduce that,
and that's going to reflect also on their prices.
So what Anthropic did, I think, was necessary and inevitable,
because, you know, it just shows a sign of the
maturation of the AI market, because the drama,
you know, around this rate limiting, I think, is a bit overblown,
because it's a simple matter of economics.
You know, these models, they're incredibly expensive to run,
and a small number of super users
can make these fixed-price,
you know, subscriptions unprofitable.
So, you know, it's like the end of this free lunch.
You know, the early days of generative AI,
you know, were characterized by, let's have a land grab,
you know, for the users, where companies were offering,
you know, generous free tiers, etc.,
you know, to really grab the user base.
But now that the market is more established,
companies are really focusing on profitability here.
And so it means more restrictions, more tiers
and clear connections between the price and the usage.
So I think in this market for pro users,
the $200 price, you know, is creating
basically a clear distinction between, you know,
casual and professional users.
And it's not just about usage,
but also access to these more powerful models,
new features, better support, etc., and that comes with a price.
So, but of course, you know, it's going to be an ongoing race:
how do we optimize these models, like Bruno mentioned,
you know, the caching of the tokens?
There are, you know, all of these techniques that we're also leveraging in research,
you know, to figure out how do you basically do, you know,
the KV cache optimizations and the token optimizations and all of these things together,
you know, in vLLM and other infrastructures.
That's going to be really important to continue to advance,
to be able to lower the cost, you know, whatever it takes.
I want to talk a little bit about sort of what
this implies, not on the kind of provider side.
We've been talking about a lot of infrastructure, cost benefit, profits.
Um, Chris, do you want to talk a little bit
about what this implies about usage?
Like it's a little bit crazy to me because what Claude,
what Anthropic is suggesting,
is that there are people who are running
just a truly mind boggling number of coding agents,
to the point where it's kind of breaking their bank.
Is that the right way of reading it?
Like, I don't know if your experience with Claude Code is similar
where you're spinning up, you know, 100 instances
to run 24/7 for you?
Um, no, I'm on the 80 bucks plan for Claude Code,
and I was thinking about the 200 bucks plan. And,
you know,
if I thought I could have done that, I would have done that.
I would have run my agents in the background all the time.
So, um, now I'm not going to upgrade to the $200 plan,
because I already have stress with Claude.
I've got the 200 bucks on ChatGPT, but I
refuse to bring that down
just in case GPT-5 comes anytime soon.
So I'm just stressed all of the time.
So I hit the Claude rate limits all of the time today.
In fact, I know what's going to happen on the 200
buck plan. It's going to go like this.
It's going to be like, oh, I was using Opus to answer this question,
but I really feel that Sonnet could answer this question.
I'm going to give it to Sonnet. And then you'll be like,
oh, I'm just creating unit tests now.
I'll give that across to
ChatGPT, it can do the unit tests.
I'm not going to burn my tokens on Claude with that one.
And you're going to start to really think about
which model you're using for which use case, etc., etc.
And then you're going to automate that within Claude Code and say,
okay, now I want this routed over here, I want this over there.
You know, it's going to be crazy time.
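The routing Chris describes, sending each task to the cheapest model that can handle it, could be automated with something as simple as a lookup table. The model names below follow Anthropic's Claude family, but the task categories and the table itself are illustrative assumptions, not anyone's shipping router:

```python
# Illustrative routing table: cheap, well-specified tasks go to small models;
# anything ambiguous or hard falls through to the frontier model.
ROUTES = {
    "unit_tests": "haiku",
    "docstrings": "haiku",
    "renaming":   "haiku",
    "bug_fix":    "sonnet",
    "refactor":   "sonnet",
}

def route(task_type: str) -> str:
    """Pick a model for a coding task; default to the most capable one."""
    return ROUTES.get(task_type, "opus")
```

Real routers tend to classify the request with a small model first rather than trusting a hand-written label, but the economics are the same: reserve the expensive model for the tasks that actually need it.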
That's the world I live in already.
So I'm just jealous that I missed it.
I could have been running agents in the background
24/7 in the window of opportunity
where people were literally just taking money out of, you know...
Yeah. Dario's pocket, basically.
That could have been me.
It could have been me. You could have been a big star.
Chris isn't that good, though.
I mean, I just want to double click on what Chris just said.
Shouldn't we be thinking about optimization and not wasting resources?
Like it almost kind of feels to me like
an exchange in value. Right?
So if I'm going to pay $80, I don't want to waste that. Right?
And I know somebody else is paying for it.
But at some point, and I think
about this a lot for enterprise customers,
they're trying to hit that line of, like,
am I getting more value than what I'm paying for?
And I think Claude, Anthropic, is trying to figure out where that is.
Right? Where's the ceiling? Where's the floor?
But what I would say is, I'm paying for artificial intelligence,
I don't want to be thinking. That's the AI's job.
I mean, actually, it's worth mentioning because, oh,
sorry, um, Kaoutar, I'll let you in.
It's that the original vision
was intelligence "too cheap to meter," right? Like that was the,
that was the Sam Altman, you know, slogan.
But sorry, Kaoutar. Yeah,
I think with these optimizations, of course,
you know, the end users don't need to think about these things.
But for the companies, you know, delivering these platforms, that's really crucial.
So, you know, these cost optimizations, it's, I think, a multi-front battleground:
things like KV cache management through reuse, compression, quantization, pruning,
speculative decoding, disaggregated serving and storage.
These kinds of techniques are, you know,
like a top-tier weapon here.
And, you know, how do you combine that with batching and compiled execution
and, you know, tiered routing and all of that?
There's a lot of techniques here.
And it's not an easy, you know, thing,
because with generative AI you have,
you know, this prefill and decoding, which add additional complexity,
especially as the context lengths, you know, keep increasing.
So I think the winners here will deliver,
you know, scalable and fast and affordable
AI at real-world volume.
So I think it's still, you know, an ongoing competition,
um, you know, that these companies
delivering these services need to figure out.
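Of the techniques Kaoutar lists, speculative decoding is the easiest to sketch: a cheap draft model proposes several tokens, and the expensive target model only verifies them, falling back to a single target step on a rejection. Both "models" below are toy character-level stand-ins, purely for illustration; in real systems the win comes from verifying all k proposals in one batched forward pass:

```python
def draft_model(prefix: str) -> str:
    # Toy "small" model: guesses the next character repeats the last one.
    return prefix[-1] if prefix else "a"

def target_model(prefix: str) -> str:
    # Toy "large" model: defines the ground-truth continuation.
    text = "aaabbbccc"
    return text[len(prefix)] if len(prefix) < len(text) else ""

def speculative_decode(prefix: str, k: int = 3, max_len: int = 9) -> str:
    out = prefix
    while len(out) < max_len:
        # 1. Draft model cheaply proposes k tokens.
        proposal, tmp = [], out
        for _ in range(k):
            t = draft_model(tmp)
            proposal.append(t)
            tmp += t
        # 2. Target model verifies; keep the longest matching prefix.
        accepted = 0
        for t in proposal:
            if len(out) >= max_len:
                break
            if target_model(out) == t:
                out += t
                accepted += 1
            else:
                break
        # 3. On a rejection, take one step from the target model instead.
        if accepted < len(proposal) and len(out) < max_len:
            out += target_model(out)
    return out
```

On this toy example, the draft model correctly predicts the repeated runs ("aaa", "bbb", "ccc") and only the run boundaries need a fallback, which is exactly the regime where speculation pays off.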
And you don't even have to think about a difficult use case
like code, like you just think about the use case of searching.
I think I read in The Economist that the cost of searching
using OpenAI or Claude is seven times more than searching on Google.
So there's a real impact,
like in using the tool poorly.
And so I think it's a good thing that we're all thinking about, you know,
what's the value versus the cost here
and what's the best use case for the tool?
I don't know, because I get my information back from OpenAI
and Claude with the answer to my question.
Whereas maybe, in order to find out a thing
I want to find out on Google,
I may be clicking on 50 different links and seeing adverts for,
you know, let me convince you...
Yeah.
Have you used AI Mode?
Now, I'm telling you, go on Google's AI Mode,
use the AI Mode more.
There is a clue in the title, AI Mode,
which is, guess what's running behind it on their GPUs? It's very good.
So the same thing is happening.
It's very good.
All right. Um, I'm going to move us on
before we get into a protracted battle about AI mode,
which we should talk about at some point, but it's not on the agenda
for today's MoE.
All right.
I'm going to move us on to our final, uh, topic.
There's this, uh, joke that I feel like
is circulating around the AI space where,
you know, every time someone wants to make a big, uh,
proclamation around AI,
they publish a single-serving website
that has their essay on it.
And, uh, finally, I think
after every other major figure in AI has done
this, Zuck is finally out with his.
Um, so he released a short essay called Personal Superintelligence.
And in many ways, it is sort of like Meta's vision
of where all this technology is going.
And I do want to kind of just like read a quick paragraph
to kind of set the scene a little bit, though
I think the whole thing is worth reading.
So Zuck writes, quote, "As profound
as the abundance produced by AI may one day be,
an even more meaningful impact on our lives will likely come from everyone
having a personal superintelligence that helps you achieve your goals,
create what you want to see in the world,
experience any adventure,
be a better friend to those you care about, and grow
to become the person you aspire to be."
And I kind of want to bring this one up
just because obviously Zuck is a major voice.
They've been on a tear recently building their superintelligence lab.
We talked about Scale just a few weeks ago.
And so, Bruno, maybe I'll kick it to you
is, I think, the question I want us to start with:
everybody wants to do superintelligence, but it kind of feels like, well,
and you're gonna have to explain what you're wearing in just a second,
but, uh, it seems like there are many different visions
of superintelligence emerging,
and I do want to spend some time to just talk about, like,
what are the differences that are emerging between
what OpenAI is trying to achieve,
what Meta is trying to achieve, what Anthropic is trying to achieve.
They really seem like different visions of superintelligence. But
Bruno, I'll let you very quickly explain your shades
and respond to the question.
I think. I think you're right.
I think there's multiple ways to think about it. I'm
wearing the, uh, the Meta glasses,
um, which, uh, which I love because they are giving me a different,
uh, interface than than my phone.
And I think there's no doubt that in the future,
this will make the phone obsolete as the way we interface with data
and with the rest of the world.
There are some issues around privacy, I think, to solve for this.
You know, I mean, you can post online in real time,
and then people might not like that.
Um, I was just reading, it
collects about 32 out of 35
possible data types and data points.
And so this is super powerful in comparison to what you had.
And so I definitely kind of
like where he's going in terms of,
you know, if you think about OpenAI, I think
what OpenAI's approach is telling us
is it's about productivity and power, in a way. Uh,
Anthropic, I think, is onto the safety-first,
you know, topic, where we need to talk about
governance, safety, attribution of content.
And I think, uh, you know, what, uh, Meta is after is, you know,
how do you give people superpowers?
And I definitely think that, you know,
sometimes the way we think about this
trend is the wrong way to think about it, right?
We think about simple use cases like searching, or maybe doing something better than Google.
But there's so much more, right? Look at the way,
for instance, I consume and understand content much, much better.
I might actually be slower, but I'll actually be better.
Like I talked about Notebook earlier:
NotebookLM gives me a mind map.
It gives me a podcast,
gives me a synthesis of a paper that I might not read
but might end up understanding better, because I have a different interface.
So definitely this idea of using gen AI to augment
us as humans, to make us more useful,
is, I think, the way to think about it,
versus the rest of the narrative that you see in the industry, which is about,
you know, a conflict between the human and the machine,
which I don't believe in. Yeah.
For sure. And I think actually, I mean,
you bringing up the sunglasses, I think is pretty interesting
because I think, Chris, another way of looking at this
is there's these essays,
but we can also think a little bit about like,
what are the next gen technologies these companies are demoing.
And it is kind of interesting to me that like Meta's
like, we've got the sunglasses.
That'll be the form factor for maybe AI in the future.
You know, there's the Jony Ives thing.
We've already mentioned him, so maybe worth returning to.
That is, they were talking about, like, oh, it's going to be like this
device with no screen, right?
That's going to be, like, the sort of future.
um, I guess among all these visions,
is there one that kind of, like, really sticks out to you
as someone who's, like, very deep in the space?
I think they all have their place.
Um, I really do think they all have their place, right?
Whether you're, um,
sticking something in your brain, whether you're sticking something in your glasses,
whether you're sticking something around your neck
or, you know, or whether you're,
um, you know, just interacting with your phone or
I think all of these things have a place I,
I think we'll know when we know.
And I and I hate I hate that answer in that sense,
but I think we know when we know.
And I,
I do think the glasses probably have a big role.
We're all used to wearing glasses anyway, or some of us are.
I think this enhancement, as you go around and,
you know, have a look at something, etc.,
I think that's going to make a lot of sense.
I don't think the phone's going away, though,
because I think you're still going to pick up and read things.
I just think we're going to be in a multi-device world
and we're all going to be spending a lot of money.
I, I do like the robotic subscription cost of $2,000 a month.
So yeah, everything's $200 a month, you know,
and if I get a $200-a-month robot, I'm
going to be sending that thing running around, right? All the time.
It's like, you run, robot, I want my money's worth.
You go to the shop and get me this,
you get me that. My glasses, I'm going to wear all the time.
By the way, I have a pair of those glasses too,
Bruno, and I put them on, and I do,
I do conference calls with them, and it's really nice in the summer.
You wander around, it's pretty good.
But people think you're nuts.
They think you're speaking to yourself.
People give me money in the street
because they think I'm a nutter.
So you've basically monetized
the use of your Meta glasses. Yeah.
It's a shift, Chris.
Maybe once more people use it...
Yeah, but think about it: if somebody
saw you, like, using your phone,
I don't know, hundreds of years ago,
they would think you're nuts. Yeah,
but, you know, it's going to be
even more nuts, because now we're not even going to be speaking to people.
You're going to be like, I'm speaking to my AI.
And they're like, "ah, yeah, your AI? Okay, that's nice.".
But don't we run into this already? I mean, I don't have hair,
so when I wear those, people see I'm on the phone.
But if I had long hair, you could imagine that I'm talking to myself.
So I think we've crossed that already.
I'll give you one use case that I think will be very useful.
My mother is French, right?
So I'm French, and my mother-in-law is American.
Neither of them speak the same language.
I would love to have subtitles here
so that they can talk to each other and understand.
That's where I think a use case like the glasses is super helpful, right?
And I agree with you, Chris.
Like I do phone calls.
I wouldn't listen to music with them because it's
just not the quality that I want.
But it's great every once in a while to be on a phone call
without having to put something in your ear.
So I think they're really good for this. Yeah,
and podcasts. If you want to listen to Mixture of Experts, put on your
Meta glasses and listen through your temples. Kaoutar,
maybe I'll turn to you to give you sort of the last word here. Um,
I think in the in the past, when we've talked about the superintelligence topic,
you've tended to be a little bit more skeptical on all things superintelligence.
I feel like, uh, it's a good time to ask,
especially with all the announcements this week.
Are you feeling the AGI? Like, do you feel like
we're on a superintelligence path? Or do you still think that essays like this,
maybe you're a little skeptical, are mostly marketing?
Of course. I still have some skepticism here.
Uh, some of it, you know,
seems to me like marketing,
uh, because, you know,
this is a rebranding of existing ideas
and a way to distract from Meta's other challenges.
So there is some truth to this.
You know, superintelligence is a buzzy term.
And the essay, you know, is light on concrete details.
However, it does provide a clear and compelling vision.
Like, you know, I think everybody hears that.
It's a nice vision.
And, uh, you know,
you know, it's basically distinct
from what we're hearing from other major players.
You know, so it's really fascinating
to see how this vision evolves as the technology matures.
But one thing here: the hardware is key.
So this vision of a personal superintelligence
is basically linked to Meta's hardware ambitions.
So if AR glasses become the next major computing platform,
Meta, of course, will be in a prime position
to own what we call the operating system of personal AI.
So who's going to win that battle? What devices?
Of course we're going to be surrounded by all kinds of devices.
Like Chris said, maybe something in your brain, something around your neck,
something in your hand, something or a robot walking with you.
So who's going to own that?
Or are we just going to be in a hybrid world
where you have multiple OSes, or personal,
you know, AI computers or devices that are running around,
and everybody prefers certain devices versus others?
So it's going to be an interesting, uh, play to see
how these things all evolve.
Are we going to converge into one single platform like the phone?
Sometimes I feel it's one unifying platform
we're all using. But are we getting into a world
where it's a multitude of devices,
pretty diverse, a fit per person,
or are we still going to see some convergence?
So it's going to be an interesting thing to see. Totally.
Yeah. The idea of, like, multi-layered personal superintelligence is
both interesting and also very funny.
It's kind of like the idea that, in the future,
I have hyper-intelligent sunglasses.
At the same time, I have a hyper-intelligent watch
and a hyper-intelligent phone, and they all come from different companies, right?
And, like, there will actually be this very funny period where, I don't know,
maybe my Apple Watch is like, those glasses always get it wrong.
Yeah. Trying to, like, undermine one another.
Um, but even something you could design yourself
using AI, like a device that you design,
that you can wear, that you can, you know, who knows?
It could be like, super personalized.
That's all, you know, generated by AI in a virtual 3D world.
So it's going to be interesting. Well, this is great. Uh,
it's all the time that we have for today. Uh,
Kaoutar, Chris, good as always to have you on the show.
And, Bruno, hopefully we'll have you back some time here on MoE.
Thanks to all you listeners for joining us.
If you enjoyed what you heard, you can get us on Apple Podcasts,
Spotify and podcast platforms everywhere,
and we'll see you next week on Mixture of Experts.