Granite 4, Sora 2, OpenAI E‑Commerce
Key Points
- The episode introduces the “Mixture of Experts” panel—featuring Kate Soule, Kush Varshney, and Kaoutar El Maghraoui—to discuss new AI developments like Granite 4, Sora 2, OpenAI’s e‑commerce ChatGPT features, and a security bonus segment.
- Granite 4, launched on Hugging Face, offers a suite of compact, hybrid‑architecture language models that run on a single low‑cost GPU, making them attractive for developers and enterprises seeking affordable LLM deployment.
- Recent AI news highlighted by Aili McConnon includes Meta’s plan to serve ads based on AI‑assistant conversations, Microsoft’s “Vibe working” agent that automates Excel, PowerPoint, and document creation, DoorDash’s Dot delivery robot, and the emergence of Tilly, an AI‑generated actress being courted by Hollywood agencies.
- The show teases upcoming discussions on Sora 2’s video‑production capabilities, OpenAI’s new e‑commerce integrations, and a bonus interview with Matt from Security Intelligence, encouraging listeners to subscribe for deeper AI insights.
Sections
- AI News Roundup on Mixture of Experts - The podcast preview outlines a panel discussion on new AI models like Sora 2, Granite 4, ChatGPT e‑commerce tools, and a headline about Meta leveraging AI chat data for ads.
- Granite Sets New Open‑Source Safety Standard - The speakers highlight Granite's ISO 42001 certification as a pioneering step in open‑source AI governance, debating whether it’s an outlier or indicative of a broader move toward stronger safety, compliance, and transparency, citing the Stanford Transparency Index.
- Scaling Efficient Models & Sonnet 4.5 - The speakers discuss expanding a high‑efficiency architecture for larger deployments, then shift to highlighting Claude Sonnet 4.5’s coding‑focused capabilities and its recent release.
- Shift Toward Specialized Hybrid AI - The speaker discusses moving from broad foundation models to sustainable, task‑specific AI by combining pre‑training with efficient inference‑time adaptation to handle dynamic, low‑data environments.
- Balancing Fun Prototypes with Robust Production - The speakers contrast OpenAI’s consumer‑oriented “vibe” apps like Sora 2 with Anthropic’s coding focus, highlighting the challenge of turning playful prototypes into secure, production‑grade solutions.
- Cost Sustainability of AI Video Models - The speakers discuss the high compute expense of large video‑generation models, question whether such services can remain cheap in the coming years, and consider future hardware innovations and pricing models.
- Scaling AI Video: Cost & Storage Challenges - The speakers discuss the financial and technical hurdles of scaling AI video generation, highlighting inference costs, massive storage needs, and the broader context of OpenAI’s latest release and open‑source competition.
- OpenAI vs Google in Agentic Commerce - The speaker contrasts OpenAI's rapid, Stripe‑centric, user‑experience‑focused approach to agentic e‑commerce with Google's consortium‑driven, interoperable AP2 protocol strategy, while noting Anthropic's positioning as a developer‑friendly tool.
- AI Agents Pose Social Engineering Threat - Experts discuss how AI agents differ from traditional software vulnerabilities, being vulnerable to malicious prompts and social engineering rather than code exploits.
- Promoting the Security Intelligence Podcast - The hosts summarize upcoming cyber‑security content, announce upcoming in‑depth expert interviews, and tell listeners how to access the show on IBM’s YouTube channel and major podcast platforms.
Full Transcript
# Granite 4, Sora 2, OpenAI E‑Commerce

**Source:** [https://www.youtube.com/watch?v=LAXAmXHNGeM](https://www.youtube.com/watch?v=LAXAmXHNGeM)
**Duration:** 00:41:53

## Summary

- The episode introduces the “Mixture of Experts” panel—featuring Kate Soule, Kush Varshney, and Kaoutar El Maghraoui—to discuss new AI developments like Granite 4, Sora 2, OpenAI’s e‑commerce ChatGPT features, and a security bonus segment.
- Granite 4, launched on Hugging Face, offers a suite of compact, hybrid‑architecture language models that run on a single low‑cost GPU, making them attractive for developers and enterprises seeking affordable LLM deployment.
- Recent AI news highlighted by Aili McConnon includes Meta’s plan to serve ads based on AI‑assistant conversations, Microsoft’s “Vibe working” agent that automates Excel, PowerPoint, and document creation, DoorDash’s Dot delivery robot, and the emergence of Tilly, an AI‑generated actress being courted by Hollywood agencies.
- The show teases upcoming discussions on Sora 2’s video‑production capabilities, OpenAI’s new e‑commerce integrations, and a bonus interview with Matt from Security Intelligence, encouraging listeners to subscribe for deeper AI insights.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=0s) **AI News Roundup on Mixture of Experts** - The podcast preview outlines a panel discussion on new AI models like Sora 2, Granite 4, ChatGPT e‑commerce tools, and a headline about Meta leveraging AI chat data for ads.
- [00:03:59](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=239s) **Granite Sets New Open‑Source Safety Standard** - The speakers highlight Granite's ISO 42001 certification as a pioneering step in open‑source AI governance, debating whether it’s an outlier or indicative of a broader move toward stronger safety, compliance, and transparency, citing the Stanford Transparency Index.
- [00:09:59](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=599s) **Scaling Efficient Models & Sonnet 4.5** - The speakers discuss expanding a high‑efficiency architecture for larger deployments, then shift to highlighting Claude Sonnet 4.5’s coding‑focused capabilities and its recent release.
- [00:13:25](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=805s) **Shift Toward Specialized Hybrid AI** - The speaker discusses moving from broad foundation models to sustainable, task‑specific AI by combining pre‑training with efficient inference‑time adaptation to handle dynamic, low‑data environments.
- [00:18:36](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=1116s) **Balancing Fun Prototypes with Robust Production** - The speakers contrast OpenAI’s consumer‑oriented “vibe” apps like Sora 2 with Anthropic’s coding focus, highlighting the challenge of turning playful prototypes into secure, production‑grade solutions.
- [00:24:45](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=1485s) **Cost Sustainability of AI Video Models** - The speakers discuss the high compute expense of large video‑generation models, question whether such services can remain cheap in the coming years, and consider future hardware innovations and pricing models.
- [00:28:14](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=1694s) **Scaling AI Video: Cost & Storage Challenges** - The speakers discuss the financial and technical hurdles of scaling AI video generation, highlighting inference costs, massive storage needs, and the broader context of OpenAI’s latest release and open‑source competition.
- [00:34:31](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=2071s) **OpenAI vs Google in Agentic Commerce** - The speaker contrasts OpenAI's rapid, Stripe‑centric, user‑experience‑focused approach to agentic e‑commerce with Google's consortium‑driven, interoperable AP2 protocol strategy, while noting Anthropic's positioning as a developer‑friendly tool.
- [00:37:36](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=2256s) **AI Agents Pose Social Engineering Threat** - Experts discuss how AI agents differ from traditional software vulnerabilities, being vulnerable to malicious prompts and social engineering rather than code exploits.
- [00:41:03](https://www.youtube.com/watch?v=LAXAmXHNGeM&t=2463s) **Promoting the Security Intelligence Podcast** - The hosts summarize upcoming cyber‑security content, announce upcoming in‑depth expert interviews, and tell listeners how to access the show on IBM’s YouTube channel and major podcast platforms.

## Full Transcript
Yeah, Sora 2 is, I think the Vibe video producing
app. I mean the Claude is the Vibe coding we
have. I mean vibe thinking, Vibe everything going on. I'm
just Vibe living at this point. Exactly. Exactly right. All
that and more on today's Mixture of Experts. I'm Tim
Hwang and welcome to Mixture of Experts. Each week MOE
brings together a panel of cutting edge minds to help
you digest the week's news in artificial intelligence. This week
we've got a classic MOE panel which I'm very excited
about. We've got Kate Soule, Director of Technical Product Management
for Granite, Kush Varshney, IBM Fellow, AI Governance, and Kaoutar
El Maghraoui, Principal Research Scientist and Manager for the hybrid
cloud platform. We have so, so many different kinds of
AI models to talk about. This week we're going to
be talking about Granite 4, Sonnet 4.5, and Sora 2. We're
also going to talk about these new e-commerce features
that OpenAI has announced with ChatGPT. And we're actually doing
a bonus segment with Matt from Security Intelligence, so stay
tuned for that. But first I really wanted to turn
to Aili, who's going to give us the news. Hey everyone,
I'm Aili McConnon, a tech news writer for IBM Think.
I'm here with a few AI headlines you might have
missed this week. Meta will soon show ads on your
Facebook and Instagram accounts drawing on conversations you had with
its AI assistant. The jury, however, is still out on
whether this move is clever or creepy. First we had
Vibe coding, and now Microsoft has introduced Vibe working. This
refers to an AI agent that will write docs, crunch
numbers in Excel. And design PowerPoint slides for you. Next
time you order in, you might want to take a
closer look at your delivery name. DoorDash has released Dot,
an AI robot that can cross bike lanes, parking lots,
even sidewalks to bring you those late night bites you're
craving. There's some exciting new talent in Hollywood. Several agencies
are attempting to sign Tilly, an entirely AI generated actress.
What do you think? Could Tilly be considered for an
Academy Award? Let us know your thoughts in the comments.
Subscribe to the Think newsletter for more AI insights. And
now back to the episode. So I want to jump
right into it and talk a little bit about Granite
4. Kate, you're on the panel. You've been obviously very
close to this. What's coming out with Granite 4? What's
exciting? What should people be paying attention to? We're
really excited to announce Granite 4 was launched on Hugging Face
this past Thursday. The models feature a range of very
efficient smaller language models. So they're really designed for developers
to pick them up, play with them, deploy them, as
well as for enterprise customers that are looking for models
and options for LLMs that don't require eight H100s to
host. So these models all can fit on a single
GPU, including like an L40S or an A100, like much cheaper
GPU options, thanks to their new hybrid architecture, which helps
gain some memory efficiencies.
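(Not part of the conversation, but for readers who want to try this themselves: here is a minimal sketch of what "fits on a single GPU" looks like in practice, loading a compact Granite model from Hugging Face with the transformers library. The checkpoint name and generation settings are illustrative assumptions, not details from the episode.)

```python
# Minimal sketch: run a compact Granite model on a single GPU via Hugging Face transformers.
# The model id below is an illustrative placeholder; substitute the Granite 4 variant you want.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-micro"  # hypothetical example id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the memory footprint small
    device_map="auto",           # place the weights on the available GPU
)

prompt = "Summarize the benefits of small, efficient language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```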
Yeah, that's great. And is this the right way of thinking about it? Like, what
is the delta you think between say, Granite 3 and
Granite 4? Right. Like, has the team been kind of
like focusing on particular things? It sounds like a little
bit of what you're saying is the trend of small
being beautiful is kind of continuing with Granite 4. Curious
if there's other kind of folks things that the team
has been really focused on. Yeah. So with Granite 4,
we're definitely doubling down on small efficient models. But even
the smallest Granite 4 model, which takes like maybe 4
gigabytes, even running 128k context length, outperforms the biggest Granite
3 model. So we're reducing the memory footprint while improving
performance. We also were able to secure ISO 42001
certification for this family of models right before the release.
So Granite is now one of the first, if not
the first, open source models out on hugging face that
has ISO 42001 certification, showing just the degree of governance,
safety and security that we put into our AI model
development system. So I think this is great and I
think particularly with Kaoutar and Kush, there's a couple other
angles I sort of wanted to bring into the Granite
4 story. Kush, maybe we can pick up on this
last point, which I think is a really interesting one,
is, you know, I think one of the constant fears,
and I'll just be candid, about open source has always
been, well, does open source mean that people are just
going to be releasing models with not a whole lot
of governance, not a whole lot of safety? I know
one of the projects, and I think, Kate, you've talked
about it before, that the Granite team has been focused
on is kind of safety and compliance. And I guess
I'm curious if you want to give kind of a
sense of like how this is evolving in open source
generally. Is Granite kind of an outlier here or is
it kind of like sort of leading the way of
kind of like a whole trend of doing more of
this kind of work when we do new open source
releases. Yeah. So I think it's part of a trend,
but I think we're ahead. Right. So there's this. Not
that you're biased or anything, just a little bit. So,
yeah, the Stanford Transparency Index has been out for a
couple of years and it's been tracking, I mean, how
transparent are different models, and actually the different processes for the model
building itself. Right. So with that, I mean, we've been
interacting with the Stanford team. That's been really great. They
should be coming out with their next leaderboard pretty soon
and hopefully we'll be doing well on that. So that,
I mean, shows the trend. This ISO 42001 is a
great sort of testament to the overall process. I mean,
the broader team has been undertaking. So I think all
of that is part of it. We also have been
cryptographically signing the models for the first time. So that's
a new feature, another type of transparency and other type
of verification. What does that get you, actually? I actually
don't know. This is the first time I'm hearing about something
like that. Yeah, yeah. So the idea is like when
you're training a model, there's checkpoints right along the way
and who knows what happened? No one can really know
unless there's some sort of ability to verify this thing.
And so you can actually have this cryptographic signing mechanism
in there and then you release those with the keys,
I guess, the cryptographic signature out and then someone can
go back and actually verify that. Yes, this is what
happened. This is the way that the training actually was
done. So yeah, all of that is amazing stuff and
yeah, really excited to see where this goes next.
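(To make the verification idea concrete, here is a minimal sketch of what signing and verifying a checkpoint digest could look like, using SHA-256 plus an Ed25519 signature from the Python cryptography library. This illustrates the general mechanism only; it is not the Granite team's actual signing pipeline.)

```python
# Illustrative sketch of checkpoint signing and verification (not the actual Granite pipeline).
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519

# Stand-in for the raw bytes of a training checkpoint; in practice you would hash the file on disk.
checkpoint_bytes = b"...weights and optimizer state for step 1000..."
digest = hashlib.sha256(checkpoint_bytes).digest()

# Producer side: sign the digest and publish the signature alongside the public key.
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()
signature = private_key.sign(digest)

# Consumer side: recompute the digest from the downloaded checkpoint and check it.
# verify() raises cryptography.exceptions.InvalidSignature if the bytes were altered.
public_key.verify(signature, hashlib.sha256(checkpoint_bytes).digest())
print("checkpoint signature verified")
```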
Kaoutar, a question I had for you: Kate made a
crack about it a little bit earlier, but as someone
who's always kind of dreamed of owning a multi GPU
rig at home, it kind of seems like these small
models are getting really, really, really, really powerful. And I
guess I should just ask the question of, just like,
it kind of feels like you actually don't need a
whole lot of hardware to do some pretty incredible things
right now. And is that trend going to continue? It
kind of feels like, Tim, the idea that you would
want to buy a multi GPU rig at home is
kind of almost like an absurd thing to do now,
just given how great small models are. Yeah, I think
we have a great proof point here with the Granite
models, tiny models. So while everyone else is racing to
make these bigger models, the frontier models, I think IBM
is fundamentally changing the question from how do we make
these models bigger to how do we really make them
smarter per compute. And I think this is really very
important because the numbers that we see with granite, like
Kate mentioned really tell a great story. Almost 72% less
memory, half the model size or even less, better performance,
you know, even 4x or more longer contexts,
and they can run on consumer GPUs. I think this is
really great. This isn't about being competitive on benchmarks. This
is really about redefining what good means from most capable
to most efficient and also being capable. So like you're
saying, we probably don't need these huge multi GPU machines
especially for special use cases. Enterprise focused. I think efficiency
really matters and I think this is really strategic brilliance
from IBM looking at this trajectory of AI development, saying
that the trend that we're seeing is unsustainable. But when
we look at the business, business what it needs. Compute
costs are rising exponentially. Regulatory pressure is mounting. Enterprise customers
care about TCO more than the leaderboards. And also the
environmental costs are becoming impossible to externalize. So I think
that's really, I think it's a win story here that
we have with Granite 4 and also the architectural innovations
with this hybrid architecture. I think it's really a huge
leap forward because this entire architectural shift to Bamba is
a bet that state space models I think are the
future. Because transformers dominated here for a while. They scaled
well, but also they scale very expensively. So if we
really can deliver these comparable results with a much more
efficient architecture, I think it's a great story. Well, Kate,
to round out the segment and I think this is
probably the worst possible question to ask someone right after
a big launch is Granite 5. What can we expect
for Granite 5? It's too soon to start talking about
Granite 5, Tim. We're still really excited for Granite 4,
but there's a lot going to come next with Granite
4. So we've got models that will feature like thinking-
style reasoning capabilities coming down the road. We've got smaller
models even than our tiny 3B model coming down the
road. So think like in the 100 millions of parameters.
So those are going to be really cool. Yep. And again,
we are going bigger. So we will be able
to take this efficient architecture and scale it and see
similar gains in efficiency, but deploy it at a bigger
size. So there's a lot of exciting work ahead and
we're really excited to do all of this work in
the open and share it with the broader community. Nice.
That's great. Well, we'll definitely keep an eye on that.
I'm going to move us on to our next topic.
And in some ways, this episode has become very model
heavy, actually, in some ways, because we're going to talk
a little bit about Claude Sonnet 4.5 and we're going
to talk about Sora 2. Right, the new OpenAI video
model. But let's talk a little bit about Sonnet 4.5
first dropped just earlier this week. And Kush, maybe I'll
start with you. I think one of the most interesting
things about this drop, I keep calling it a drop,
this release is that early on, I think these models
and foundation model companies often advertise themselves as jack of
all trades. We're going to launch a model and we're
going to show you benchmarks across everything you might want
to use it for. But this release is very coding
focused. Right? The whole blog post is sonnet 4.5. You
use it for coding. It's great at coding. Did we
remind you how good it is at coding? And
this is a little bit weird. I mean, a few
weeks back we talked about this NBER paper that came
out where people looked at, say, what people actually use
ChatGPT for, and it turns out, like, coding is actually
like a really, really small segment of like the overall
use case for, you know, ChatGPT, which is like maybe
the most widely deployed service in the space. I guess
maybe a question to you, Kush, is like, why are
we really narrowing how we sell, talk about, and focus these
models? Because it seems very clear that Anthropic's making a
bet that what you really should think of these models
for is for coding. Yeah, it actually relates to what
Kaoutar was saying about Granite. Right. What is Granite useful
for? Those enterprise use cases where we can be very
specific. And if you don't have a user in mind
and you're creating a model, you're creating a system, then
you're kind of in a weird position that, like, what
is it really good for? So I think it's actually,
I mean, makes sense for Anthropic to pick a lane
and be like, this is what we're going for. And
coding happens to be the one that they've gone for.
We'll talk about the other models later on. And I
think they're starting to pick their lanes as well. So
from that point of view, I think it's good. And
yeah, I mean, the size being huge is maybe what
is needed for that kind of use case for the
coding assistant. And we've been experimenting and we're finding Anthropic's
models to be the ones that you do want to
go for, for coding. So they have the success there.
So they're building on it and keep going down that
route. Yeah, yeah, for sure. And I guess Kaoutar, on
this front, you know, we're guilty of this ourselves at
MoE, as we always say, like the foundation model companies
or like the AI companies. I guess this has me,
you know, Kush's response has me thinking a little bit
about like, does that even make sense anymore? Like I,
I don't know. Is OpenAI really competitive with anthropic over
time? If it turns out that they're going to be
kind of pointing their models at pretty radically different things?
You know, we used to talk about AGI, general intelligence.
It seems like we're headed to ASI, which is like
specific intelligence, right? Or like super specific intelligence, A-S-
S-I, you know. Yeah, I think it's really interesting
to see how all of these shifts are happening. Of
course, I think when we started the foundation models was
like one model that rules them all and then you
can specialize. But now you're seeing these shifts to like
now it's really important given the cost and all of
these things, the sustainability issues, we need to go the
route of the, the specific models. But I think it's
going to be a hybrid approach because the training strategy
that we're doing, this pre training, it still uses the
foundational models kind of principles, which is still going to
be important. It's just how do you guide these models?
What are efficient techniques that you can use to train,
pre train, but also to, you know, in the inference
scaling paradigm, which becomes also super important to guide these
models real time to do the right things at the
least cost possible. So I think those strategies are
becoming super important. Focusing more on what can I get
these models to do during inference as opposed to pre
training. Because I think the flexibility that we need to
augment these models with during inference is becoming also super
important, especially in dynamic environments where you don't have enough
data or you're seeing these new situations and you need
the model to be flexible, to adapt. But if we
talk about Claude Sonnet 4.5, I really like this
developer focused strategy which is a developer first strategy which
gives Claude the focus to continue to innovate for the
software engineering tasks and so on. I think, which is
great. I think that this latest release, I feel it's
a big leap, especially with the 30 hours reasoning capability
they have. I think that's really huge. Kate, tucked away
at the bottom of the blog post for 4.5 was
one of the weirdest, coolest research demos it's been my
pleasure to see recently. It was a demo called Imagine
with Claude and it looks like a Claude interface, but
you would say, oh, I really want an app that
does this and it would in effect generate that on
the fly for you. And I had a lot of
fun creating an imaginary terminal that called an imaginary hugging
face for an imaginary model and having a conversation with
it. But it was just very fun to play with
and was like a very different interaction from chatbots. Right.
And I guess I'm kind of curious to get your
thoughts on what you thought about the demo and whether
or not what they're kind of showing off there might
really look like the future. I think kind of what
they're proposing is in the future you don't have software,
you just say what you want and then the software
kind of appears. Is that where we're headed? So I
mean I think Claude was very specific calling this Imagine
with Claude demo an experiment. It's certainly something that they're
throwing out there. You know, this idea that we're not
going to have this pre canned software. We're going to
start to just create software as we go about our
day to day and live our lives and oh, I
need to do xyz, let's create some custom software for
that. I think there's a lot of practicalities that need
to be solved before that becomes a reality. Let's put
it that way. Um, you know, platforms exist for a
reason. You know, investing in things at scale that many
people are going to be using works for a reason,
versus custom. Coming up with one bespoke piece of software
for every single task is going to be inefficient for
every single thing. But I could see us getting to
a hybrid world where we start to empower folks. I
mean, is it that different from how people use things
like Airtable and kind of other lightweight no coding based
solutions to spin up a new database or kind of
website to help them do X, Y and Z? I
think it's just maybe a new riff on that, but
more powered with LLMs and maybe a little bit more
sophisticated. So ultimately you're a little skeptical. Yeah, I mean,
I think it's going to be like a personal kind
of day to day saver, not necessarily the new way
that enterprises run their business. Yeah, like I guess I
agree with you that I have a hard time imagining
like a bank is going to be like, well, let's
just like come up with some kind of payments processing
infrastructure on, on the fly. Well, this is a great
time, I think, to bring in a kind of third
model into the discussion. One of the big news stories
of the week, of course, has been OpenAI releasing its
latest Sora 2 model. And we've been thinking about it
very much in terms of models and talking about models
on this episode. But I think the right way to
think about Sora 2 is actually it's an app launch
more than anything else. Right. It's not just an incredible
kind of video generation model, but it also is like
this mobile first social experience that they're trying to create.
And I guess Kush maybe to toss it back to
you because you got the original version of this question.
It's like, it sure seems like OpenAI, you know, whereas
Anthropic really wants to focus on coding, OpenAI is thinking
about it very much as like a, almost like a
consumer, if not an entertainment use for this technology. And
do you think this kind of shows like that they
really are kind of seriously differentiating into that space? Yeah,
I think that's it. So yeah, Sora 2 is, I
think the Vibe video producing app. I mean the, the
Claude is the Vibe coding we have. I mean Vibe
thinking, Vibe, everything going on. I'm just Vibe living at
this point. Exactly, exactly. Right. And it's the same issue
though, right? I mean it's an app, it's fun, it
can be used for a bunch of things. But once
you get to like the serious end of things, that's
where I mean, you need to put in those extra
sort of processes. A lot of the, the great security,
the great robustness, I mean all of that sort of
thing applies no matter what you start vibing on. Right.
So I think in general in anything that you're producing,
I mean there's some initial provocation, some initial thinking, you
maybe get to a prototype and then you take that
prototype and make it into the eventual product. And yeah,
I think we're kind of closing the gap between those
two ends. And that's great because the speed of innovation,
the speed of production can happen, but then there still
needs to be that other side. I mean just the
provocative prototype isn't the product. So I think that's where
we still need to finish the job in the right
way. Kate, have you played with it yet? I'm kind
of curious if you've. I haven't played with it yet,
but maybe just building on that last question, I think
what's really interesting is both Claude and OpenAI, Anthropic rather,
and OpenAI with these model releases have really focused on
the application. I mean, a huge part of the Anthropic
release blog was focused on Claude code. Right. Versus just
the endpoints themselves. And so we see these frontier model
providers really focusing at a different layer of the stack
and I think that's going to continue and I really
think that the open source community needs to figure out
a way to compete at that same level with more
than just download the weights and take them off
and run with them. So I'd love to see more
work going on there. The things that I did see
with the Sora 2 release, I mean, can we talk
about the branding jiu-jitsu move that they did, calling
deep fakes cameos, basically. So you can now take a
video of yourself and impose it into a video, or your
friends, and you can share these deepfakes, let's call
them what they are, of yourselves, and cameo yourself
into videos. So I see a lot of concerning
issues with that as well as more broadly some of
the things they're working on. But it certainly is an
interesting update to the ecosystem. Yeah. So a lot to
run through there. I think. Maybe let's pick up on
that last point is obviously one of the kind of
responses to the Sora 2 release was in some ways
kind of like a collective horror around this technology. And
I think I heard two things from friends on the
social media chatter. Right. One of them was what you
were saying, which is they're doing deepfakes. They're just calling
it cameos. And then the second one was, have we
created the infinite slot machine? Are we just creating this
shallow video that's going to just keep us strapped to
our phone forever? On that last one, do you think
those risks are real? Do we feel like this is
going to be. This is really changing the nature of
content in a way that might be unhealthy. I think
there's significant risks and they are real. And I mean
OpenAI did try and address some of it in their
release. They talked about doom scrolling and things they're doing
to help prevent it. I think they have some sort
of recommendation filtering and algorithm that you can customize with
natural language. I don't see how that helps. If I
can tell you more specifically what I want to see,
I would think that improves. Yeah, yeah. But who knows,
Like I said, I haven't had a chance to try
it. And do we have the right safeguards in place to
drive important things like creativity and expression without kind of
enabling mass disinformation, mass slop, kind of reducing the human
experience of creativity? And the other thing I'd love to
get your comment on, just again, pulling from the granite
discussion, is I think you were making an argument that,
look, open source has been really focused on the model
and what you see all these big companies moving towards
is competing on interface. And I guess the kind of
question for you is do you think Open Source kind
of has the chops to go and compete at that
layer? I still remember, what is it, LibreOffice was the
open source office suite and it was great because, I don't
know, it was open and free software, but interface-
wise it had its challenges. And one argument is
maybe this has always been kind of a challenge for
Open Source. Do you think this time is maybe different
or is there similar kind of difficulties? Look, the world
runs on Linux and open source software and code all
the time while putting your own user experience in front
of it. So I think we need to get to
a similar paradigm with open source AI where we're enabling
the community to engage more at the kind of application
framework level versus just the pure model weights. And then
that empowers individual companies, individuals to go out and create
their own versions that are going to power their businesses
and their lives. So I think it's going to be
a mix. I think we see this pattern in open
source software development all the time and there's no reason,
I mean, this is one part of the stack that
doesn't require hundreds of GPUs burning in order to contribute.
So we actually can engage in a more meaningful way
with I think a lot of the open source developer
community than you can in the training portion of the
kind of stack itself. So I see there's a ton
of opportunity there if we can kind of coalesce around
some of these broader patterns and applications that we think
we're going to need to drive success. Yeah, for sure.
Kaoutar, maybe a final question on this. I was once
standing in a data center when someone started like a
fine tuning run and you hear all the GPUs turn
on and the room gets really hot really quickly. And
I think a little bit about the amount of compute
that goes into powering a mass rollout of a video
generation model and OpenAI can't be making money on this.
This is an enormous increase in their burn rate to
offer it at these prices, I guess, I suppose is
the way to think about it. But how sustainable is
this? Right. Do we think that in four or five
years you will have access to these kinds of models
at these prices or is it just kind of like
this is a demo for the moment to show off
the technology and the business model is ultimately going to
be. You're going to pay a lot more for getting
access to this kind of thing. Yeah, I don't think
they probably have a good answer to all of these
questions. These companies are thinking about these things. Of
course there's a lot of work on next-
generation AI hardware with these technologies like neuromorphic and in-
memory computing and 3D integration and packaging. IBM also is
doing tons of innovation in that space. So that has
to continue to drive more efficiency all the way down
to the silicon level and even invent or innovate in
other technologies like phase change memory or others where you're
combining both compute and memory together. Those are crucial to
continue to advance. But if we just want to look
at these maybe two profiles, the Sora and the Sonnet,
here are some recent statistics. For
Sora 2, the inference cost is extremely high per
generation and the use frequency is episodic here. So users
create a few videos, and the compute pattern you are seeing is
massive bursts that could be infrequent depending on the usage. But
if there is mass adoption of these things, you can
imagine the pressure this puts on computes and the energy.
If I look at Claude. So in their pricing it
remains the same as Claude 4 which is about like
$3 to $15 per million tokens. And they can run
autonomously for 30 hours on complex multi step tasks. So
in terms of the inference cost, it's you know, compared
to the video one, it's moderate per token but it's
continuous. It runs for hours and days. So it's
like if you look at Sora, it's like a sports
car, incredible power in short bursts, expensive per mile, while
Claude 4.5 is maybe like a semi truck, moderate power
but runs 24 hours, like hauling cargo here.
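(As a rough, back-of-the-envelope illustration of that contrast, the sketch below estimates what a long-running coding agent might cost at the quoted $3 to $15 per million tokens. The token throughput and input/output split are assumptions made up for illustration, not figures from the episode.)

```python
# Back-of-the-envelope cost sketch for a long-running agent at Sonnet-style pricing.
# The throughput and input/output split below are illustrative assumptions only.

input_price_per_m = 3.0    # USD per million input tokens (figure quoted in the discussion)
output_price_per_m = 15.0  # USD per million output tokens (figure quoted in the discussion)

hours = 30                 # the "30-hour autonomous run" scenario
tokens_per_minute = 2_000  # assumed combined token throughput while the agent works
input_share = 0.8          # assume most tokens are context re-reads rather than output

total_tokens = hours * 60 * tokens_per_minute
input_tokens = total_tokens * input_share
output_tokens = total_tokens - input_tokens

cost = (input_tokens / 1e6) * input_price_per_m + (output_tokens / 1e6) * output_price_per_m
print(f"~{total_tokens / 1e6:.1f}M tokens over {hours}h -> roughly ${cost:.2f}")
```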
So of course there are implications here and I think
we need to look at this holistically. What is the
cost? How sustainable is this? I think right now we're
in the phase of proving the technology, proving the capabilities,
but to do this in massive production, get that scale,
there is a lot of more work to be done
to optimize that stack. Yeah, it makes me think a
little bit too. We've talked about inference, but what's also
quite unique here is that like, text is like really
cheap to store. Right. But like, if this goes well,
you're talking about like YouTube levels of just storage that
you need to do, which is like quite a different
thing altogether. Also, like in, in some ways, like there's
the inference cost, but increasingly also just like the storage
cost becomes really expensive. If the idea is we're going
to just store whatever video you created indefinitely. So store
and then maybe train on it. Who knows? Yeah, exactly.
Right. I mean, ultimately, yeah, there's the. There's the train
element as well. So we'll have to see how this
all unfolds. I'll move us on to our kind of
like, final topic of the day, which is another release
from OpenAI, but I think it's worth raising in kind
of the context of the discussion we've been having so
far today. Right. I think we've talked a lot about
what's happening in open source, how the big kind of
foundation model companies are maybe differentiating with time. And this
one seems to be maybe like another kind of indicator
of maybe how differentiated some of these companies are becoming.
So basically, OpenAI, they released a thing called Buy with
ChatGPT, which is the idea that ultimately GPT will be
able to be kind of an e-commerce agent for
you. We talked recently about AP2, which is the Google payment protocol. It sounds like
OpenAI, not to be left behind, is also announcing its
own sort of agentic payment protocol. But let me kind
of just go to a very basic question, is like,
do you think there's enough trust in products like ChatGPT
to have them do purchases on your behalf? I guess
part of me is always wondering at what point do
I feel comfortable giving agents access to my wallet? And
that feels like a big question for this market. I
mean, I'm not going to give ChatGPT access to my
Fidelity account, but maybe my credit card where I can
refute a charge that was made that I don't agree
with. Yeah, it's like, it's kind of like, it's working up
to some parts of your payments. Yeah, exactly. So I
think there's plenty of trust for OpenAI to try and
optimize the user experience on purchasing. They've got all the
incentives in place. I do not trust OpenAI with, you
know, my personal conversations and how they handle all sorts
of mental health issues and Sora 2 deepfakes and
everything else. But I think their incentives are pretty well
aligned to kind of user needs when it comes to
pushing dollars through their platform. So I, I think it
makes sense. They're clearly targeting kind of wide consumer reach
and audience. And yeah, I think there probably will be
plenty of trust there for giving ChatGPT your credit card
and see what happens now. Will the banks trust? It
would be interesting to see how the credit card companies
work out refuted transactions that ChatGPT made. And does that
count as a fraudulent transaction? X, Y and Z. That
could be really interesting to see that play out a
little bit. Yeah, I think the contracting element of this
is really interesting. I also love, kind of, I'm getting
a vibe of like Kate's like kind of frenemies approach
to these companies where it's like, well, here the incentives
are aligned. I have no problem with it. Here the
incentives seem really misaligned. I feel bad about it. Yeah,
there you go. Well, actually on incentives, I mean, Kush,
one of the maybe conspiracy theories that came up around
this launch was, well, the minute you start talking purchases,
you start talking ads ultimately, right? Which is like, well,
you have an agent that's going to buy stuff on
your behalf. Wouldn't someone pay to be the product that
ChatGPT recommends? Is that where we're headed with some of
these products like that? Like ads is going to be,
you know, everything old is new again and maybe ChatGPT
really is kind of just the new Google. Do you
think that's where we're going to end up? Not sure
about that actually. Because like if we look historically, things
like M-Pesa in East Africa came about so you could
do purchases through your phone and have your mobile wallets
and these sort of things. There's this whole thing of
UPI in India and like kind of the middleware in
some sense for the purchasing of things. And once you
get ads in there, I mean it just isn't the
thing. I think really it's getting money to flow through
your system and you, I mean, take a little bit
of a cut somehow somewhere. And I think that's the
bigger story here. And because ads can come in, they
can, sure. I mean do something or the other. But
what really makes the world go around is where does
the money flow? If you can capture that, then you're
really golden. And I think government regulation needs to step
in very quickly on this because this is a very
critical infrastructure sort of piece for not just individual countries
but for the global finance system. And I think if
we just let this go without having a public sort
of facing point of view on this, then it'll be
a little bit of a challenge getting out of it
because once there's kind of corporate capture of these sort
of things, the infrastructure is in place and then you
can't really undo it. So that's kind of my biggest
concern on this. Yeah, and I think regulatory aspect of
this is going to get very interesting. It's just like
how do you step in? What are the rules that
you need to put in place? So maybe a final
question, we'll kind of close out. Kaoutar, I'll give you
the last word is imagine I'm the CEO of Amazon,
I'm on my yacht sipping my martinis or whatever it is
the CEO of Amazon does. Am I worried by where
this is all going as kind of like the Internet's
prominent e commerce provider that has kind of sold everything
now, right? Like they do groceries, they do books,
they do everything. Do developments like this, like, are they
a threat to companies like Amazon? I would be worried.
You know, I think OpenAI here is trying to differentiate
itself and trying also to have a big play in
the agentic e-commerce. So they're, I think, trying to
ship fast, build all the experience and also the protocol around
what actually works. They're using this ACP protocol, the
Agentic Commerce Protocol, and of course right now
it's Stripe centric but theoretically it's open to others. They
open source the code that lets other merchants use also
their interface and so on so they can open up
also for other merchants. So I think they're trying to
bet on the first mover advantage in agentic commerce. And
for them I think it matters more, I think than
maybe perfect interoperability, which Google's play is more focused on.
So they're willing to kind of accept somewhat centralized solution
like Stripe as the hub in exchange of velocity, really
getting this to work fast. If you look at
Google's play, they're trying to build a consortium first here,
get the buy-in from 60-plus partners to ensure
true interoperability using the AP2 protocol, which is more ambitious
than just shipping first. So I think OpenAI's theory here
is trying to own the user experience, control the transaction,
kind of become the front door to commerce, which is,
of course, a threat to Amazon, while Google's theory is
setting the standards, be the protocol player here, but don't
own any one experience. But if you look at Anthropic's
theory, be kind of the best tool that developers and
businesses can use to build their own experiences. So I
think everybody is trying to bet on one strategy, one
angle, and we'll have to see how all of these
things play together. But it's interesting to watch. Kate, Kush,
Kaoutar, this is one of my favorite panels. Hopefully we'll
have everybody back on real soon, but that is all
the time that we have for today. And next up,
we're going to have a quick segment with Matt Kaczynski
to do a cybersecurity segment. Well, Matt, we're really glad
to have you on the show for our listeners. Matt
Kaczynski is the host of Security Intelligence, a new podcast
that has just launched focusing on cybersecurity questions. And we
want to have you on the show because it is
national, I believe, Cybersecurity Month. That's correct. And I guess,
Matt, just to maybe, like, kick off the discussion, maybe
just to riff on that last segment a little bit,
strikes me that once we start using AIs for payments,
it's going to get hacked super quickly. And so I'm
kind of curious if you want to give, like, a
little bit of a security gloss on that last discussion.
Like, how is that space evolving? Are we all screwed?
You know, I just, like, want to learn a little
bit about that because certainly people are going to be
using these products. That means they're going to get burned
using these products, too, right? Absolutely. Yeah. So first off,
thanks for having me here, Tim. First time, long time.
But yeah, you know, it's interesting, right? Because the last
episode we did of Security Intelligence this week, I asked the panel what they wouldn't trust an AI agent to
do for them. And what was really interesting to me
was the first answer I got from, from Jeff Crume,
a distinguished inventor here at IBM, said, anything. Right? For
Jeff's money, right now, he feels like it's too early
in the game to be connecting this stuff to anything,
really. And the other panelists kind of agreed because here's
the interesting thing about the AI agents and the way
that they're different from some of the other security challenges
we've faced in the past is that you're not
hacking these things the same way you would say, like,
I don't know, a piece of software, your traditional piece
of software. Right. You're not exploiting some kind of bug
in the code. You're not dropping a payload or writing
a script. You're basically social engineering these things. Right. If
you say the right words to these things, you can
get them to do some pretty malicious stuff. And
so the, the question that raises for cybersecurity experts today
is, okay, we don't really know how to stop people
from getting socially engineered. How do you stop an AI
from getting socially engineered? And that's the big question, right?
That's what people are trying to figure out in the
space today. Yeah. And I think it makes me think
a little bit about, I mean, so these are papers
I'm kind of obsessed with in the AI space, which
are like, oh, if you are encouraging to your AI,
it performs better. You tell it, it's really important to
my job for you to get this right and the
AI does better. And it's kind of interesting that you
use the phrase kind of social engineering because the way
we try to get humans to be better at spotting social
engineering is we literally show them, like, documentation of when you
should ask a question, if someone is asking for your
password. Do we need to do that as fine tuning
for our models? Do we train them to get better
at spotting social engineering using the exact same methods that we
use to get humans to be better at spotting it?
I think that we kind of do, actually. Right. And
it's a question of figuring out what that education looks
like. Again, going back to the conversation that I had
with the panelists on our episode this week, they all
kind of landed at what we need is some kind
of real world version of Asimov's three laws of robotics.
Right. But for teaching, these are our AI agents and
whatnot to avoid or maybe better detect social engineering with
again, the caveat that all this education we've put into
people, it still hasn't stopped it. People still get scammed
and they probably will get scammed forever. So in some
ways it's like, look, we may never stop the AIs
from getting scammed, but if we can set up some
kind of standard universal approach to telling them what to
watch out for, maybe we can cut down on it
a little bit more. Like we cut down on it
with people. That makes a lot of sense. Well, I
think, tell us a little bit more about the new
show. So I understand that it launched fairly recently. And
what will you guys be focusing on? We've been talking
AI just because MoE is, like, AI-pilled. But it
sounds like cyber security might be a much broader focus
for you guys. Absolutely. So, yeah, we basically, to be
frank, we ripped off your format for MoE. We do
a... we get a... hey, you
guys, you hit on something good. We just took it. So
we get a panel going every week, three experts and
myself, and we sit down and we break down the
latest stories in cybersecurity. And now granted, a lot of
it does have to do with AI, but
we also cover a ton of other stuff too, right?
You know, your DDoS attacks, your, you know, new app
vulnerabilities, big hacks, that kind of stuff. And then we've
also got, coming up soon, some pretty special one on
one in depth interview episodes with some experts that are
more, a little more narratively focused. That's going to be
a kind of bonus thing we do. But yeah, that's
the kind of gist of the show. Nice. That's great.
Well, if people want to find out more about it,
where do they find you? Where do they find the
show? Absolutely. Head over to the IBM
Technology YouTube channel. And then of course, we're also found
wherever podcasts are hosted. Security Intelligence is the name of
it. Go type that in, search it out. You'll find
us. Nice. Well, Matt, we'll have you back on Moe
sometime. And thanks for joining us today. I would love
to. Thank you, Tim. And that's all the time that
we have for today. If you enjoyed the episode, you
can get us on Apple Podcasts, Spotify and podcast platforms
everywhere. And we'll see you next week on Mixture of
Experts.