# NVIDIA GTC Unveils Robot AI Breakthroughs

**Source:** [https://www.youtube.com/watch?v=TsDdk7xHhMY](https://www.youtube.com/watch?v=TsDdk7xHhMY)
**Duration:** 00:39:09

## Summary

- NVIDIA’s GTC spotlighted the **Groot N1 foundation model**, a humanoid‑robotics AI trained on both synthetic and real data that uses a dual “fast‑and‑slow” architecture inspired by human cognition, positioning it as a step toward AGI‑level robotics.
- The **Newton Physics Engine** was announced for real‑time physics simulation, enabling more accurate and AI‑driven robotic interaction with virtual environments.
- A new **synthetic‑data generation framework** for robots was unveiled, addressing a major bottleneck by providing scalable training data to boost robot performance across applications.
- Experts — Vyoma Gajjar, Kaoutar El Maghraoui, and Nathalie Baracaldo — concurred that the announcements collectively signal a strong, robot‑focused direction for NVIDIA’s AI roadmap.
- While the show also previewed other AI headlines (Baidu’s models, a paper on Chain‑of‑Thought flaws, and Gemini 2.0 Flash), the dominant theme of the episode was NVIDIA’s robotics breakthroughs announced at GTC.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=0s) **Robotics Highlights from NVIDIA GTC** - Guests discuss NVIDIA GTC’s robot‑focused announcements, including the Groot N1 generalist model, the Newton real‑time physics engine, and a synthetic data framework to boost robot performance.
- [00:03:18](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=198s) **Simulating Robotics for Safe Deployment** - The speaker explains that transferring machine‑learning code to physical robots frequently causes unpredictable failures, prompting the use of synthetic data and extensive simulated environments to test, accelerate development, and ensure safety before real‑world deployment.
- [00:06:31](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=391s) **High‑Fidelity Physics Engine for Robotics** - The speaker explains that the new physics engine, built on the Warp acceleration framework, provides real‑time, GPU‑accelerated, high‑fidelity simulations and integrates with AI, reinforcement‑learning, and DeepMind robotics tools, enabling precise virtual training and testing of robots before real‑world deployment.
- [00:09:41](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=581s) **Balancing Inference and Training Resources** - A researcher inquires about allocating compute between inference and training, wondering if solutions like DGX Spark can help, and receives a response emphasizing its role in expanding AI research access.
- [00:12:47](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=767s) **Desktop AI for Robotics Education** - The speaker advocates democratizing AI by delivering supercomputing‑grade, robot‑training capabilities to desktop devices, enabling students to experiment with fine‑tuned models locally and expanding the educational market similar to early Apple school deployments.
- [00:16:03](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=963s) **Baidu's AI Platform Strategy** - Discussion about Baidu's integrated AI model platform, pricing, open‑source dynamics, and competitive positioning versus rivals like DeepSeek.
- [00:19:11](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=1151s) **Baidu's Shift Toward Open‑Source AI** - The speakers debate Baidu’s closed‑source approach, highlight security and ecosystem benefits of open sourcing, and note its June announcement to release new models to compete with OpenAI and DeepSeek.
- [00:22:24](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=1344s) **Chain-of-Thought Bias and Security** - The speakers debate how chain‑of‑thought explanations can mislead about AI decision‑making, raising security concerns and showing that these traces inherit the same cognitive biases observed in humans.
- [00:25:26](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=1526s) **Debating the Future of CoT** - The speaker argues that chain‑of‑thought prompting is here to stay, explores ways to improve its reasoning—including a “reverse CoT” validation—and illustrates the concept with a personal example about selecting a sofa.
- [00:28:33](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=1713s) **Addressing Bias and Chain‑of‑Thought Errors** - The speaker explains that AI inherits human biases and accumulates reasoning mistakes in long chain‑of‑thought processes, arguing for interpretability and solutions such as self‑correction modules, constitutional AI, and neuro‑symbolic approaches like tree‑of‑thoughts.
- [00:31:39](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=1899s) **Prompt Engineering, Localization, and Google’s AI Push** - The speaker outlines ways to make prompt engineering more robust, scalable, and culturally customized—while noting inherent bias trade‑offs—and critiques Google’s rapid AI catch‑up, highlighting the recent Gemini 2.0 Flash Experimental image model release.
- [00:34:51](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=2091s) **Google Leverages Multimodal Knowledge** - The speaker explains that Google integrates image data with its extensive text and search history to create more accurate domain‑specific AI models, viewing this capability as a current industry baseline rather than a unique catch‑up advantage.
- [00:37:54](https://www.youtube.com/watch?v=TsDdk7xHhMY&t=2274s) **Innovation vs Catch‑Up Debate** - The hosts critique whether guests bring truly novel ideas or merely follow trends, engage the audience by urging listeners to suggest future topics, and close the episode with show promotions.

## Full Transcript
What's the announcement you're most excited about from NVIDIA GTC?
Vyoma Gajjar is an AI
Technical Solutions Architect.
Vyoma, welcome back to the show.
Uh, what did you think?
Thank you.
And I feel the Groot N1 model, the generalist model that they're
calling it for humanoid robotics was something that I really
enjoyed. Kaoutar El Maghraoui is a principal
Research Scientist and Manager at the AI hardware
center.
Uh, Kaoutar, welcome
back to the show.
Uh, what did you see from the keynote that you liked?
Thank you.
Great to be here.
I was also very excited about the robotics and simulation, uh, announcement,
especially the Newton Physics Engine for real time physics simulation
and how, you know, it's, uh, working
with the AI.
Nathalie Baracaldo is a Senior
Research Scientist and Master
Inventor.
Uh, Nathalie, welcome
back to the show.
We haven't seen you for a while.
Um, and, uh, what did you like from GTC?
I
was super excited with their framework to generate synthetic data for robots.
Uh, because that has been a key, key factor limiting the performance of
robots in all sorts of, uh, applications.
So super excited about that.
I guess we're all into robots here.
Absolutely.
All that and more on today's Mixture of Experts.
I'm Tim Hwang and welcome
to Mixture of Experts.
Each week, MOE brings together the best minds in artificial
intelligence to walk you through the biggest headlines of the week.
As always, there's a lot to cover.
We're going to talk about Baidu's new models that they've
dropped, a paper about the
flaws of Chain-of-Thought
reasoning, and
Gemini 2.0
Flash Experimental.
But first, I really want to cover NVIDIA GTC.
GTC is NVIDIA's sort of big Conference that they do every year.
It's where the big drops happen.
Uh, you
know, Jensen Huang gets to walk
out on stage and, and do all the exciting keynotes.
Um, sounds like this group really wants to talk about robots and
specifically Groot N1 which
is a foundation model for robots that NVIDIA announced during the
keynote, uh, Vyoma
maybe I'll
start with you.
What got you so excited about this announcement?
Um, one of the things that I saw is like a
model such as Groot N1 created
by NVIDIA, which is trained on both the synthetic and the real data and,
uh, NVIDIA, during that keynote, they were claiming that it
features like a dual system architecture.
So it's thinking fast and slow, which is kind of inspired by the human
cognitive processes that we see.
So I feel we are going to get, these are the small, small ways in which
people are trying to get towards AGI.
Like, let's get a little bit closer, let's get a little bit
closer, however far it seems.
So I feel that was a good, um, catch that they were trying
to do in that, uh, part, yeah.
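As an aside, the "fast and slow" dual-system idea can be sketched in a few lines of toy code: a low-frequency, deliberate planner feeding a high-frequency, reactive policy. This is only an illustrative pattern; the class names, rates, and one-dimensional "robot" below are made up for the sketch and are not NVIDIA's Groot N1 architecture or API.

```python
class SlowPlanner:
    """System 2: deliberate planning that runs only occasionally."""
    def plan(self, observation):
        # A real system would run a large vision-language model here.
        return {"target": observation["target"]}

class FastPolicy:
    """System 1: reactive control that runs every tick."""
    def act(self, observation, plan):
        # Take one unit step toward the planned target.
        pos, target = observation["pos"], plan["target"]
        return 1 if target > pos else -1 if target < pos else 0

def run(ticks, start, target, replan_every=10):
    """Interleave a slow planner (every `replan_every` ticks) with a fast policy."""
    planner, policy = SlowPlanner(), FastPolicy()
    pos, plan = start, None
    for t in range(ticks):
        obs = {"pos": pos, "target": target}
        if plan is None or t % replan_every == 0:
            plan = planner.plan(obs)      # slow path: deliberate, infrequent
        pos += policy.act(obs, plan)      # fast path: reactive, every tick
    return pos
```

The point of the split is that the expensive deliberation runs rarely while cheap reactive control keeps up with the environment on every tick.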
Yeah, absolutely.
And I know, Nathalie, in your
response, you kind of flagged this kind of synthetic data part being the
thing that got you the most excited.
Um, you know, I know that's been a little bit of a blocker, but it would
be good for our listeners to kind of understand how big of a blocker it
has been sort of traditionally if you want to talk a little bit to that.
Yes, definitely.
So one of the big issues that we have is that when you are trying to simulate a
robot to test it before it goes into the real environment, we have limited data
and traditionally what has happened is that when you simulate, which is less
expensive, you don't have the exact kind of a spectrum of different types of, uh,
of scenarios where the robot might move.
And as a result, when you move your machine learning, uh, programming
into the actual robot, it fails.
And, uh, there are like, uh, these very nice, uh, videos of how it fails.
So if you have a humanoid, it may just fall on its face and it's just crazy.
So that's why, uh, a lot of, uh, the different, uh, robots at companies
and, and actual factories, they have a very restricted set of environments.
You see them doing one narrow task, for example, if it's an arm and so forth.
And it's just because it's very, very complex to create a robot that can
move in an environment that may not be exactly the one it was designed for.
So just moving a little bit, uh, a millimeter or something, the
robot may not behave properly.
And having this
type of synthetic generation of data allows us to basically create a
huge environment where we can test this and make the whole development
cycle much faster and much safer.
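A common way to realize what Nathalie describes is domain randomization: sampling many perturbed scenario parameters so the simulated spectrum is wider than any hand-built test set. The sketch below is a minimal, hypothetical illustration; the parameter names and ranges are assumptions for the example, not NVIDIA's actual framework.

```python
import random

def sample_scenario(rng):
    """Randomize physical parameters so a policy trained in simulation
    sees a wide spectrum of conditions before touching a real robot."""
    return {
        "friction":   rng.uniform(0.2, 1.0),   # floor friction coefficient
        "payload_kg": rng.uniform(0.0, 2.0),   # mass held by the arm
        "offset_mm":  rng.uniform(-5.0, 5.0),  # target position noise
        "lighting":   rng.choice(["dim", "normal", "bright"]),
    }

def generate_dataset(n, seed=0):
    """Generate n randomized training scenarios, reproducibly via the seed."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(n)]
```

Seeding the generator keeps the synthetic dataset reproducible, which matters when you are comparing policies trained on the same simulated spectrum.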
So another aspect to it is that because these robots, as they
evolve more, they move around.
You may have situations where you have unknown safety
issues happening.
And that is super interesting to me because understanding how we can make all
those environments really safe and try to simulate things that go wrong before the,
the, uh, robot actually gets deployed.
Uh, I think it's just fascinating.
It opens a lot of different, uh, new, uh, opportunities to create
safe robots and safe applications and deploy them in real life.
So I was super excited when they said they also open sourced it.
I, I was very, very excited to hear that.
Yeah.
The additional kind of open source element, I think is like a
really interesting part of this.
Cause they've clearly created like something that's like a big deal
from model standpoint, but they're just saying, actually we're, we're
here to sell hardware, right?
So, uh, the, the business incentives actually
lean towards things like open
source.
Um, Kaoutar, you spend
your days thinking
about all things hardware, um, how big of a deal is this?
And why is NVIDIA getting into robots?
You know, like I, I think about NVIDIA, like they started as
like a, a gaming GPU company.
Um, and then, you know, like the next time we really thought about them, it
was like, oh man, we're going to do these big data centers for language models.
And that kind of sounds like a lot of this keynote was robots, robots, robots, right?
We're going to show you videos of robots.
We're going to bring a robot onto the stage.
Um, Why is, why is NVIDIA kind of investing in this, this vertical?
I think it's, it's high time right now to invest in this,
and this is very attractive.
I think all the ingredients right now are coming together.
You know, the, the models, the hardware, the simulations, the
synthetic data generations all are coming together, which makes these
robots really perform very well.
So I think the collaboration that they have with
DeepMind and Disney, uh, you know, I also was interested in seeing Disney
also play a role here and especially if they're going to bring, you know,
that maybe, um, uh, the fun, the entertainment piece of it, uh, you know,
their Disney characters or, you know, kind of play that into these robots.
That's going to be really interesting.
Uh, so one thing also that was very interesting is this physics
engine that they talked about, which is designed for these robotic
simulation.
Nathalie mentioned
this, and another thing: it's also built on their Warp framework, which
provides a lot of acceleration.
So it provides, you know, this high-fidelity and real-time
physics simulation, which was not, you know, kind of
possible or realistic before.
And this is very crucial for training and testing these robotic systems
in virtual environments before even deploying them in real world.
So I think that is a very big step forward to enable these
humanoid robots to perform well and with high fidelity.
So, uh, combining basically the simulation and the, the AI acceleration using
their, this Warp-based acceleration framework that they have with high
performance parallel programming, it helps them achieve, you know, fast and
efficient GPU accelerated simulations.
So they're kind of combining the AI world with the physical physics based
simulations to provide, you know, this, uh, interesting, uh, outcome.
And they also have these integrations with their existing frameworks
like the, uh, the Isaac Lab, with their reinforcement learning.
And I think they also have a playground, uh, that uses DeepMind's
robotics research, a lot of, you know, integration with existing frameworks.
And, uh, so that really makes, uh, this high-precision robotic control
possible, paving the, this, um, you know, kind of a great environment
that is ideal for simulating tasks such as, you know, manipulation,
grasping, multimodality, et cetera.
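To make the physics-engine discussion concrete, here is a toy single-body timestep using semi-implicit Euler integration, the simplest form of what a simulator computes each tick. Real engines like Newton run far richer, GPU-accelerated, parallel solvers; this sketch is purely conceptual and the helper names are invented for the example.

```python
def step(pos, vel, force, mass, dt):
    """Advance one timestep: update velocity from force, then position (semi-implicit Euler)."""
    vel = vel + (force / mass) * dt   # integrate acceleration into velocity
    pos = pos + vel * dt              # integrate velocity into position
    return pos, vel

def simulate_fall(height, steps, dt=0.01, g=-9.81):
    """Drop a 1 kg point mass from `height` meters for `steps` ticks; return final height."""
    pos, vel = height, 0.0
    for _ in range(steps):
        pos, vel = step(pos, vel, force=g * 1.0, mass=1.0, dt=dt)
    return pos
```

After 100 ticks of 10 ms (one simulated second), the mass has fallen roughly the 4.9 m that free-fall kinematics predicts, which is the kind of agreement a high-fidelity engine has to maintain across vastly more complex contact and joint dynamics.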
Yeah. And I guess a follow up question for
you there, Kaoutar, is, um,
you know, in the last few episodes, I feel like every few episodes we do a segment
where it's like, oh, but you know, OpenAI is about to work on its own chips
or, you know, Amazon might be catching up.
You know, there's lots of people who kind of want to, like,
you know, capture some of NVIDIA's market, but you know, I kind of look
at all of this robotics work they're doing and also just the announcements
about like Blackwell Dynamo, right, it's just like the performance
metrics are just like insane.
Um, I guess from your opinion, kind of as someone who thinks
about this a lot and watches the industry, like, can anyone catch up?
Like, it kind of just feels like after this keynote, it's like, feels
like very hard for anyone to really credibly claim that they're going to
kind of like, do things kind of on par with NVIDIA, particularly because
they have this ecosystem, but curious about how you think about that.
Yeah, I agree with you.
I think it is, they're kind of creating this big gap.
Uh, and they're also lining up the right collaborators.
like DeepMind and Disney and others.
So it's going to be hard to catch up, but I wouldn't be surprised if somebody
comes with some contributions, like
either from OpenAI or I think
we'll have to see, although I agree with you, it's really difficult to catch up.
Um, I want to
talk a little bit about some of the other announcements
that were done at the keynote.
And,
you know, Nathalie, in particular,
I kind of thought of you.
So, you know, a few episodes ago, we talked a little bit about the project they
announced, uh, I think at one of the last keynotes called Digits, which was this
like, candidly, like quite cute little supercomputer that they were selling,
uh, that would be sort of like a desktop.
Um, and, um, I'm kind of curious as like someone who's a researcher, you
know, if like that kind of form factor for doing work is interesting to you.
Um, I just think a little bit about, I have a friend at a company being
like, our product's getting really successful, but that means we're burning
all of our compute on like inference.
We have no time to do any like training or fine tuning work anymore.
Um, and there's been kind of this like tension of like, oh, well.
All of the compute resources for an organization kind of
come out of the same bucket.
Um, and I guess I'm kind of curious, like, does something like DGX Spark, which
is what it's called now, like, is that, is that something that's interesting?
I don't know if you put your name in to kind of reserve one of these devices, but
I'm curious about how you think about it.
I would pass that
question to Kaoutar actually.
I am not sure how to answer that.
Yeah, I think definitely, you know, uh, it's gonna open up the doors for
many, uh, researchers and enthusiasts and people who are interested in
learning about, you know, all of these different cycles in the AI journey.
So, of course, there is a lot of focus on the inferencing and inference scaling,
uh, because, you know, I think the need for that stemmed from the fact that it's
very hard to get access to these GPUs.
So we had to come up and be creative about, you know, what can we do with
the resources that we have available.
But I feel like there is a lot that can be done even, you know, in the pre
training and the fine tuning stages.
It's just because only a few people or few organizations are really limited
because of this, uh, the, the resource constraints that we have right now.
So I would love to have access, you know, to AI in a box that I can use
in my, in my home and experiment with all these different things and then
push the boundaries even further and I'm sure many others would
have that appetite as well.
Yeah, I think it's just kind of like a cool, I mean, I guess I'm
a little bit of a device nerd.
I was just like, oh, it's just like amazing that you can have
that much computing power just like on your desktop right now.
And yeah, I think it's like very, very
exciting.
Um, Vyoma, any thoughts
on DGX Spark?
I don't know if you like had seen that announcement.
Yeah.
If you think it's more of a gimmick or it's like the kind of thing you'd actually
be interested in playing around with.
I, I, I feel, uh, the reason why they came up with this also is to
target the developer community.
That the developers sitting at home or even want to try
something as a side project.
Because that's how innovation kind of flows.
Someone's side project, if I have the compute to do it, I,
it opens my creativity, mind doors, if you may call that.
So it, it helps you, um, experiment and
train, like, even if you want to fine-tune a small little thing and move on.
So fail fast kind of works very well on this, and I feel that will be the
reason why people want to adopt it more.
Okay, I have this.
I can leverage this.
Learn about it.
Try seeing if this works or not.
Let's move on.
So no longer are you going to see big companies or big institutions spending
a lot of time and energy on like innovation projects because someone
somewhere would have tried it and be like, Hey, guys, it's not gonna work.
Let's move on.
So I feel that a quick turnaround is something that is the angle that
NVIDIA is trying to play here as well.
If
I might add here, I think, of course, you know, all the
points that Vyoma said are,
are great.
So this is kind of bringing AI supercomputing to the desktop,
trying also to democratize AI.
And, but I think there is also this angle of the robotics, the humanoid AI
training at scale, which I think was also one of their motivations:
uh, basically, pushing these humanoid robotics forward with models like
Groot, and why DGX here, uh, matters.
It allows, you know, developers to fine tune and deploy these
robotics models locally.
So, uh, using, you know, their Newton Physics Engine, it
also enables this sim-, simulation-to-real training.
So you need, you know, these capabilities locally to be able
to, to advance these things.
And I think it would be also great for students
that are learning AI, they need to learn all these concepts and the
best way to learn is to experiment and have these hands on experiences.
So, uh, students right now are struggling to get access to GPUs and resources.
Yeah, it'll be really cool if there's like kind of a big schools or education
kind of market for these types of devices.
I just think about when, like, you know, the Macintosh or like the kind
of the first iMac laptops came out.
Um, it was like a huge thing, like Apple had a huge market selling to
schools because everybody wanted to give computers to their kids.
Um, and, uh, and it was like a great way to kind of break into people learning
how to, you know, like use these devices.
So I'm gonna move us on to our, uh, next topic.
Um, Baidu, uh, announced, uh, this week that they were launching two new models.
One's called ERNIE X1, and the other one is ERNIE 4.5.
Um, and X1 is supposed
to be
their DeepSeek competitor.
Um, and of course Baidu is like a longstanding, you
know, Chinese tech company.
Really one of the leaders in the space.
You know, I think in many ways, kind of like one of the people that you would've
expected to kind of really dominate,
uh, in AI.
Um, and almost like a lot of other players in the space, like OpenAI,
you know, it's like they too are now kind of struggling with all of these
new competitors coming up, right?
Like, um, you know, what's interesting
about ERNIE x1 and ERNIE 4.5 is they're
both closed sourced models.
Um, and so I guess maybe as a first cut, like curious about you know, how
folks think a little bit about sort of open source here, and I guess why we
think Baidu is still trying to pursue like a closed source, uh, strategy.
Um, you know, I'm kind of curious if you have any thoughts on like why they're
still playing this game and if, you know, you really think ultimately they're going
to have to open source just like, uh, just like many others are thinking about now.
Yeah.
Um, I feel Baidu is the kind of company which was, which stemmed,
its origin stemmed from the point that they wanted to create a search
engine for China and they wanted to keep majority of that data private, a
lot of data privacy, um, inhibitions that they were going through as well.
So I
feel this is like their chance to kind of utilize some of the information,
the knowledge graph, if you might say, that they have created between
their different AI applications like Baidu AI for like search or
like Baidu AI for maps, et cetera.
So they are trying to come up with like a platform interface with
that one particular model, which kind of creates synergy across.
So I think X1 is that.
I always believe that yes, sooner or later they are going to realize that
the open source market would be some sort of a better way to take this forward.
Like how Sam Altman after a couple of years had to say it in an AMA
on Reddit that, Hey, I think maybe we're on the other side of history.
He's getting kind of like dragged into it.
Exactly.
I don't think he wanted to say it.
It's just that it just came out, right?
So I feel that as well, but, but looking at Baidu's, um, like core
structure, very privacy-integrated systems were always, um,
something that they believed in building.
So I get where their mind is right now, but I think sooner or later
they are going to have to move.
And the pricing that they've kept, it's almost like half of
what DeepSeek or the others are.
So that is another, there, there's another point that, see,
guys, everyone's gonna use us.
We are like half as expensive.
So they have like a leg
up in the market as is.
So they're like, maybe we're gonna leverage this as much as we can till we
can, and then we'll see when we get there.
Yeah, the competitive dynamics are really
interesting. Nathalie, I kind of
take a look at the situation and, you know, Vyoma's point is a good reminder, right?
Like this is like, Baidu is like the, the kind of Google
of its, of its market, right?
The kind of search engine of its market.
And I guess I kind of look at that and I say, well, you know, the
kind of reputation has been that like Google has been kind of slow
to capture the opportunity from AI.
And I say, oh, it's okay.
It's very interesting that in China also, like, the search engine
company is the one that's been kind of like slow to capture this.
Um, I don't know.
Should we read into anything in that?
Do you think that there's like something about search businesses or search
companies or, you know, dominant kind of, you know, these types of search
companies that are like maybe more limited in using or benefiting from AI?
Yeah, I think that's an interesting question because the way I see it
is that probably they don't need to open source in the sense that they
already have a big user, uh, base.
So, uh, people are already trusting them with so many things.
So potentially from the strategy perspective, open
sourcing, uh, would not be
a key priority as it is for other companies.
Um, the other aspect is that every time I think about open source models versus
having a more closed source model, from the security standpoint, when
you have an open source model, you are telling people, Hey, just go inspect it.
We try our best.
Tell us how, how would you think we did, and that it's offering a
lot of transparency, and I think it improves how we move forward.
Now, when other companies keep their models, uh, behind the scenes, and you're
just basically not telling how it works exactly, they may be, uh, planning,
for example, to orchestrate different types of components in the backend.
And, uh, I think, uh, we'll see just like, uh, as we see with OpenAI,
we are not fully sure how many models they have behind the scenes.
We, we know there are guardrails and a lot of things.
So I think, um, not fully open sourcing, because they already
have such a big base for search, they probably are thinking
it's, uh, it's okay to go that way.
But as you know, I'm a security person and I like transparency.
Uh, it makes it easier to test the system and so forth.
So, so yeah, that's, uh, my take on
open source versus non-open source and what they are doing.
Yeah, for
sure. Uh, Kaoutar, are you on
team Vyoma?
Like, do you feel like they're, this closed source strategy is doomed?
You know, we're going to see Baidu have to open source in the future.
Uh, or do you think there's kind of like maybe different
things going on in that market?
Yeah, I, I think I, I kind of
agree with Nathalie, but I see
that they've already started making a step forward, uh, towards open source.
And they, I think they've announced that they're planning
to open source sometime in June,
uh, their, uh, their new models.
And I, this just shows that they are also competing with OpenAI and
DeepSeek, and especially seeing all the, um, you know, all the buzz
that DeepSeek created.
Uh, so open-source AI
models like DeepSeek have
gained traction.
And Baidu likely sees this as a way to also increase adoption of its own models.
It's also a way to gain market share, attract developers, and build an ecosystem around its models, because if you keep these things closed, you're missing out on the open ecosystem, on developers, on getting the community to help, and especially on adoption. I think adoption goes hand in hand with open source, so that is very important. Driving widespread adoption means more developers, more use cases, and faster improvements through external contributions. Those are all win-win strategies when you use open source, and I think Baidu is getting it and moving in that direction. This just intensifies the competition in China, but also globally, so it's interesting to see these dynamics.
So the next thing I think I want to talk a little bit about is I like to always
have like a paper that we can discuss.
I'm a little bit kind of old fashioned in that sense.
We talk a lot about industry news, but I think it's just fun seeing what's going
on in the world of research and kind of interesting papers from week to week.
And this paper caught my eye.
So the title of the
paper is Chain-of-Thought Reasoning
in the Wild is Not Always Faithful.
For those of you who are not super familiar with Chain-of-Thought and reasoning models and exactly how this looks right now: we have these Chain-of-Thought reasoning traces, where a model will think through a problem, to a greater or lesser degree, before it renders an answer.
And overall, we've discovered this method is really, really good at getting the model to perform better and better. But there's an increasing series of papers, this is not the only one, investigating what happens when the model gives you erroneous reasoning for the decision it's trying to make, and when these reasoning traces are not actually a faithful way of understanding how models make decisions.
Nathalie, you think a lot about security, and I think this paper raises a bunch of security issues, in the sense that maybe we're giving people the wrong impression of how AIs actually think by showing them Chain-of-Thought traces. Is that the right way of thinking about this paper, and about the problems of Chain-of-Thought in general?
Yeah, I think that's a very interesting question. I'd rather have Chain-of-Thought so that I can know at least a little bit about how the model came to an answer. What the paper shows is a lot of biases that may happen in the Chain-of-Thought itself.
And I think the reason it's really interesting is that the particular bias they demonstrate in the paper is a bias that's also in us humans. For example, if I ask you, Tim, a question, is X larger than Y, then depending on the way I phrase the question, a lot of people would answer one way versus another. That's a cognitive bias we know for sure, one that cognitive psychologists have studied for a long time. What that paper shows is that the same bias exists in these models, and I thought that was interesting: there's a parallel between cognitive psychology for humans and this paper. I think that's why people are like, oh my gosh, this is so interesting.
If you study this type of situation further, you'll see that the models also exhibit a lot of other types of biases. In particular for fairness, for example: for o1 there was a very interesting section I read about how the Chain-of-Thought itself may be biased, or pretty hateful, or may tell you things that you as a user don't want to see or be exposed to, for certain use cases.
So overall, I thought it was a really interesting paper. The caveat, because I am a researcher at heart, is that they only had one dataset and their temperature was 0.7, which I thought was interesting. The paper goes into a lot of detail, but I would like to see more expansion on this work, because it's fascinating. It's fascinating.
Yeah, absolutely.
Vyoma, do you agree? I'm curious if you saw things in the paper that interested you. I mean, what I think is kind of interesting is the temperature point: I don't know, how much should we believe these results, right? Maybe it actually turns out that, by and large, reasoning traces are really useful as a way of understanding how the model's making decisions, and maybe we shouldn't be so scared? I don't know. What do you think about that?
Yeah, I do agree with Nathalie on that point: at that temperature, you're actually telling the model to be a little bit more creative in its thinking as is. And now you're using that as the basis for saying that CoT is not here to stay, that it's gone.
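To make the temperature point concrete: sampling temperature rescales the model's next-token scores before they become probabilities, so higher values flatten the distribution and make sampling more exploratory. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by 1/temperature.
    Low temperature sharpens the distribution toward the top token;
    high temperature flattens it, so sampling gets more varied."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up next-token scores
low  = softmax_with_temperature(logits, 0.1)  # near-greedy
mid  = softmax_with_temperature(logits, 0.7)  # the paper's setting
high = softmax_with_temperature(logits, 2.0)  # much flatter

# The top token's probability drops as temperature rises.
print([round(p, 3) for p in low])
print([round(p, 3) for p in mid])
print([round(p, 3) for p in high])
```

At 0.7 the distribution is noticeably flatter than near-greedy decoding, which is the sense in which the model is being asked to be "a little more creative."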
I feel it is here to stay, because it tells you what the model is going through. And the other part is that all these companies that have come out with reasoning models are now looking into how to make the Chain-of-Thought processes better. Right now we see a lot of pattern matching rather than something more generalized through which you can understand the deep reasoning.
But going further, what about a reverse CoT? Whatever information a CoT has given you, go back and evaluate it again: tell me whether that Chain-of-Thought was right or not. So there can be innovative ways in which researchers will keep answering this. I feel it is here to stay, and it should be.
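That reverse-CoT idea can be sketched as a two-pass loop: one call produces an answer with its reasoning, and a second call is asked to judge that reasoning. This is only an illustration of the pattern; `ask_model` is a hypothetical stub standing in for a real LLM call, with canned responses so the sketch runs on its own:

```python
def ask_model(prompt: str) -> str:
    # Hypothetical stub; in practice this would be a chat-completion API call.
    if "Verify" in prompt:
        return "VALID"
    return "Reasoning: 17 * 3 = 51, so the answer is 51."

def answer_with_reverse_cot(question: str) -> dict:
    # Pass 1: ask for an answer along with its chain of thought.
    first = ask_model(f"Think step by step, then answer: {question}")
    # Pass 2: feed the chain of thought back and ask the model to judge it.
    verdict = ask_model(
        "Verify this chain of thought. Reply VALID or INVALID.\n"
        f"Question: {question}\nReasoning: {first}"
    )
    return {"response": first, "reasoning_checked": verdict == "VALID"}

result = answer_with_reverse_cot("What is 17 * 3?")
print(result["reasoning_checked"])
```

With a real model, an INVALID verdict could trigger a retry or flag the answer for review, which is the debugging use of CoT discussed below.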
So I'll give you a short example. I moved, and I'm looking for a sofa for my apartment. I said, I want a Japanese-style sofa with a table. And then it just started giving me a table, but I knew, because I read that entire reasoning, that, oh, now it's just going and spinning on the table thing. I don't want that. Then I went and said, I want a side table, and again it told me, no, it's the table. So I'm trying to tell you that I understood I have to be very specific: I want an adjustable, low-level side table. I wouldn't have done any of that if it had just spun; I would have been like, okay, it's going to give me a Japanese-style sofa someday.
So I feel those are ways in which it tells you to improve your prompt. Right now, Chain-of-Thought is also based a little bit on your prompt. It doesn't tell you the exact internal workings of the model, but I feel it will evolve with time. It should evolve with time, and it will; people are working on it.
Yeah, I think that's one of the most interesting things. I never really thought about it that way, but the Chain-of-Thought is useful for letting you know when the reasoning is definitely off, even though it may not necessarily be a good guide for how the model got it right when it did. It's kind of a debugging tool more than anything else, which I think is a really fun way of thinking about it.
Kaoutar, I would love to get you to comment on one comment Nathalie made, which is that it's very funny that these models have inherited all of these cognitive biases that humans have. Computers didn't used to have those types of biases, but I guess we live in a world now where that's the case. As someone who thinks about hardware, which I always envision as a much more structured thing, how does it feel that these computers are now executing systems with all of these weird, soft, emotional aspects to them? I'm curious to hear your reflection on that; it's a very funny contrast to what we thought about computers doing 10 years ago.
Computers used to be exact, zeros and ones, and we expected them to be the opposite of biased. But right now, with AI, they're learning from data, and this data is generated by us. So they have inherited all of our biases through the way these models learn, and I think it's only natural to see these outcomes, which we have to figure out systematic ways to solve.
Vyoma mentioned some of them. And of course, this is prevalent: Chain-of-Thought is unable to generalize, and it also accumulates errors. The longer the chain of reasoning is, the more it leads to faulty logic, even alongside correct answers.
But as Vyoma mentioned, I think there are potential solutions, and I also agree this is something that's here to stay. The interpretability aspect of it is so important that I think we only need to amplify its importance.
So we definitely need things like self-correction modules. Claude, for example, has constitutional AI, a reflection-based approach that helps the model self-correct. There are also things like structured step verification, and hybrid models where neuro-symbolic reasoning, like tree of thoughts, can be used to help correct the logic. Also, combining statistical and logical AI, probabilistic reasoning with logical constraints: all of these techniques need to be brought to the table. We should figure out how to combine neuro-symbolic AI approaches to improve the reasoning of these LLMs, with self-verification and self-correction in the reasoning, which I think will keep CoT useful and reduce the flaws in these tools. So I think these hybrid reasoning frameworks will be necessary to improve the reliability and the reasoning of these models.
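The reflection-based self-correction pattern mentioned here can be sketched as a critique-and-revise loop. Both functions below are hypothetical stubs standing in for real LLM calls, and the "principle" being checked is a made-up example; the point is only the loop structure:

```python
# Hypothetical stub standing in for a real LLM: the first draft fails the
# critique, the revised draft passes. A real implementation would replace
# generate() and critique() with model calls.
DRAFTS = iter([
    "Always trust the model's output.",
    "Verify the model's output before trusting it.",
])

def generate(prompt: str) -> str:
    # Returns the next canned draft regardless of the prompt (stub behavior).
    return next(DRAFTS)

def critique(draft: str) -> str:
    # Check the draft against a (made-up) principle: it should
    # encourage verification rather than blind trust.
    return "OK" if "verify" in draft.lower() else "Revise: encourage verification."

def self_correct(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback == "OK":
            return draft
        # Feed the critique back in and ask for a revised draft.
        draft = generate(f"{task}\n{feedback}")
    return draft

result = self_correct("Give one-line advice about model outputs.")
print(result)
```

The loop terminates either when the critique passes or after a fixed number of rounds, which keeps a real deployment from revising forever.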
Yeah, we'll definitely see that, I think. And it is kind of a funny outcome that we created this Chain-of-Thought thing, which at times can be very emotional. I read one trace that was like, oh, you know, I'm trying to do the best I can at my job, okay, let's try to research this task. And then you're trying to make it more computer-like again.
It makes me think a little bit about how people will say, oh, he's like a computer. Twenty years ago, that would mean the person is very rigid and very logical. Maybe kids growing up today will say, oh, he's like a computer, and by that they'll mean really irrational and emotional. It would be very funny if it flips what we mean when we say this person is like a computer, or they're thinking like a computer.
Imagine NotebookLM with CoT. Let's say you see the Chain-of-Thought, and in your NotebookLM you can push back: no, don't go there, don't think like this, change this. And then that can be used as a training dataset.
I feel it's going to open new avenues for prompt engineering as well. People will learn how to make prompt engineering more robust, scalable, and precise over time. And I think this could also help with customization. The way you interact with the model might be very different from how Tim or Nathalie interacts with it, so localization and customization might also be interesting; you could inject cultural cues and preferences. So, Tim, I think we're going to introduce even more biases into this world while we're trying to make it different.
It's going to be, I think, both. This hybrid world.
So, building on our segment on Baidu: the narrative that's been in the market, or at least on Twitter, is basically that Google is coming from behind. They should have captured the AI revolution, they kind of missed it, and now they're catching up. But week to week, it feels like Google is really catching up now; there are all of these launches that are quite impressive.
And Google recently announced, and this is almost a joke in AI now, they launched a model called Gemini 2.0 Flash Experimental, which is basically a model they had in beta for a small group of people but is now widely available. It's an image-generation model that people can play with.
This by itself maybe wouldn't be super impressive, though the model is pretty fun to play with. But I wanted to use it as an opportunity to talk about one particular aspect of the launch, which is that Google is touting that one of the reasons its 2.0 Flash Experimental model is so good is that it incorporates what they call world knowledge to make the image generation better. And, like many phrases in AI, you're left asking: okay, world knowledge, what does that even mean? Maybe I'll start with you: what is world knowledge anyway, and why is it important to AI generation, particularly for images?
Right, so that's a good point. First I want to answer the question you raised: is Google really catching up?
Yeah, the hot take.
I mean it's just vibes,
I don't have any industry stats, so feel free to knock me down there.
No, I get it, I get it. I've been asked this many times now, at outside speaking sessions as well, because "catching up" is very subjective. My real question here is: are these models going to surpass, or at least match, the creativity of already established models like Midjourney, DALL-E, and so on? Maybe they've arrived late to the dinner table, but maybe they've brought really good products that none of these players already had.
And to your other question, what is this world knowledge? I feel the world knowledge they're talking about is deeply integrated with Google's entire knowledge graph, with access to all the real-world data they have. So instead of just learning from images, they are also learning from text, structured text as well. Imagine all the Google searches we've done, all the pictures I've posted about my sofa: this is not what I want, this is what I want. That has been folded into the historically consistent world knowledge Google already has. And I think that's extremely important for creating a model that is much more accurate in answering the questions users might have.
Yeah, and I see this as Google using, or trying to use, its advantages in the space, right? It's saying: lots of people can train image-generation models, but we've got the knowledge, so we have to put that to use.
I would trust them. Even though I've been using many of the other available models, I'm going to use theirs now, because I know they might have much more domain-specific, accurate data that I might need.
Yeah, for sure. Kaoutar, am I just operating on vibes? Is Google catching up? Or is this just table stakes: they're able to generate a model that's pretty much as good as everybody else's now?
I think it's table stakes. I don't think it's a catch-up game per se; they've been working on it, so it's just the time for them. It's ready for release.
Yeah, right. Nathalie, you were laughing; I guess maybe you agree?
I think at some point last year I was surprised and excited every time I saw an announcement. Now it's like something is happening every week. So I think people are catching up, and the space is becoming kind of a commodity situation. That said, I still get impressed by the fact that right now we can say, change my tulips to wildflowers, and be very focused on one part of an image. That was not the case a few months back, and that still makes me happy. So from that perspective, I love seeing more and more models coming out. I think not only Google but many other players are going to keep improving the models, the capabilities, and the ways we describe things and get some beautiful pictures out of them. So yeah.
Yeah, for sure. I feel like it's really hard to be in the AI business, because you're doing magical things that have never been done with computers before, and then six months later people are like, ah, what else have you got? It's very hard to keep ahead. And I agree with you; there's almost announcement fatigue. What is the next big thing? I don't know. It just feels like there are big announcements every week, and so all of it blends together.
Yeah, I think the key question here is: are they catching up, or are they really innovating? That's what we need to focus on. Of course you can catch up; you can see what others are doing and try to close those gaps, mimic, or try to, because a lot of the algorithms are published. But are you really bringing something new to the table that nobody else has thought of? So I think maybe we should start watching those trends more: who's the real innovator, and who's just playing the catch-up game?
Yeah, definitely. Kaoutar, I feel like you're a harsh judge.
Well, that's all the time we have for today. Thanks for joining us. Nathalie, Kaoutar, Vyoma, it's always a pleasure to have you on the show.
And thanks to all the listeners out there. We're going to try something new this week: we're always interested in hearing a little more about what you out there want to hear about from week to week. So on Spotify, please drop a comment and let us know. We'll be keeping an eye out and will probably work that into future episodes. So flag anything you've seen that you want us to talk about; we're looking forward to hearing from you. And as always, if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere.
And we will see you all next week on Mixture of Experts.