# Nvidia Licenses Groq in Acqui-Hire

**Source:** [https://www.youtube.com/watch?v=BRXGDCBSARY](https://www.youtube.com/watch?v=BRXGDCBSARY)
**Duration:** 00:25:39

## Summary

- Groq (with a q) announced a non-exclusive licensing deal with Nvidia for its inference-on-chip technology while keeping the company independent under new CEO Simon Edwards.
- As part of the agreement, Groq's founder Jonathan Ross, president Sunny Madra, and several key engineers will move to Nvidia in an "acqui-hire," effectively transferring the team's expertise without a formal change of control.
- This hybrid structure blends a technology license with a talent acquisition, resembling a "brain transplant" rather than a traditional outright acquisition.
- Analysts note the deal reflects a growing industry trend in which large AI firms use licensing and acqui-hire arrangements to absorb startup capabilities while preserving the startup's corporate shell.
- The new model reshapes the concept of an "exit" for startups and employees, as equity triggers and conventional acquisition payouts may no longer apply.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=BRXGDCBSARY&t=0s) **Nvidia's Unconventional Groq Deal** - Nvidia secures Groq's inference-on-chip technology via a non-exclusive licensing pact and talent hire, a complex arrangement, not a straight acquisition, that could reshape the AI hardware landscape heading into 2026.
- [00:04:15](https://www.youtube.com/watch?v=BRXGDCBSARY&t=255s) **Memory Bandwidth Limits AI Speed** - The speaker argues that beyond compute power and scarce talent, the primary bottleneck for advancing large language models is memory bandwidth: the ability of chips to rapidly move data for matrix multiplications, weight accesses, and KV-cache handling, which directly dictates inference performance.
- [00:07:43](https://www.youtube.com/watch?v=BRXGDCBSARY&t=463s) **HBM and CoWoS Packaging Explained** - The speaker explains why high-bandwidth memory and TSMC's chip-on-wafer-on-substrate (CoWoS) packaging are essential to overcome the memory wall and deliver fast, efficient AI accelerator performance.
- [00:10:53](https://www.youtube.com/watch?v=BRXGDCBSARY&t=653s) **HBM Bottleneck and SRAM Overview** - The speaker outlines how the limited supply of high-bandwidth memory (HBM) constrains AI hardware development, noting the few manufacturers and recent executive fallout, then shifts to describe SRAM's faster, on-chip characteristics despite its lower density and higher cost.
- [00:14:09](https://www.youtube.com/watch?v=BRXGDCBSARY&t=849s) **On-Chip SRAM vs HBM** - The speaker explains that using on-chip SRAM as the primary weight store can cut inference latency by keeping the working set on the die, yet its limited capacity (hundreds of megabytes) means it cannot replace the much larger gigabyte-scale HBM needed for modern AI models.
- [00:17:18](https://www.youtube.com/watch?v=BRXGDCBSARY&t=1038s) **Inference Economics and SPV Financing** - The speaker explains that AI inference incurs continuous operating expenses while training requires heavy upfront capital, notes Nvidia's push for low-cost inference with Groq, and outlines xAI's financing plan using a special-purpose vehicle that combines equity, debt, and Nvidia's potential investment to lease GPUs.
- [00:20:27](https://www.youtube.com/watch?v=BRXGDCBSARY&t=1227s) **Big Tech License-Hire Strategy** - The speaker explains that firms like Google, Microsoft, and Amazon are increasingly acquiring startup technology, talent, and rights through licensing and "acqui-hire" deals rather than full purchases, reshaping investor returns and employee incentives.
- [00:23:34](https://www.youtube.com/watch?v=BRXGDCBSARY&t=1414s) **Nvidia's Defensive Chip Strategy** - The speaker explains that Nvidia's acquisition of Groq talent is a strategic defensive move to safeguard its specialized AI chip leadership, distinct from fears about Google's TPU dominance, by retaining expertise for LPU applications while navigating complex financing and market dynamics.

## Full Transcript
There's only one news story that mattered this week, and it was the story of Groq with a q. Not Grok with a k, not the AI model company. Groq with a q, the inference-on-a-chip company. Groq with a q was quote unquote bought by Nvidia. And I use scare quotes because the story is much more complicated. This is one of the defining plays of 2026. It's
happening right at the end of 2025. I
know a lot of us are focused on the
holidays and time away. I want to make
sure that we don't miss this story
because it's going to shape the world
that we all live in AI wise for the next
few months. First, what did Groq actually announce? Number one, they announced a non-exclusive licensing agreement with Nvidia for Groq's inference technology. And in the same announcement, they said that Jonathan Ross, their founder, and Sunny Madra, their president, plus some other team members are moving to Nvidia as part of the deal. That's the acqui-hire part of the deal. Groq also said it remains independent, named Simon Edwards as CEO, and said that GroqCloud, one of
their products, is continuing. This is
not a straight acquisition. It's
something else. It's a transfer of
capability. It's like a brain transplant
and it doesn't have a clean change of
control event. So, let's slow down.
Let's define what's actually going on
because the mechanics of this deal are
the point. A license obviously means one
company pays for the right to use
another company's tech. Non-exclusive
means the seller can license the same
tech to other parties. It's not on paper
a takeover, right? And then there's the other piece of it that makes it feel like a takeover: the acqui-hire. An acqui-hire is when the real asset being acquired is the team, right? Key leaders, key engineers, and they matter more than the company's revenue, more than the company's product. Historically, that kind of buy has only happened via full acquisition.
So, this is what would happen when, say, Meta bought WhatsApp, right? It was a full acquisition deal. What's new here is that hiring the team is becoming part of the play that frontier AI companies run when they want to snap up smaller startups in the AI space. They hire the team, they license the tech, while someone else's job is to keep the startup's corporate shell alive. One of the things Reuters called out, correctly, is that this is part of a broader trend where big tech is using licensing and hiring structures like this instead of straightforward acquisitions. And that matters because it
changes the meaning of the word exit for
startups and for employees. Before, the startup story was really simple, right? If you have an exit
event, whether you go to the public
markets, whether you get acquired by a
private company, the company changes
hands. There's a change of control event
and all of the equity triggers
associated with that occur and that
means employees will get some kind of
reward for the time they spend in the
company if they had equity at the time.
But now all of that is different. It is unclear what the remaining employees at Groq get, if anything. And this is not the only time this has happened; it's happened a couple of other times. It's becoming a way that larger companies are able to grab key people and pull them into their corporate entity without triggering regulatory review, which is handy. I understand the strategy, but it leaves things really awkward from an exit and culture perspective in Silicon Valley, and it also tells us something non-obvious about where the race is headed. We have known for a
while that a few valuable people are
worth more than entire companies. That's what the market told us about Mira Murati coming out of OpenAI, founding her own business, getting a monster seed round off of her name alone. It's also what the market said about Ilya Sutskever, right? Founding Safe Superintelligence, getting a monster round. Very similarly,
there are people who are worth more than
any corporate shell can contain. And one
of the things that the frontier AI
companies are figuring out is that they
would rather have the people on board
than the tech or the assets that come
with the company, the cap table,
anything. They just want the people. And
this is a really clean way to get that.
Ironically, when we say we're compute-bound, I sometimes think that we're people-bound: we have a few people who can drive AI forward, and they are worth anything they care to say they're worth. That is really the barrier at this point to moving forward. It's just an interesting thought. The other thing that we're bound by, though, besides compute, is memory. Our deeper constraint, and I've
been emphasizing this for a while, is
about memory bandwidth. And that's an
important part of the story here because
Groq was working on memory. And memory
is really about how fast the chip can
move data in and out of working memory
while it's doing the matrix
multiplications that are at the heart of
large language model inference. So modern AI models don't just do math. They constantly fetch and move enormous amounts of data, right? They do the matrix multiplications, but they also stream in model weights and activations. And in generation workloads, the KV cache that stores context so the model can keep going is critical, and it needs to be filled and read constantly. So the result is that fast AI is as much
about feeding the chip as it is about
the chip's raw compute. This can feel
really theoretical, but we see it
already on our local machines. As an example, if you upgrade from an M2 Apple silicon chip to an M5 Apple silicon chip on their new laptops, you will feel the speedup in all of your cloud LLMs. You'll be talking to Claude or ChatGPT or Gemini or any other AI, and it will feel faster, because the tokenization to feed the chip happens on the local machine. I didn't know this either, but it happens on the local machine, and so you need a local chip that handles tokenization efficiently. So in a sense, our perception of speed is governed by this whole ecosystem of memory management that happens around the model. And that's what Groq was all about.
And the thing that shows us this is true
is that the components that make the
memory system work are now being
pre-sold years and years ahead. SK Hynix, a Korean company, has repeatedly said that its high-bandwidth memory is effectively allocated out over multiple years, with Reuters reporting sold-out conditions all the way through 2025, and later reporting that HBM for 2026 was sold out and volumes were being finalized now. It's wild how competitive it is. We are at a point where Google execs are getting fired because they were unable to come up with pre-allocated high-bandwidth memory to support Google's TPU goals heading into next year. That is how important memory is.
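To make the bandwidth constraint concrete, here is a back-of-the-envelope sketch in Python. The model size, weight precision, bandwidth figure, and KV-cache dimensions are illustrative assumptions, not vendor specs; the point is only that single-stream decoding has to stream the full weight set per token, so memory bandwidth caps the token rate.

```python
# Back-of-the-envelope sketch of memory-bound decoding.
# All numbers below are illustrative assumptions, not vendor specs.

def max_tokens_per_sec(model_params: float, bytes_per_param: float,
                       mem_bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed when every token
    must stream the full weight set from memory (batch size 1)."""
    bytes_per_token = model_params * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int) -> int:
    """Approximate KV-cache footprint: 2 tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 70B-parameter model served in 8-bit weights on an
# accelerator with roughly 3,350 GB/s of HBM bandwidth.
print(round(max_tokens_per_sec(70e9, 1, 3350)))   # tokens/sec ceiling
print(kv_cache_bytes(80, 8, 128, 8192, 2) / 1e9)  # KV cache size in GB
```

Under those made-up numbers the ceiling is a few dozen tokens per second regardless of how much raw compute the chip has, which is why everyone is fighting over bandwidth.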
So what is high-bandwidth memory? It is not a different kind of memory the way people imagine. It's DRAM, right? Dynamic random access memory, stacked vertically and packaged right next to the processor, physically connected with very wide interfaces so that the chip can pull data far faster than it could from conventional memory solutions. SK Hynix itself defines HBM as a memory that vertically interconnects multiple DRAM chips to dramatically increase processing speed compared to earlier DRAM. Essentially, you're stacking these DRAMs on a very high-bandwidth connection with the core processing chip itself so that you reduce memory read/write bottlenecks at inference time. And why does HBM
keep showing up in Nvidia conversations?
Because it's one of the things that
makes modern AI accelerators practical.
A GPU can do a staggering number of operations per second, but only if it can pull the model weights and the working set quickly enough. If it can't do that, it stalls, right? Like it must
have the ability to reference and pull
from memory very rapidly for things like
model weights or it's going to stall
out. And that's what people mean by a
memory wall physically on the chip. The firm SemiAnalysis describes HBM as combining stacked DRAM with ultra-wide data paths and notes that essentially all leading AI accelerators deployed for generative AI training and inference must use HBM. It is a requirement to have high-bandwidth memory, so that you can access that memory very quickly, in order to deliver the kind of high-quality inference we want out of our AI models. But HBM has a second
constraint that non-hardware people usually miss. It's not just about manufacturing the memory dies. It's also about the packaging. So if you've heard the term CoWoS, that's what this is about. I'm going to define it; if you haven't, don't worry. CoWoS means chip-on-wafer-on-substrate. This is TSMC's advanced packaging technology that lets you place logic dies and HBM stacks together on one large silicon interposer with dense interconnects in between. Again, that dense bandwidth matters, right? TSMC explicitly describes CoWoS as packaging that accommodates logic chiplets alongside HBM cubes stacked over a silicon interposer for AI and HPC workloads. If that sounds like a
lot, think of it this way: we need to colocate these components so that we can get hold of them fast. Imagine colocating a giant apartment building and a giant downtown office building on the same block. Now people can move back and forth between home and work efficiently. It's a very similar concept; we're just operating at a billionth-of-a-meter scale here.
The Financial Times has described advanced packaging as increasingly central as miniaturization slows, and points out that techniques like HBM stacking and CoWoS-style integration have become essentially required to get the kind of generative AI performance we're looking for. This is in line with an ongoing thesis, and it's something Ethan Mollick first called out; I really like it. He points to Moore's law, which in some senses we are past and in some senses we are still living through. What he pointed out is that Moore's law was not a single law, right? It's actually a reflection of a trend line captured by the allocation of capital and people on a singular problem over a very long period of time. That's exactly what we're seeing here with GPU technologies. CoWoS and other technologies are basically ways that we are addressing the ongoing challenge of increasing AI performance even as we start to hit physical miniaturization limits. So now you
have the chip ecosystem context. AI demand doesn't just pull on GPU supply. It inherently tugs at HBM supply; that's why those Google execs got fired this week. It pulls on packaging capacity. It pulls on the ability of a few specialized manufacturers to ramp very quickly. Again, I want to remind you: almost everywhere you look in the AI stack, one company is sitting there supplying a crucial component. It's amazing this whole ecosystem works together. And crucially, HBM is one of those bottlenecks. The major makers are SK Hynix, Samsung, and Micron. That's it. That's
all you got. Now we come back to the Groq story. Let me introduce you to SRAM, because SRAM is where the Groq discourse gets really interesting and really weird. SRAM is a different kind of memory. It's called static random access memory, and it's typically used for caches and on-chip storage. The static part means it doesn't need constant refreshing the way DRAM does, and that means it can be faster to access than DRAM. You know how we just spent a lot of time talking about DRAM as this cube stack that sits next to the chip with a really wide highway? SRAM is faster because it literally exists on the chip. It's like having a live-work unit in the same building. That's a terrible analogy, but you get the idea. It literally exists on the chip, so it can be faster. It's also much
less dense and it's more expensive per
bit. So the definition we typically use is that SRAM is faster than DRAM but more expensive in silicon area and cost, because it's less dense. SRAM is typically used for caches and internal registers, where you need the speed but don't need as much capacity, while DRAM is used for main memory. It's not like there's a perfect solution; we are swapping back and forth between the two. That's how most chip architectures work. Here's the key thing that people misunderstand, though: you don't order SRAM from a supplier the way you order those HBM stacks I described. You're not going to SK Hynix and saying, "I want some SRAM." SRAM is generally built into your chip design, because it literally is in the chip. So more SRAM usually means more physical die space; the die gets bigger, which usually means higher cost and more yield complexity. And SRAM scaling has been increasingly difficult in advanced chip design. The publication Semiconductor Engineering has been really explicit, saying that SRAM's inability to scale has challenged our ability to hit power and performance goals, because we need better SRAM as chips continue to improve, and SRAM remains the workhorse memory for AI. There hasn't been a solution there. This brings us to the
reason Nvidia bought Groq. This is Groq's real technical wedge. Groq makes inference-focused accelerators, often described as LPUs, or language processing units. Groq's own product brief for the Groq chip lists 230 megabytes of SRAM per chip and claims up to 80 terabytes a second of on-die memory bandwidth. Groq's public blog framing makes the contrast really explicit, right? On-chip SRAM bandwidth upwards of 80 terabytes a second, versus off-chip high-bandwidth memory they describe as roughly an order of magnitude lower in their comparison, so on the order of 8 terabytes a second. And in a later post, Groq also described its approach as integrating hundreds of megabytes of on-chip SRAM as the primary weight storage for the model, not merely as a cache. And this is why chip people talk about Groq as low latency. If the working set you need can live on the die, on the chip, you can avoid an entire class of stalls and variability that comes with model performance when models have to go off the chip. That can matter enormously for workloads where you feel that latency:
Voice systems, interactive co-pilots,
real-time agents, any workflow where a
slow response breaks the user
experience. But now here comes the nuance. SRAM speed is real, but SRAM capacity remains a constraint. The same Groq product brief that makes SRAM sound magical forces you to face the scaling math. 230 megabytes was a lot of memory back in the 1990s; I remember when we had 230 megs. It's not a lot of memory in modern AI terms. A single HBM stack is measured in tens of gigabytes. Micron, for example, described its HBM3E as 24 gigabytes in an eight-high stack and 36 gigabytes in a 12-high stack, and Samsung described its HBM3E 12-high as 36 gigabytes as well. So we're at the point where we have tens of gigabytes available through wide interconnects right next to the chip. That's orders of magnitude more than 230 megabytes. So SRAM cannot and does not replace HBM.
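You can put rough numbers on that gap. Here's a minimal sketch assuming a hypothetical 70-billion-parameter model served in 8-bit weights; the 230 MB SRAM-per-chip and 36 GB 12-high HBM3E stack figures are the ones cited above, while the model size is an assumption for illustration.

```python
# Capacity sketch: how many SRAM-only chips does it take to hold a model
# that a couple of HBM stacks hold easily? The model size is a hypothetical
# example; the 230 MB and 36 GB figures come from the product briefs
# discussed in the transcript.
import math

MODEL_BYTES = 70e9 * 1     # hypothetical 70B-parameter model, 8-bit weights
SRAM_PER_CHIP = 230e6      # Groq product brief: 230 MB of on-die SRAM
HBM_STACK = 36e9           # one 12-high HBM3E stack: 36 GB

sram_chips = math.ceil(MODEL_BYTES / SRAM_PER_CHIP)
hbm_stacks = math.ceil(MODEL_BYTES / HBM_STACK)

print(sram_chips)  # chips needed if on-die SRAM is the only weight store
print(hbm_stacks)  # HBM stacks needed to hold the same weights
```

Hundreds of chips on one side, a couple of stacks on the other: that's the capacity asymmetry the whole SRAM-versus-HBM debate turns on.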
You can't get away from that. What SRAM can do is win narrow slices of inference where the advantage of on-die processing dominates and the workload can be shaped to fit that memory constraint. That's why you see people describe SRAM-heavy designs as compelling for very deterministic inference but weak at scale. They're playing with the physics and basically saying: for simpler jobs where we need more deterministic performance from our LLMs, Groq's solution can be useful. And it's not just the capacity issue, right? It's also the scaling issue. The industry has been wrestling with SRAM bit-cell scaling for a long time. Tom's Hardware
is another outlet that has reported on this, noting that SRAM density improvements have been hard at certain node transitions, and in particular calling out that TSMC claimed a meaningful SRAM bit-cell shrink at its 2-nanometer node after limited gains at 3 nanometers. In other words, the underlying point is that SRAM is increasingly a first-class chip design problem, and even first-class firms like TSMC are struggling to keep shrinking it every generation to fit more on the chip. So now we can finally answer: why
would Nvidia do this? The most
straightforward read is that Nvidia is
treating inference as strategic. So
let's define inference really plainly. Training is when you take a model and run enormous amounts of data through it to update its weights. This is the very expensive, mostly one-time process of creating the model. Inference is when you use the trained model to generate outputs: every chat response you get from ChatGPT, every image you get from Nano Banana, every search ranking, every agent step. Training is episodic; it happens when you make a new model. Inference is continuous. So training is very capex-heavy: you pay a lot up front, plonk it down, and train your model. Inference becomes operating expense. If AI becomes embedded in products, most of the tokens on the planet will be served in inference, not burned in training.
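The capex-versus-opex shape of that argument can be sketched in a toy model. Every number below is a made-up assumption chosen only to show how continuous serving costs eventually dwarf a one-time training bill.

```python
# Toy capex-vs-opex sketch. Every figure here is a hypothetical assumption,
# illustrating the shape of the economics, not any real company's costs.

TRAINING_CAPEX = 1e9     # one-time training run: $1B (hypothetical)
COST_PER_MTOK = 0.50     # serving cost per million tokens (hypothetical)
TOKENS_PER_DAY = 5e12    # tokens served per day (hypothetical)

def cumulative_inference_opex(days: int) -> float:
    """Total operating spend on inference after `days` of serving."""
    return days * TOKENS_PER_DAY / 1e6 * COST_PER_MTOK

# Days until continuous inference spend matches the one-time training bill.
days = 0
while cumulative_inference_opex(days) < TRAINING_CAPEX:
    days += 1
print(days)
```

Under these assumptions, serving costs catch up with the training bill in about 400 days, and unlike training they never stop accruing, which is why driving down the cost per inference token matters so much.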
And that's why Groq's announcement is phrased the way it is: they're looking to expand access to, quote, high-performance, low-cost inference. And that's why Reuters frames Nvidia's move as part of a competitive push as the market shifts from training toward inference, with Groq positioned as an inference specialist. Now, let's bring
the financing story in because it's the
same reality. It's just expressed
through the capital markets instead of
chip architecture. So, in October, Reuters reported that Elon Musk's xAI was nearing a $20 billion financing package tied to buying Nvidia processors for the Colossus 2 data center. The key detail is in the structure: an SPV, or special purpose vehicle, would raise a mix of equity and debt, buy the GPUs, and then effectively lease or rent that compute back to xAI. Reuters also relayed Bloomberg's reporting that Nvidia might invest up to $2 billion in the equity portion. So an SPV is basically a legal
and financial wrapper built for one
specific job. In this case, the job is
turning GPUs into a financeable asset
class, a pool of hardware that can back
debt, generate contracted cash flows,
and be scaled without requiring the
operating company to fund everything
directly in a traditional manner. You
can think of it as project financing for
compute. In a world where GPUs are
scarce, HBM is constrained, and power
and data center capacity become really
binding constraints, the ability to
structure financing that locks in supply
actually becomes part of the entire
competitive AI game. Not because you
want to create fancy vehicles for no
purpose, but because it's how you
guarantee you can run systems over time.
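The mechanics of such a vehicle can be sketched in a few lines. The split between equity and debt, the interest rate, the GPU unit cost, and the margin below are all hypothetical assumptions; the reported facts are only the overall $20B package size and the lease-back structure.

```python
# Minimal sketch of the SPV structure described above: an entity raises
# equity and debt, buys GPUs, and leases the compute back to the operator.
# Every figure is a hypothetical assumption, not a term of any real deal.
from dataclasses import dataclass

@dataclass
class GpuSpv:
    equity: float   # cash raised from equity investors
    debt: float     # cash raised from lenders
    rate: float     # annual interest rate on the debt

    def gpus_purchased(self, unit_cost: float) -> int:
        """Hardware the vehicle can buy with its raised capital."""
        return int((self.equity + self.debt) // unit_cost)

    def annual_lease_needed(self, opex: float, margin: float) -> float:
        """Lease income that covers interest + operating cost + a margin."""
        interest = self.debt * self.rate
        return (interest + opex) * (1 + margin)

# Hypothetical $20B package: $7.5B equity, $12.5B debt at 9% interest,
# buying $40k GPUs and running $1B/yr of operating cost at a 10% margin.
spv = GpuSpv(equity=7.5e9, debt=12.5e9, rate=0.09)
print(spv.gpus_purchased(unit_cost=40_000))
print(spv.annual_lease_needed(opex=1.0e9, margin=0.10))
```

The design point is that the lease payments, not the operator's balance sheet, are what service the debt, which is exactly what makes the hardware a financeable asset class.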
And so this is where it all comes together. Groq-to-Nvidia is about pulling a specific inference capability, low-latency, SRAM-heavy, deterministic serving of models, closer into Nvidia's platform without forcing Nvidia to purchase the whole company, just as xAI's SPV story is about locking in the physical substrate of scaling, the GPUs, by turning them into a financeable supply chain. Both of these stories are about the same thing: who controls the path from model capability to product capability at scale. Nvidia needs to play in that game, and so Nvidia needed to buy Groq. And the Groq structure is really not unique anymore. It's part of a larger pattern, as I called out, of license-and-acqui-hire deals that have become really common in frontier AI.
Reuters reported that Google did a $2.4 billion licensing deal for Windsurf; I talked about it a few months ago and flagged it as an issue then. Google hired key leaders and researchers and left Windsurf independent, paying its investors via the license fee. This is also what we saw with Microsoft, which agreed to pay Inflection about $650 million in a licensing deal while hiring key staff, and that was not technically an acquisition, right? Amazon did the same thing with Adept. Amazon did it again with Covariant in robotics. Google did it with Character.AI.
Once you see these together, the whole
story is not big tech is buying
startups. The story is big tech is
increasingly buying capabilities, people
and rights without buying the companies
outright. And that suggests to me again
that what we value on the cap table is
starting to change. And that changes
employee outcomes and incentives in ways
that are easy to miss unless you've
lived through a few acquisitions. In a traditional acquisition, there's obviously a change of control, and that triggers contractual mechanisms like option plans that may have acceleration clauses. What it all adds up to is proceeds going to investors, to preferred stock, to common stock, and in some cases to the employees if they have exercised their options. In a license-and-acqui-hire deal like what happened with Groq, none of that occurs. To be fair, you can see hints of how some of these deals are sometimes structured to address that concern. The Wall Street Journal reported that Character.AI's Google licensing fee was around $2.7 billion, and some of that was used to buy out early investors, suggesting an explicit attempt to create at least some liquidity without an acquisition event. But the larger point remains that these tend to be bespoke arrangements, and they don't guarantee that everybody wins together, which is what many people have implicitly believed the Silicon Valley story to be about. If
you sign up as one of the first 10 or
the first 50 in a company, you think
you're going to win with the founders.
Not as much as the founder, but a little
bit. So, if you're trying to take one
deep lesson from this week, don't think
of it as Nvidia is scared. I've seen
that. And don't think of it as SRAM is
the future. I've seen that, too. It's
really that the AI race is forcing a
vertical integration of realities that
used to be separate. Hardware is not
just hardware anymore. It's memory. It's
packaging. Inference is not just a
detail. Inference is becoming the whole
game. Financing is not just fundraising
anymore. It's a way to lock in supply.
And acquisitions are not just
acquisitions anymore. They're
increasingly structured as a capability
transfer to deliver the people and the
license fees needed to secure a
strategic advantage in the AI race.
Nvidia needs to be in the inference
game. Nvidia needs to have products that
are strong on fast inference to continue
to evolve and maintain their lead.
Nvidia does not want the designers of the TPU chip loose on the market, and by the way, that is exactly who they got in this deal: the Groq deal included the founder of Groq, who designed Google's TPU chip. They don't want that person loose on the market. They'd rather bring them in as
insurance. And people do paint this as
Nvidia is worried about Google. Nvidia
is worried about the TPU domination. I
don't think that's the correct
interpretation because Google's
advantage is predicated on Google's TPU
chip remaining mostly inside the house.
Google does license their TPU chip,
don't get me wrong, but they would
rather you didn't buy it and they're
okay with it being priced in such a way
that it remains a nice to have for a lot
of companies, not a must-have. And part
of why is that if Google's TPU chips
become commoditized, Google loses the
competitive advantage they have with
TPUs. Nvidia is in a different game.
Nvidia is not in the hyperscaler model-maker game. It's in the chip business. Nvidia needs the talent on side to make sure it can tackle these specialized LPU applications without jeopardizing the core business. It needs to be able to bring in the technology and know-how from Groq and use that as part of the continuing wedge that makes it the only game in town at scale for model makers. And so this was a little bit of a defensive play by Jensen, but it's absolutely a play that makes sense when you think about who was involved and why it was worth getting those people out of Groq. I wanted to take
time to go through all of the details
because, one, I don't think the chip story is well understood. I don't think the nuances of the financing, and why these multi-year deals matter, are well understood. And I also don't think the talent story is well understood. So I hope this has given you a picture of how business is actually getting done at the cutting edge of AI, and how we are able to keep advancing on the physical constraints that drive the model experiences we all use and love every day. Yeah, I guess thanks for coming to Professor Nate's class. This is a bit of a long one, but I hope you enjoyed it.