Halloween AI Roundup: TPUs, Insurance, Space
Key Points
- Anthropic announced a massive expansion with Google Cloud, planning to deploy up to 1 million TPUs and add over a gigawatt of compute capacity by 2026, an investment worth tens of billions of dollars.
- Recent AI industry headlines include OpenAI’s shift to a traditional for‑profit model granting Microsoft a $135 billion stake, Nvidia hitting a $5 trillion market valuation, and Amazon unveiling AI‑powered smart glasses for delivery drivers.
- The Halloween‑themed “Mixture of Experts” episode brings together Gabe Goodheart, Chris Hay, and Kate Soule to discuss AI insurance, OpenAI’s blog on handling sensitive conversations, and the concept of building data centers in space.
- A quirky AI application highlighted in the show is a Toronto‑based digital agency’s playlist that combines AI and canine sound science to keep dogs calm during noisy Halloween festivities.
- Host Tim Hoang humorously notes that while the promise of virtually unlimited energy and cooling for space data centers is enticing, the practical maintenance challenges could be nightmarish.
Sections
- Halloween AI News Panel - In the Halloween episode of *Mixture of Experts*, host Tim Hoang and a panel of AI experts preview topics ranging from Anthropic’s TPU commitment and AI insurance to OpenAI’s blog on sensitive conversations and space‑based data centers, while also recapping recent headlines like OpenAI’s for‑profit restructure and Nvidia’s $5 trillion market cap.
- Chip Platform Diversity and Inference Shift - The speakers explain that while transitioning between hardware platforms has become easier, the move toward inference‑heavy workloads drives companies like Anthropic to explore multiple chips for scaling, yet they likely stick with Nvidia for training due to entrenched CUDA optimizations.
- GPU Tensor Ops Landscape - The speaker explains how NVIDIA’s CUDA‑first innovations dominate cutting‑edge tensor operations, while other APIs like Vulkan and OpenCL are catching up, leading to hardware/software parity that lets most models run efficiently on any platform.
- Hardware Strategies: OpenAI vs Anthropic - The speakers compare the use of specialized AI chips for custom models, noting OpenAI’s push toward its own hardware for long‑term AGI ambitions, while Anthropic remains focused on practical enterprise deployment without pursuing a dedicated chip.
- Trust Through Certifications And Insurance - The speaker argues that reducing technical complexity into clear certifications and insurance guarantees is essential for consumers to confidently adopt AI and other advanced systems, driving market adoption through monetary incentives.
- Insurance Regulation vs Chinese Competition - The speaker questions whether stringent US insurance standards will disadvantage domestic firms against faster, less‑regulated Chinese providers and asks a bullish colleague if they are more optimistic about near‑term capacity growth.
- Why General AI Insurance Fails - The speaker argues that blanket insurance policies for AI models are naive, stressing that risk management must be use‑case specific and that proposals for superintelligence‑level insurance are unrealistic.
- Rare Event Safety Dilemma - The speaker highlights how the scarcity of real‑world mental‑health emergency cases makes traditional ML safety training impractical, prompting reliance on expert panels and simulated evaluations, and questions whether this will become the industry norm amid calls for fewer guardrails.
- Assessing Trust in OpenAI Messaging - The speakers critique OpenAI’s vague public statements, highlighting the difficulty of gauging real technical progress versus superficial “CYA” messaging and the resulting challenges in establishing user trust.
- Ethical Concerns of AI Emotional Support - A speaker reflects on users' emotional reliance on AI, acknowledging its utility when human help isn’t feasible while expressing skepticism about whether AI truly guarantees people receive appropriate professional assistance.
- AI Companionship Safety & Space Data Centers - The speaker critiques reactive safety “bumper‑rail” measures for AI companions, preferring built‑in safe design, then shifts to present Nvidia‑backed StarCloud’s bold claim that future data centers will be placed in orbit to exploit vacuum cooling and solar power, inviting reaction.
- Space Hardware, Politics, and Dreams - The speakers weigh the technical hurdles of upgrading orbital equipment against looming debris, regulatory, and geopolitical challenges, while expressing enthusiasm for the futuristic concept.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=KF3jIUBecFo](https://www.youtube.com/watch?v=KF3jIUBecFo) · **Duration:** 00:47:36

Timestamps:
- [00:00:00](https://www.youtube.com/watch?v=KF3jIUBecFo&t=0s) Halloween AI News Panel
- [00:04:20](https://www.youtube.com/watch?v=KF3jIUBecFo&t=260s) Chip Platform Diversity and Inference Shift
- [00:07:31](https://www.youtube.com/watch?v=KF3jIUBecFo&t=451s) GPU Tensor Ops Landscape
- [00:10:39](https://www.youtube.com/watch?v=KF3jIUBecFo&t=639s) Hardware Strategies: OpenAI vs Anthropic
- [00:16:21](https://www.youtube.com/watch?v=KF3jIUBecFo&t=981s) Trust Through Certifications And Insurance
- [00:21:41](https://www.youtube.com/watch?v=KF3jIUBecFo&t=1301s) Insurance Regulation vs Chinese Competition
- [00:25:16](https://www.youtube.com/watch?v=KF3jIUBecFo&t=1516s) Why General AI Insurance Fails
- [00:29:19](https://www.youtube.com/watch?v=KF3jIUBecFo&t=1759s) Rare Event Safety Dilemma
- [00:33:23](https://www.youtube.com/watch?v=KF3jIUBecFo&t=2003s) Assessing Trust in OpenAI Messaging
- [00:36:41](https://www.youtube.com/watch?v=KF3jIUBecFo&t=2201s) Ethical Concerns of AI Emotional Support
- [00:41:43](https://www.youtube.com/watch?v=KF3jIUBecFo&t=2503s) AI Companionship Safety & Space Data Centers
- [00:45:01](https://www.youtube.com/watch?v=KF3jIUBecFo&t=2701s) Space Hardware, Politics, and Dreams
You know, the advantages are compelling, right? Virtually unlimited energy,
virtually unlimited cooling. What's not to love? And if they
can make the tech work, awesome. And then of course
I came down and thought, huh, the maintenance of this
sounds just like an absolute nightmare. All that and more
on today's Mixture of Experts. I'm Tim Hoang and welcome
to the Halloween episode of Mixture of Experts. Each week
Moe brings together a panel of brilliant, funny and somewhat spooky
panelists to distill down what's important in the latest news
in artificial intelligence. Joining us today are three incredible panelists.
So a very warm welcome to Gabe Goodheart who is
chief architect AI Open Innovation, Chris Hay, who is a
distinguished engineer, and Kate Soule who's Director of Technical Product
Management for Granite. Lots of topics. Today we're going to
talk a little bit about Anthropic's commitment to TPUs, a
little bit about AI insurance, some interesting blog posts out
of OpenAI on sensitive conversations and finally data centers in
space. But first we've got Illy with the news. Hey
everyone, I'm Illy McConnell, a tech news writer for IBM
Think. I'm here with a few AI headlines you might
have missed this week. OpenAI has restructured, becoming a more
traditional for profit company. This move gives Microsoft a whopping
$135 billion share in OpenAI. Nvidia has become the first
company to reach a $5 trillion market valuation, powered by
its chip business. To put this in perspective, Nvidia is now worth
about twice as much as JPMorgan Chase, Walmart, ExxonMobil and
Johnson and Johnson combined. Amazon has unveiled AI powered smart
glasses for delivery drivers. The glasses guide the drivers as
they walk with directions and alerts to identify any hazards
along their way, all so they don't need to look
down to check their phones. Halloween can be scary, especially
for your dog. A Toronto based digital agency has created
a playlist that combines AI and canine sound science to
help dogs stay calm through the Halloween noise and excitement.
Want to dive deeper into some of these topics? Subscribe
to the Think newsletter linked in the show notes. Now
back to the episode. First, I really want to start
with another kind of massive blog post from Anthropic that
came out this week. I'll just quote it, but it
basically is their announcement that they're going to be increasing
their work with Google Cloud and specifically their sort of
Google's TPU kind of AI chip. So they say quote.
Today we are announcing that we plan to expand our
use of Google Cloud technologies, including up to 1 million
TPUs, dramatically increasing our compute resources as we continue to
push the boundaries of AI research and product development. The
expansion is worth tens of billions of dollars and is
expected to bring well over a gigawatt of capacity online
in 2026. So like 12 months from now, basically. So
this is another blog post where, you know, every day
like there's another post where you're like, the numbers are
just mind boggling. Chris, do you want to give an
intuition for like why, why is Anthropic going big with
TPUs after I guess happily working with Nvidia and also
Amazon as well for years? At this point I think
they just want to push Nvidia's net worth down from
5 trillion to 4.99 trillion, knock a few little
digits out of there, that's their motivation now. I think
it actually makes a lot of sense. Right. Which is
the reality is that they are on multiple clouds, they're
on AWS, they are on Google, et cetera. And if
you think about the GPU, you know how hard it
is to get GPUs these days. Actually being able to
just sort of diversify your stock a little bit there,
I think is a really smart strategy and therefore you
can get the best cost for inference. You can run
across multiple clouds, you can use Google infrastructure. So I
think it's a, a very, very smart strategy. Now technically
I think it makes things harder because you're not depending,
you know, you're not being able to take advantage of
things like the CUDA performance improvements. And you need to
go and find these things and run different architectures to
support these chips. So they're making life difficult for themselves.
But I totally understand why diversifying is a good thing.
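Chris's diversification argument, getting the best cost for inference by running across multiple clouds, can be sketched as a toy cost comparison. Everything below is illustrative; the provider names and per-token prices are hypothetical placeholders, not figures from the episode.

```python
# Toy sketch of multi-cloud inference routing: pick the cheapest
# accelerator for a workload. All names and prices are hypothetical.

# Hypothetical cost per million output tokens on different chips.
PRICE_PER_M_TOKENS = {
    "gpu-cloud": 15.00,       # e.g. an Nvidia GPU fleet
    "tpu-cloud": 11.50,       # e.g. Google TPUs
    "trainium-cloud": 12.75,  # e.g. AWS custom silicon
}

def cheapest_provider(prices):
    """Return the lowest-cost deployment target and its price."""
    name = min(prices, key=prices.get)
    return name, prices[name]

name, price = cheapest_provider(PRICE_PER_M_TOKENS)
print(name, price)  # tpu-cloud 11.5
```

The real decision also weighs kernel maturity and migration cost, which is exactly the CUDA lock-in trade-off Chris raises next.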
Yeah. And I kind of was interested in, I mean
my instinct is like operating in each one of these
chip platforms is kind of its own beast. And so
almost like what Anthropic is saying is like we need
chips so much that we're willing to like deal with
all of that operational complexity. Is that the right way
of thinking about it or is like actually it's like
a lot less complicated than it used to be in
terms of moving across platforms. I think it certainly probably
is less complicated than it was two years ago. But
I think what it also is reflecting is the compute
needs are continuing to shift from what used to be
very training heavy workloads for these providers to inference heavy
workloads for these providers, where it's a lot easier to
get your model to run on these other chips than
it is to train on these other chips. And so
from that perspective it makes a lot of sense as
Anthropic's looking to scale their deployments, and as reasoning
models and other kind of what we call test-time
compute approaches continue to boost up their inference costs,
to find newer, cheaper ways to scale that inferencing. My
guess is that they're still going to use Nvidia to
train. Okay, why is that? For some of the same
reasons Chris stated with CUDA and all of the like
optimizations that they've undoubtedly sunk into their Nvidia based stack.
Yeah, it's almost like they've, almost like they've invested already.
So why would you start from kind of like start
again in some ways? Yeah, and I think that's just
like Nvidia's cash cow. Like that's what they have really
optimized for with some of these most advanced GPUs, versus
the TPUs and Trainium and other chips others are building. Yeah, I did want
to talk about that Gabe, a little bit because my
understanding is, I mean the move to a TPU is
distinct. Right. It's actually not a GPU and you know,
my understanding is it's a, it's an ASIC. Right. It's
sort of like a chip that's kind of designed specifically
for AI applications. And I think for a long
time people have been like, oh well, you know, the
GPU is kind of like a historical accident and we're
eventually going to move to the world of like a
true AI chip from first principles basically. And so I
don't know if this is like, I mean 1 million
TPUs is a lot of TPUs. It feels like maybe
we're kind of finally crossing that threshold into like a
world of much more specialized chips. But I don't know.
I'm curious about what you think about that. Yeah, I
mean just to pick on the point about CUDA and
the sort of the incumbency there, there's both the incumbency
at the hardware, but actually where it's really sticky is
in all of the kernels that their engineers have
spent hours, days, months, weeks, years tuning to be perfectly
aligned with that hardware. And my guess is that what
they're going to do is put all of their older
models on the TPUs, where the non-CUDA code ecosystem
has caught up on the kernel implementations and they're going
to keep driving the novel architectures because basically each one
of these models is just a collection of Tensor ops.
But each one of those has to be carefully tuned
for the stride and the batching and the SIMD chunking
of how you actually run this giant pile of math.
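The "giant pile of math" point can be made concrete with a toy example: a matmul is the core tensor op, and a hand-tuned kernel is essentially the same triple loop reorganized into hardware-friendly tiles. This pure-Python sketch is purely illustrative, not how real CUDA or TPU kernels are written.

```python
# Toy illustration: one tensor op (matmul) as a pile of multiply-adds,
# and the same op "tiled" the way a tuned kernel blocks its loops.

def naive_matmul(a, b):
    """The giant pile of math: a triple loop of multiply-adds."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out

def tiled_matmul(a, b, tile=2):
    """Same math, blocked into tiles: the stride/batching/chunking
    decision a hand-tuned kernel bakes in for one specific chip."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        for p in range(p0, min(p0 + tile, k)):
                            out[i][j] += a[i][p] * b[p][j]
    return out

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(naive_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
assert naive_matmul(a, b) == tiled_matmul(a, b)
```

Both functions compute the same result; only the loop order and blocking differ, which is why porting a model between chips is "easy" while matching tuned performance is not.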
And at the end of the day it's a bunch
of adds, subtracts, multiplies, bit shifts: the complicated,
low-level gorpy crap that you
can't read because it doesn't make any sense until you
finally get your head into that parallel view of a
grid. But they're going to keep innovating on Nvidia to
get the latest and greatest. If they're implementing some novel
architecture, it's almost certainly going to be CUDA first. But
at this point many of the other driver packages like
Vulkan and maybe to a lesser degree OpenCL and others
are starting to catch up on performance for some of
these well understood tensor ops. And you've probably got most
of your less innovative model architectures poised and ready to
go to just run on any old platform you can
because it's reached parity across the different, you know, software
driver layers. So I think that's really smart in that
sense. If they can offload the sort of well trodden
path to cheaper, more efficient hardware and keep the expensive
hardware for the cutting edge, that's going to give them
a nice sort of twofold advantage. And nowadays, I mean,
I think originally all of these tensor ops that were
powering these novel architectures were new and so the platforms
hadn't caught up across the board. But now that we're
a few years or AI decades into this space, AI
centuries basically. Exactly. It's really started to level out a
bit. And we'll probably see that in general across alternate
hardware that Nvidia leads the way on the novel architectures
and alternate hardware sort of picks up the broad breadth
for efficiency plays. Yeah, for sure. And I think that's
one thing that occurs to me and Chris. Yeah, I
was actually about to just kind of turn to you.
I mean you had a joke at the very beginning
where they're like, ah, maybe we'll just like knock some
hundreds of millions of dollars off of like Nvidia's market
cap. And I think we often talk in terms of
like, oh, who's going to take on Nvidia? But this
is almost like there's like kind of room for everybody.
It sort of seems like here where like basically there's
like a role for Nvidia to play. But like, I
mean that capacity that you need to just do inference
on well-understood models is huge. And so it's kind
of like maybe a world where like in the future
it really won't, you know, like it will basically be
like all the major model providers kind of are doing
what Anthropic is doing here. Do you agree with that?
I agree. And let's face it, Anthropic needs that capacity.
Anyone who's ever tried to use the Claude model at
midnight UK time will understand: the API limits, capacity
limits reached, come back later. You know, if they're going
to get a million TPUs, I'm fine with that. Just,
just, just don't deprive me of my precious. It's a
move specifically. Exactly. So, so go for it, Anthropic. And,
and so I think, I think it is necessary and
I think that diversification is, is good. So I'm happy
with that. I'm not surprised though. I mean, I think
I've said this before, but I'll say it again, like,
it's like, the fact that we're following the trend of Bitcoin is just
hilarious, right? You know, Bitcoin started with CPUs, then
they went to GPUs, and then they went to FPGAs,
and then they went to ASICs, and guess what? AI is
following the exact same path. And it makes sense, right?
Which is if you've got custom models that are always
doing roughly the same thing and you can get faster,
less general specific chips that do that job cheaper, then
go for it. And we've already seen that play out
very well. If you look at the Groq chips, for
example, then they run incredibly fast. So I'm all for
it. Just so I can. And so I can play
with Claude at midnight. That's all I need. Please let
Claude be accessible. That's really what I want. Maybe kind
of a final question here, Kate. So we've kind of,
in the context that this has come up in for
OpenAI, we've talked a lot about them ultimately going like,
kind of more vertically integrated. Right. There's a lot of
rumors about the OpenAI chip and what are they doing
with the OpenAI chip and what it will look like
and all that sort of stuff. We're seeing that, I
think, at least to my recollection, unless you've heard otherwise,
like Anthropic seems to be doing that less, to say,
like, oh, we need an Anthropic chip and we're going
to really hype an Anthropic chip. But I think both
are kind of pure competitors in a certain way. Do
you think there's a reason why Anthropic is like not
really getting into the hardware game? I think Anthropic's position
has been far more focused on like meeting practical enterprise
deployment needs, where OpenAI is obviously on the pursuit of
super general intelligence at all costs. And so I think
from OpenAI's perspective, they might be playing a bit of
a longer game and looking at: to get to truly
differentiated, AGI-style intelligence, are there maybe even co-optimizations
that need to be made between models and chips all
the way down the stack? And how does that kind
of unfold more broadly? And I am not an AGI
believer person, so I resonate much more with Anthropic's
approach, which is regardless of AGI, what are the practical
use cases that can be solved today with AI and
how do I scale up my demand to meet it?
So I think they're really like, it gets to very
philosophical differences of how these companies are pursuing innovation. Well,
that AI skepticism is good for, I think what will
be our next topic that I want to move us
on to. So I think it's become almost like a
little bit of a joke, which is like when you
want something passed around and discussed in AI land, you
launch a freestanding website that just has your essay on
it. And this past week was no exception. So this
essay, I think that came out, I think maybe a
few weeks ago, called Underwriting Superintelligence from a number of
researchers and then this guy Rune, who runs a company
on insuring kind of AI platforms. And the subject of the essay Underwriting
Superintelligence is really about kind of like the role of
insurance in allowing new technologies to form. And one of
the things they focus on is what they call the
incentive flywheel. Basically the idea is like once you have
people insuring a new technology or a new space, they
tend to want to lower the risks and they tend
to want the people that they're insuring to do a
good job managing the risk. And they kind of describe
this sort of virtuous cycle whereby the insurance company says
you need to adopt certain standards. Those standards make it
possible to do audits, and then those audits simultaneously make
things safer and help price that insurance. And so I
thought this was kind of a fun idea. It's very
much couched in the world of like, well, we're about
to head into AGI world. But I think this is
a kind of like bigger, kind of interesting question about
insurance in the AI space. And I guess, Gabe, I
kind of wanted to actually kick it to you first
because you do a lot of work in sort of
like open innovation, open models. I guess the question of
kind of like the risk to the open model provider,
is that something that you guys talk about in your
space at all? I'm about to launch a new open
source model. Do I have to be worried as the
provider of that model, of the liability, should something go
wrong with it going forwards? Curious if that's in your
world at all. Yeah, I would say it really depends
on how an open model is being positioned. I think
there are plenty of organizations that are just out there
trying to make something cool that people play with and
find some value in. I would say from our position
in IBM, I don't want to put words in your
mouth, Kate, but I'd say we actually do care quite
of baby steps in this direction around guaranteeing and adding
verifiable tracing through the training process to ensure that the
models are in fact meeting benchmarks for, you know, best
practices in training and security. So one of the things
that really struck me about this article was that, you
know, this is already happening a little bit piecemeal. But
I think the, the overall sort of framing of this
as an insurance problem that follows other sort of risk
sensitive markets and ecosystems like vehicles and fire and building
and just sort of infrastructure components was an interesting take
on it. You know, the, the, you know, the science
part of me always gets skeptical when these things talk
about, oh, just throw evals at the problem and everything
gets solved. Because evals at the problem is, you know,
an undefined set of words. However, then, you know, I
think about if I were an automotive engineer and you
know, I'd probably have exactly the same skepticism about various
different safety standards. I, you know, I used to work
in defense and there were all kinds of hoops you
had to jump through. And the old joke was like,
just don't try to apply logic to security, just check
the boxes, right? And it's really frustrating as an engineer.
However, this sort of dumbing down of the complexity space
is almost required in order to make that complexity space
manageable for folks outside of the deepest sort of knowledge
bases of that complexity space. So whether it's AI, you
know, defense systems, cars, probably even, you know, fire prevention,
there's probably a ton of nuance that I have no
idea how it works. And I just look for, you
know, fire safety certifications before buying, you know, a bed
for my child kind of thing. So there's something real
here about, you know, getting the flywheel off the ground, so to speak.
And obviously this is a little bit motivated by the
author's vested interest in trying to be the insurance company
for this marketplace. But I think the key here is
for consumers to put their money where the certifications lie.
Right. It has to basically start with where the money
flows. That's what gets the flywheel turning. And, you know,
no one's going to pay for insurance for their model
if it doesn't actually boost their bottom line and get
more consumers to use their model. So it'll be
really interesting to see. And that sort of gets back
to that: how do we dumb down the complexity space
enough that it becomes actually meaningful to consumers to say,
I'm going to pick this model whether it's open or
closed, because it's got this certification and I know I
can trust it, or because it's indemnified in this way.
And I know if it does something terrible, I actually
have somebody I can sue and get, you know, recourse
for out of the model provider. Yeah. And I think
that's kind of the question that I felt was like,
maybe one of the things lurking behind and maybe, Kate,
I'll throw it to you because it sounds like, you
know, you guys are already working on some things that
are shaped a little bit like this, but, like, kind
of, it feels like before you get to insurance, people
are going to just be like, can you certify to
certain things? Right. Like, can you guarantee certain things, you
know, regardless of whether or not you're going to like,
pay me if something goes wrong? And so I guess
there's a world where this evolves where you might not
ever need kind of third party insurance, potentially at least
in like, most cases, because most people are just being
like, yeah, do you meet certain standards? Okay, if you've
met certain standards, then I'm happy to adopt this. Right.
And then it kind of like gave to your point,
like, the businesses are happy because they're getting more business
as a result. But I guess, Kate, it sounds like
maybe you all at Granite are kind of thinking about
some of this stuff piecemeal. I think understanding and
mitigating risk has certainly been at the core of our
strategy with Granite from the beginning. So, for example, we
made a lot of decisions very early on to make
sure and take preventative measures to prohibit known pirated content,
for example, which has been the subject of many lawsuits
in the US, from being used in training. We're very
open and transparent about the data that we're using, which
is really just a testament to how careful we're being
about data selection. And I think that has ultimately led
to us continuing to work with different standards bodies to
get certification for that, and steps to help educate our
customers and users on the variety that exists in model
development and why aspects are important. So Granite is the
first open source model family that was developed according to
ISO 42001 standards, which is really exciting. The only other
model developer, there's plenty of model providers but the only
other model developer I'm aware of with those standards is
Anthropic's Claude. So not many providers have gone after that
certification but I think it is starting to grow in
terms of its prevalence in the at least US based
markets. I struggle though with how insurance kind of plays
a role. I think there's a couple of different things
going on. So one, there are examples, and the article even
cites some, of insurance for, for example, copyright
protection. So Copilot came out first saying that,
you know, we'll indemnify against, you know, if the model
produces copyrighted output and you use that in your products,
we'll protect you. I worry that the parties best placed
to understand the true risks won't be the insurance
companies, it'll be the model developers themselves, just
because this technology is so new. They
had somewhere in the article citing that there's only like
100 researchers who are qualified to do model audits in
the United States. There are very few people who understand
the technology to the detailed level needed to assess that
risk as well as very few companies open and transparent
enough and you know, kind of governing the development in
a way that that risk can be well tracked and
understood. So I, I don't see a great opportunity for
third-party insurance providers to come in and kind of
buffer against that risk. It'll be like first-party insurance basically. Yeah,
well you're going to use our model and if you
get into trouble it'll be like the copyright thing. IBM
does that as well. Right. So we provide indemnification for
our Granite models when they're used through the
watsonx product lines, because we understand better than
anyone else could, you know, the extent to how these
models were trained and the risk that they can create.
So I think that's potentially going to prevent third party
insurance providers from coming in. I also had some questions
this article raised. It discusses the importance of moving
fast and staying competitive against China. But insurance by
definition is not really a global market: with insurance, you
always have a population that is similar in some characteristics
and pools its risk together to provide protection. And
if all of these Chinese model providers and developers don't
adopt these kinds of standards or insurance policies, and keep
moving with probably less regulation than US-based ones,
they're going to move a lot faster. That's just going to
disincentivize US companies from operating under the same
constraints in order to stay competitive. I didn't think the
article had a great solution to that. I don't see that going
away, and I don't think insurance is going to let us compete
in that sense. So it raised more questions for me than I
think it ultimately answered. One thing, I guess. Chris, I'll bring you
into this conversation, because in previous MoEs you've
tended to be, not exactly a superintelligence believer, but
kind of the most bullish on capabilities getting really,
really fantastical in the near term. Do you believe a
little bit more in that? Because it feels like your
calculation kind of changes if you believe, oh yeah,
we're about to see these AIs do incredible things that
are very high risk that maybe the first party model
providers are not going to be able to self insure
essentially. And so maybe there's kind of pressure. If your
model is going to be used for some crazy, you
know, DNA printing thing that has like a huge biorisk,
you might eventually want to kind of like outsource that
to like a third party insurance market. Were you a
little bit more sympathetic to this article or were you
kind of similarly skeptical like Kate and Gabe? Insurance leads
to clauses, clauses lead to gray areas, gray areas lead
to lawyers, and lawyers lead to the dark side. That's
my opinion on this one. So I am not a fan of this.
The world does not need more lawyers. Because of the lawyers,
basically? Yeah, no, I don't want this. And you say the model
providers can't do it? Who are the model providers? They're all
billion and trillion dollar companies. If a trillion dollar
company can't afford to insure, right, some of the richest
companies in the world can't afford to insure their models,
then who do they think is going to insure these models?
It's like, what planet are we living on? And to your point,
Kate and Gabe, do you think suddenly OpenAI and
Anthropic are going to go, please come in, I want to show
you the inner details of my model, please, please, and
then you can write it up in your insurance policy,
et cetera? No. And we know
what the clauses are going to be like. You must
do this in a safe and secure way. You must
not prompt inject. You must not do this. And it
is always going to come back to being your fault
that the model went wrong; it's never going to be
their fault. They'll be like, oh, you didn't put
a safety rail in there. And I'll say, I didn't know
I was meant to do that. It's always going to be
my fault. And what am I going to be doing? I'm
going to be handing them money, and then call centers
are going to ring me up: do you want me to
insure your ChatGPT instance? That's going to be
$300 a month, grandma. And I'm going to be like,
no, this is not an industry I want. Insurance is bad.
Okay, I think there's one other thing this article gets
wrong or misses the point on entirely, which is that so much of AI
based risk is use case specific. So having an insurance
policy for general model deployments, and for general models
themselves, is, I think, just very naive. If you're talking
about, okay, here's a very specific biomedical application
where we need X, Y and Z regulations to ensure consumer
safety and health, that has existed for a while now, is
really important, and should continue to exist and expand to
cover these new AI-based risks. But these kinds of global
policies, and again, the entire premise of this article is
that we're going to have artificial general intelligence and
superintelligence, and therefore we need superintelligence
insurance, are, I think, just not practical or realistic
about what the real risks are and what needs to be solved
for in the near term. Well, and Kate, to your point,
you know, I think there's a lot of debate about
what superintelligence actually means. You could achieve superintelligence in a
specific domain vertical where the machine can do better than
top individuals trying to solve that problem. That might
be a great, very specific slice of a market
for an insurance policy. However, just "AI", stopping
at those two letters and insuring that whole thing, I agree,
is really wide open. And you're basically going to get
back around to, let's trust the model creators, because
they're the ones that actually understand these problems,
to have done it correctly. So it's a chain of
trust without a root certificate. Right. And to that point,
Gabe, it's lazy offloading, right? Because to your earlier point,
it's like, okay, you're providing a product and service, and
that service has got to be compliant with whatever regulation
applies. Doing the thing that is your responsibility as
the product, deciding whether the AI is suitable and
you've tested it enough to be in your product, that's on
you. By insuring the model, you're just offloading your risk,
which I guess people want to do. And it's lazy, you know
what I mean? So I get why you'd want to do it. And if
the model developers want to insure that risk, that's fine,
and I can imagine many do for various reasons. But I
assure you, when it goes wrong and you're looking at your
credits for that month and go, oh, I got my credits back
because it went wrong, I don't think that's going to cover
your risk. So that's kind of my issue on that. All right:
skeptical of superintelligence and just hates lawyers. I'm
not skeptical of superintelligence, I'm still all in on
that. Yeah, you're skeptical of insurance. I just don't like
insurance or lawyers. Well, I'm going to
move us on to our next topic, which I wanted to put
next to this one because I think they rhyme in
interesting ways. We just talked about an essay that was
very speculative about superintelligence and how you might
insure against those risks. In some ways, the next topic
is also about companies managing their risk, but in a very,
very different context, and I thought it was fun to put
them next to one another. OpenAI did a blog post talking a little
bit about how they're working to make sure that ChatGPT
can navigate sensitive conversations with people well and safely.
It's a really interesting post that goes through their
philosophy of how they attack these problems, some of the
data they've seen from GPT-5, and overall how they're
managing this from a product standpoint. And, you know, to
our conversation earlier, I think it's a good example of
maybe an alternative path for how companies will manage
some of the risk here, which is that OpenAI is just
showing the work. They say, okay, you're worried about
this; here's the
evaluations we've done, here's how we think about it, and
you have to make your decision on whether or not
you want to use this product. And I think, you
know, Kate, maybe I can kick it over to you.
I think the first really interesting problem is that, from
a product standpoint, it's really hard to do safety in
this space because actual real-world cases are so rare.
They point out, across these kinds of sensitive
conversations, and there's a question about how you define
that, that their initial analysis estimates only about 0.07% of
users active in a given week, and 0.01% of messages,
indicate possible signs of mental health emergencies related to psychosis
or mania. And that's kind of an interesting problem, right,
because the usual machine learning move is, well, can we
collect some cases and fine-tune against that? But here the
data is very, very limited. And I guess mostly I'm
interested in your take on whether you buy their approach
to managing this, which is: we only have a few examples,
so we're going to rely on panels of experts and on
simulated evals. Is this the way you think is going to
become the industry method for attacking these
types of problems? I mean, I read these articles, and in
the back of my head I'm like, didn't Sam Altman just say
we're going to treat adults like adults, allow erotica on
the platform, and get rid of guardrails and safety altogether? I
was going to ask about that as well. So if you want to
take that question too: what's going on here with this
split screen? It seems very discordant, at least to me.
But look, it's certainly good that they're engaging and
being more open about the work they're doing with these kinds
of taxonomy-based approaches, which are very similar to ones
that we use at IBM to map out and define safety issues,
particularly around mental health, and that they're engaging
with subject matter experts and clinicians in the area to
figure out the best ways to respond. That doesn't mean,
though, that they have the right incentives to fully solve
this problem.
And you'll notice a lot of things about what they
say are very vague. So 0.01%, first of all, can
still be an enormous amount of data on potentially
concerning mental health conversations, if we're talking
about all of ChatGPT usage. And they talk a lot about, oh,
we've had a 65% reduction in harmful responses, but they
don't tell you the percentage of harmful responses they started out with.
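As an editor's aside, Kate's point about the missing baseline can be made concrete with a quick back-of-the-envelope calculation. The two starting rates below are purely hypothetical illustrations; OpenAI's post does not disclose the actual figure.

```python
# The same 65% relative reduction applied to two very different
# (hypothetical) baseline rates of harmful responses.
reduction = 0.65

for baseline in (0.50, 0.01):  # 50% vs 1% of responses harmful to start
    after = baseline * (1 - reduction)
    print(f"baseline {baseline:.0%} -> after {after:.3%}")

# A 1% baseline lands at 0.35%, which matches the figure Kate mentions;
# a 50% baseline would land at 17.5%. Same headline, very different reality.
```

The relative number alone cannot distinguish these two worlds, which is exactly why the undisclosed baseline matters.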
Is this a 65% reduction where the model was saying
something awful 50% of the time, or 1% of the time, and
now it's down to, you know, 0.35%? So I think the lawyers
got in there, Chris, and heavily post-edited that release
to try and make it, okay, we're going to be open, we're
going to talk about it, but we're not going to tell you
too much. Is it right to say you're kind of
skeptical? That this maybe seems to indicate more than it
actually does? It's not necessarily a bad thing, but I am
certainly skeptical that they're doing everything they can
to solve the problem, or have the incentives to solve it
and really take this as seriously as it needs to be
taken. I also worry that you
called this very much a product-based strategy. At the
product level they could be doing a lot of work that then
doesn't get addressed at the lower level, like the API
endpoints that the rest of the world is building its own
products off of. How that translates to a world where
advertisements are being sold to consumers who are having
mental health issues and breaks, and how it translates to
other, broader applications, I think is concerning and not
really addressed. So I think there's just a lot more work
to be done ultimately. Yeah, I mean, it kind of comes back to
the chain of trust question, right? If you as a
consumer have decided that OpenAI is a company you trust,
this is a great article. It's like, man, look at
all this great stuff. I'm leaning into trusting that they
are, you know, really trying to do their best. Right?
They've got a lot of verbiage about working hard and
doing their best, and that's great. If you have not
already founded a root of trust in OpenAI, there are a
whole lot of holes to poke in this article, right?
There's a lot left to your imagination to fill in.
Exactly as Kate said, what does
an actual positive or negative outcome mean here? Right. It's
a really gray space and to the previous conversation, it's
really hard to establish trust here. This is, you know,
the realm of non-Gaussian statistics, right? We're off in
the tail of the distribution, and it's hard math, hard
science. So I fall probably somewhere in the middle of the
trust spectrum. I think the fact that they're putting this
out there means at least some real work was probably done
and is probably making some positive improvements. But
juxtaposed against the public messaging, it's still really
hard to know exactly where this falls, and how much this is
a sort of cover-your, sorry, CYA blog post. Yeah,
exactly. Versus, you know, a real sign of technical
improvements. We also know that in any company of reasonable
size, the left hand doesn't really talk to the right
hand very well, especially if the right hand is a
public figure at the top of the company. So it's very
possible that there are chunks within OpenAI doing real
work that then run into the buzzsaw of product decision-making
and public messaging and all of that, which makes
it hard to get the science in the right place. It's
probably a much more complicated picture behind the curtain
than this blog post is presenting. I'm glad they're at
least putting a foot forward, but I'm still very guarded
in my trust of them as an organization, and frankly of any
organization that's self-certifying that it's doing the
right thing. For sure. And
I think this is, again, kind of the problem, right? In
the last conversation we said, well, the more realistic
outcome is that these companies are going to self-certify.
Now we're looking at self-certification and asking, is
this the kind of certification we really want? You're kind
of stuck between two maybe-not-so-great scenarios, I
guess. Chris, I think
one of the things that really struck me about this
piece outside of the data was it's also OpenAI starting
to show their point of view on how these types
of technologies should respond in certain types of situations. So
the example I'm thinking a little bit about is they
said, well, okay, in cases where we detect that the
user is becoming emotionally reliant on the model, then our
intervention is that the model should tell them to have
more real world conversations, which is a very specific kind
of opinion about what the model should be doing in
that context. So take that case: I was curious whether
you think that is actually the desired behavior for these
technologies, that if they detect you are becoming
emotionally reliant, they should say, hey, have you
considered talking to other people in the real world? Is
that effective?
From your point of view, do you think that this
is a good way for these models to approach, you
know, these situations? Yeah, I am the last person you
should be asking these questions, Tim. As someone who's
very emotionally reliant on Claude. Yeah, exactly. I think
it's difficult, right? First of all, to Kate's point: 800
million users, and 0.07% of that is a huge number, over
half a million people a week. That's a frightening number.
But for people who are relying on AI in that way, and I
don't really know that feeling myself; I rely on Claude,
but not for emotional stuff. But I mean,
I think the reality is maybe they're in a scenario
where going and speaking to a human isn't an option,
do you know what I mean? And maybe they're relying
on the AI for those very reasons because it's a
very difficult subject and going to speak to a human
about that is going to be difficult for them. So
I understand the sort of things that they're doing there
and I think people need help wherever they need help,
whether it comes from an AI, whether it comes from
another human. And you know, and I think the good
thing they're doing is they're speaking to professionals who
are telling them the right things to say. So I
think I applaud all of that. I guess the skeptic
in me is just like, I hope that they really
want people to get the right help that they need.
And that it's not offsetting the blame: well, you know,
they contacted us and we said, go speak to a human being,
and they didn't do that. And again, those pesky lawyers
are back. So I
don't know, but I don't think there's a good answer
in these scenarios. And I hate to say it, but I think
it's going to get worse, right? Because at the moment
we're chatting with ChatGPT and others in text mode, and
of course we have voice mode, especially when you're
driving and things like that. But as more modalities come,
more realistic-looking avatars and so on, and those
technologies progress, it's going to start to feel more
human-like, and I think these problems are going to push
more and more. So I don't know what the answer is, but I
do applaud them for trying, I think. Yeah, and on the
applaud-them-for-trying point, there's actually a corollary
piece; I think it launched yesterday, maybe the day before.
They just actually published previews of fine tuned versions of
their open weights models that are designed around policy enforcement.
So very similar to the Granite Guardian series where the
user provides an actual policy that they want to check
whether the input or output adheres to and then it'll
do sort of real time on the fly evaluation against
that policy. So actually I thought that was maybe the
most interesting part. I suspect the timing is not
coincidental, although it may be. But if they're actually
making a real stride towards putting solutions to some of
these problems out in the open, I think that's really
encouraging. So I'm very curious to check those out.
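As an editor's aside, the "policy as input" idea Gabe describes, where the policy text itself is supplied at request time rather than baked in at training time, can be sketched roughly as follows. This is a hypothetical illustration, not OpenAI's or IBM's actual API; all names are made up, and the guard model is stubbed out so the sketch runs end to end.

```python
def build_judge_prompt(policy: str, content: str) -> str:
    """Wrap a user-supplied policy and the content to check
    into a single classification prompt for a guard model."""
    return (
        "You are a policy compliance checker.\n"
        f"POLICY:\n{policy}\n\n"
        f"CONTENT:\n{content}\n\n"
        "Answer with exactly one word: COMPLIANT or VIOLATION."
    )

def parse_verdict(model_output: str) -> bool:
    """Return True if the guard model flagged a violation."""
    return model_output.strip().upper().startswith("VIOLATION")

# Toy stand-in for the guard model, so this sketch is self-contained.
# A real system would call a fine-tuned safety model here.
def fake_guard_model(prompt: str) -> str:
    return "VIOLATION" if "refund" in prompt.lower() else "COMPLIANT"

policy = "No discussion of refunds is allowed in this support channel."
prompt = build_judge_prompt(policy, "I want a refund for my order")
flagged = parse_verdict(fake_guard_model(prompt))
```

The design point the panel raises is visible in the sketch: because the policy arrives as plain text at evaluation time, the same guard model can enforce arbitrary policies, rather than only the fixed taxonomy it was trained on.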
You know, one thing I noticed that was slightly different
from how we framed the Guardian models is that our
Guardian models are specifically tuned towards a collection
of policies that we have data for, whereas it sounds like
what they've built is a general purpose policy evaluation
model that takes arbitrary policies. Thank you, Kate. I'm
sorry, yes, I misrepresented that. All right, so arbitrary
policies as well in the Guardian series, and they're
smaller, so we should try those first. Yeah. The point
being, putting this together
as a system is an actual good technical step forward
that others can, you know, put their hands on and
feel and build some of their own trust in the
technology versus just reading the blog posts. So I think
that's an encouraging sign that there's some real tech being
released in the open to put some weight behind these
assertions. To me it really feels like an almost fundamental
philosophical question: should AI be used for companionship?
To Chris's point, there are real scenarios where AI as a
companion can provide benefits. But if that's the case, if
we want AI to be able to provide that value, it needs to
be a lot more core and fundamental to how these models and
chatbots are being built. I worry that OpenAI's approach so far, it's great,
like I said, they're doing better things and it's great
that they're talking about it publicly. But it includes stuff
like I noticed in their article that they route to
a safe model if you're in distress, and if not,
then you get to use the, you know, treat adults
like adults model, presumably without guardrails and everything else. And
so I do think it comes a bit philosophical of,
like, how ingrained should these guardrails and safety protocols be?
And are you designing for, like, safe companionship up front,
or is this being tacked on at the end and
like, uh, oh, someone so and so has some red
flags. We're gonna to put them down the, you know,
safety mode bumper rails, like in a padded room so
that they don't hurt themselves, so to speak. Which I
think just feels a little bit, you know, unsafe ultimately,
but isn't how I would want a AI companionship that
my loved ones are talking to if they need support
or help. All right, I'm going to move us on
to our last topic. Just kind of a fun story
that kind of popped up. So Nvidia did a blog
post about a company that they're supporting called StarCloud. And
the premise of StarCloud is data centers in space. And
the CEO has this incredible quote which is, in 10
years, nearly all the new data centers will be built
in outer space. And so let me just kind of
lay out the sort of argument, the pitch of StarCloud,
and then we'd love to kind of get everybody's sort
of takes on it. The idea is, in space you
have a lot of benefits. One of them is cooling,
right? You don't need to use a lot of water
because you're in the vacuum of space. And the second
one is energy, right? You can also generate a lot
of effectively green energy through solar. And when I heard
this concept, I was like, this is a little wild
of an idea. And I guess maybe. Gabe, I'll kick
it over to you first. Are data centers headed to
space? We're doing all of this terrestrial construction on
data centers. Is this a model where, yeah, maybe these
advantages actually do encourage us to put stuff up in the
sky. So I went on a little rollercoaster ride while
I was reading this article. The first thing I thought,
from the title, was, this is absurd. Then we got into the
meat and potatoes of it, and I think the advantages are
legitimate. You're a believer, okay, cool. Well, I'm not at
the end of my rollercoaster ride yet. The advantages are compelling,
right? Virtually unlimited energy, virtually unlimited cooling, what's not to
love? And if they can make the tech work, awesome.
And then of course I came down and thought, huh,
shoot, Nvidia just launched a brand new card; I guess
I'd better scrap that one and start a whole new
data center. Or shoot, one of the devices is failing;
I need to send somebody over to the rack to
pull one out and, ah, damn it, I can't
do that. So the maintenance of this sounds like
an absolute nightmare. And one of the projects I was
adjacent to in my previous gig in defense was actually
around space object tracking, which is really, really hard.
It turns out there is a metric something-load of things
floating around out there, taking up space trajectories,
orbiting the earth, that are really, really hard to track.
And there's always the possibility
that they crash into something or fall to the ground
and cause major damage. So space is not just like
a happy little bubble up there that you can just
toss stuff into. It's already a very,
very crowded physical space that has real implications. So putting
a technology that is obsoleted on a six month basis
into space, that's to me where I see the problem.
You know, the benefits are clear. That's great if we
can actually realize that. But how in the world are
you going to keep up with the pace of innovation
in the physical hardware that you're sticking up there? I
have no idea how that would be feasible. Yeah, I
love the idea that you've got to replace the cards,
so we're just going to ship a guy up there to
replace the cards. I mean, it'd make for a really
cool movie. Part of me is like, wow, that is
just so cool. Yeah, I think it would be cool.
It would just be so cool. But I do wonder
about things like, I'm on the side somewhat
interested in astronomy, and space clutter is a real
thing; the debris fields are growing. I think it's going
to be really fascinating to see how some of the regulatory
environments grow up around that, and particularly
geopolitical control over space, which is obviously all
very immature right now and uncharted territory. So I
think it's more going to be the political difficulties
that will prevent this, or potentially delay it from
becoming more of a reality, than maybe even the physical
or practical ones. But at the end of the day, there's part
of me that's just geeking out a little bit about it.
Like, yeah, that sounds awesome. It's just cool. Chris,
you've got the last word for
this episode. I'm looking forward to the Mark Rober video
where he attaches a GPU to a balloon and sends
it up to space and he beats everybody to the
punch. That's what I'm looking forward to. I mean, Gabe
and Kate covered it all. I think it's a stupid
idea. Go work on making GPUs smaller and better, and make
the models smaller, so we can enjoy them here without much
power usage. Don't make space part of the problem as well.
This was a great episode; everybody was so
spicy today. But that's all the time that we have
for today. So, Kate, Chris, Gabe, great to have you
on the show as always, and thanks to all you
listeners. If you enjoyed what you heard, you can get
us on Apple Podcasts, Spotify and podcast platforms everywhere. And
we'll see you next week on Mixture of Experts. Although
booing like that makes me feel as if I'm watching
the New York Giants at the moment, and I'm just
sort of booing.