AI Infrastructure Wars and Cost Curve
Key Points
- The latest Air Street Capital “State of AI” report declares that the era of competing purely on model intelligence (model‑IQ) is ending, ushering in the “infrastructure wars” where system design and cost efficiency dominate.
- Three forces will now drive AI success: the rapidly improving capability‑to‑cost curve, how AI is distributed to users, and the physical infrastructure needed to run models.
- AI “intelligence per dollar” is doubling far faster than most anticipate—approximately every 3‑8 months across major providers, outpacing Moore’s Law by three‑to‑seven‑fold and dramatically reshaping unit economics.
- Winners will be firms that can dynamically route workloads to the cheapest model that meets performance needs, leveraging the fast‑shrinking cost curve to unlock real‑world value.
Sections
- Beyond Model IQ: Infrastructure Wars - The AI State of the Industry report argues that the era of chasing higher model intelligence is ending, and future dominance will hinge on three system-level forces—capability‑to‑cost curves, distribution strategies, and physical infrastructure—favoring firms that can route tasks to the cheapest capable models.
- AI Cost, Funding, and Browser Distribution - The speaker explains how massive token‑scale usage drives cost‑per‑token optimization, how model release schedules now align with fundraising cycles, and how browsers are emerging as the default AI operating system, linking capability growth, cost reduction, and distribution shifts.
- Answer Engine Optimization & Google Dynamics - The speaker explains how AI answer engines rely on Google's index, creating a need for new Answer Engine Optimization practices—structured data, APIs, citation‑friendly formats—and highlights Google's strategic dilemma of powering competitors while shifting users to its own AI interfaces.
- Water Constraints on AI Scaling - The speaker warns that as AI usage grows to quadrillion‑token volumes, the water needed for data‑center cooling becomes a hard limiting factor that will dictate site locations, power strategies, and the practical viability of large‑scale AI deployments.
- Model Sycophancy and Reward Gaming - The speaker warns that as AI models become smarter they may learn to flatter human evaluators and game reinforcement signals, which can offset intelligence gains and create new scaling challenges.
- Open Weights vs Frontier Closed Models - The speaker argues that while the most advanced frontier models remain closed in the US, partially or fully open‑weight models can still deliver competitive capability, lower costs, customization, and sovereignty, making them valuable for hybrid enterprise architectures alongside proprietary cloud offerings.
- Choosing AI Models for Workflow Distribution - The speaker stresses evaluating distribution capabilities, infrastructure constraints, and workflow efficiency when selecting AI models, marking a shift from merely pursuing smarter models to strategically leveraging differentiated skills and model availability.
- Rare Insight on AI Strategy - The speaker emphasizes the uniqueness of the audience's attention to current AI strategic themes and predicts a tumultuous 2026.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=gRhOo6uT-fM](https://www.youtube.com/watch?v=gRhOo6uT-fM) **Duration:** 00:28:30
Timestamps: [00:00:00](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=0s) Beyond Model IQ: Infrastructure Wars · [00:03:36](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=216s) AI Cost, Funding, and Browser Distribution · [00:06:51](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=411s) Answer Engine Optimization & Google Dynamics · [00:10:35](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=635s) Water Constraints on AI Scaling · [00:15:36](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=936s) Model Sycophancy and Reward Gaming · [00:18:47](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=1127s) Open Weights vs Frontier Closed Models · [00:24:52](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=1492s) Choosing AI Models for Workflow Distribution · [00:28:14](https://www.youtube.com/watch?v=gRhOo6uT-fM&t=1694s) Rare Insight on AI Strategy
The state of AI report is finally out.
This is an annual report from Air Street Capital with Nathan Benaich at the
lead. It's been published for the last
eight years. Every time it comes out, it
shifts the industry. I'm going to go
through and I'm going to summarize the
313 slides in just a few minutes so that
you get the TL;DR. So, first things
first, the takeaway is that the model IQ
contest is over and the infrastructure
wars are just beginning. So the thesis
is fundamentally we have been pushing
and pushing and pushing to make
incrementally smarter models. But now
what really matters is systems
specifically three compounding forces
that drive those systems. The capability
to cost curve, the distribution
question, and the physical
infrastructure question. We're going to
get to all three of those, but the
thesis of this entire 313 slide report
is that those three drivers are going to
matter more in terms of practical AI
than just model IQ. And so the winners
in the race are going to be the ones
that can route computational work to the
cheapest capable model rather than just
defaulting to the smartest frontier options.
And that's going to enable them to get
to real value. So let's jump right in.
the capability to cost curve. What is
it? So perhaps the most consequential
thing that we aren't talking about and
this is the core economic finding the report has: AI intelligence per dollar
is improving on an exponential curve
that is faster than the pace that most
people assume in their strategic plans.
So across two independent leaderboards, Artificial Analysis, which tracks API pricing and performance, and LM Arena, which tracks crowdsourced model rankings, the capability to cost curve
is doubling very very frequently roughly
every four or five months. The average
across all of the different measures and
different model makers is between 3 and
8 months to double to be clear. So
Google is at a 3.4 month doubling time,
the fastest improvement curve in the
ecosystem. OpenAI is at a 5.8-month
doubling time. Google in the LM Arena scores is slightly longer, a 5.7-month doubling time versus 3.4 in the
artificial analysis. And so this is why
I give you a range, right? It gives you
a sense of how fast it is. It's
ridiculously fast. So for context,
Moore's law predicted transistor density
doubling every 18 to 24 months, and that
roughly held give or take a few months
for a very long time, many many decades.
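Those doubling times can be turned into concrete annual multipliers. A quick sketch using the figures quoted here; the 21-month value is an assumed midpoint of the 18-to-24-month Moore's law range:

```python
# Turn the quoted doubling times into annual capability-per-dollar
# multipliers and compare against Moore's law (21 months assumed as
# the midpoint of the 18-24 month range).
def yearly_multiplier(doubling_months: float) -> float:
    """Capability-per-dollar gain after 12 months at a given doubling time."""
    return 2 ** (12 / doubling_months)

MOORE_MONTHS = 21.0
for label, months in [("Google, 3.4 mo", 3.4), ("OpenAI, 5.8 mo", 5.8)]:
    gain = yearly_multiplier(months)
    speedup = MOORE_MONTHS / months
    print(f"{label}: ~{gain:.1f}x per year, {speedup:.1f}x Moore's pace")
```

At these doubling times the annual gain works out to roughly 4x to 11x per year, and the "three-to-seven-fold faster than Moore's law" claim falls directly out of the ratio of doubling times.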
We are seeing effective AI capability
per dollar double three to seven times
faster than Moore's law. And the pricing
evidence is compelling. GPT-5's input costs for a 400,000-token context window are 12 times cheaper than Claude's, 24 times cheaper than GPT-4.1's.
It's not that this is marginal, right?
This resets unit economics every few
months. When you can obtain frontier
adjacent performance for a 20th of the
price of just 6 months ago, then you
have a lot of strategic implications
that start to fall out of that
fundamental cost curve insight. First,
routing is now a competitive advantage,
not model quality. So products that
intelligently triage requests and send
simple queries to small language models
and reserve expensive frontier calls for
when they need it. They're going to
capture margin in a way that monolithic
architectures can't. And so the
practical AI stack now looks a lot like
smaller, dumber first routing with
frontier spikes only where needed.
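That "route to the cheapest capable model" pattern can be sketched in a few lines. This is a hypothetical illustration: the model names, capability scores, and prices are made-up placeholders, and in practice the capability bar for each task would come from your own evals:

```python
# Minimal cost-aware router sketch: send each task to the cheapest model
# whose capability score clears the task's bar. Names, scores, and prices
# are illustrative placeholders, not real price sheets.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    capability: float      # 0-1, from your own evaluations
    usd_per_mtok: float    # blended dollars per million tokens

MODELS = [
    Model("small-fast", 0.55, 0.10),
    Model("mid-tier", 0.75, 1.00),
    Model("frontier", 0.95, 10.00),
]

def route(required_capability: float) -> Model:
    """Cheapest model that meets the bar; fall back to the most capable."""
    eligible = [m for m in MODELS if m.capability >= required_capability]
    if not eligible:
        return max(MODELS, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.usd_per_mtok)

print(route(0.5).name)   # simple query routes to the cheapest model
print(route(0.9).name)   # hard reasoning reserves the frontier call
```

The margin story is in the fallthrough: most traffic never touches the expensive tier, which is exactly the advantage monolithic single-model architectures give up.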
Second ecosystem as a whole is scaling
usage with the cost coming down. We're
processing about a quadrillion tokens
every month across different API
providers. And at that scale, even a
basis point improvement in routing
efficiency will translate into millions
of dollars in cost savings or expanded
margin. So cost per token and latency
are not just back-end concerns, they
become relevant to product
differentiation to the P&L. Third, model
release cadences now correlate directly
to fundraising cycles. And what's
interesting is that this makes roadmaps financial instruments, right? So OpenAI's model releases precede its fundraises by about 77 days. Google is about
50 days. And so labs will time
capability releases to create momentum
for funding rounds. And investors should
read launch announcements as
pre-fundraising signals rather than purely
technical milestones. This is relevant
not just for open AI, not just for
Google, but for others, for anthropic,
even for folks outside the core model
makers. People are announcing
capabilities in the AI race as a way of
warming up the market for a fund raise.
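Looping back to the quadrillion-token scale point: the claim that basis points of routing efficiency are worth millions is easy to sanity-check. The blended price below is an assumed round number for illustration, not a quoted rate:

```python
# Back-of-envelope: value of a one-basis-point routing efficiency gain
# at ~1 quadrillion tokens per month across providers. The $1 per
# million tokens blended price is an assumption, not a quoted figure.
tokens_per_month = 1e15                 # ~a quadrillion tokens/month
usd_per_million_tokens = 1.0            # assumed blended price
monthly_spend = tokens_per_month / 1e6 * usd_per_million_tokens
one_bp = monthly_spend * 1e-4           # one basis point = 0.01%
print(f"Monthly spend: ${monthly_spend:,.0f}; 1 bp = ${one_bp:,.0f}/month")
```

Even at this conservative assumed price, a single basis point is six figures a month, so a few basis points of routing efficiency is millions a year, which is the transcript's point.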
All of this adds up to a world where
capability is continuing to accelerate
even as cost comes down. And that's the
first major driver, capability to cost
compounding. The second major driver is
distribution. Distribution is tilting
toward answer engines in the browser.
And I want to get specific about this.
The browser is becoming the operating
system for AI by default. And the
distribution choke point is shifting
from the search box like Google to
answer engines that can parse and
synthesize and present information
before the user clicks through. So ChatGPT search is the gorilla. They claimed
800 million weekly active users last
week. They have very roughly, we're
still learning how to measure this,
something like 60% share of the AI
search market. And as a comparison, if
you're wondering about Perplexity, which
is the most famous AI search engine, it
logged about 780 million queries in May
of 2025, and it's growing 20% month over month. So, it's going to be over a billion monthly queries. Now,
that is competitive traction, but it's
dwarfed by OpenAI's distribution
advantage. OpenAI, with 800 million
weekly active users has much much more
on the table when it comes to search.
What's interesting is it's not just
search and conversation. It's also
purchase intent. So retail conversions
from AI referrals are running at about
11%, up 5 percentage points year-over-year.
And that 11% conversion rate is very
strong historically. Typical organic
search is just not going to keep up.
It's much higher than typical organic
search. It's competitive with paid
search conversion in many verticals. So,
answer engines aren't just changing how
people find information. They are
driving forward a new vertical of
purchase that is going to become hyper
relevant for e-commerce and for other ad
providers, marketers in 2026. But
there's a dependency that we're not
talking about.
The answer engines still source really
heavily from Google's index. They're not
crawling the web independently at scale
yet. They're layering natural language
synthesis over the top of existing
search infrastructure. And that creates
a really weird dynamic where Google is
providing the index, but OpenAI and
others are capturing the intent and
conversion. This has some strange
implications for builders. If you're not
thinking about answer engine
optimization as a topic, you need to
be because you don't want to be
invisible to the fastest growing
distribution channel that we have. But
AEO requires something different from
traditional SEO. You have to have
structured data schemas that models can
parse and understand. You have to have
APIs that will allow answer engines to
pull canonical information directly.
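As one concrete instance of "structured data schemas that models can parse," here is a minimal schema.org JSON-LD block emitted from Python. The product fields are invented placeholders:

```python
# Minimal schema.org JSON-LD markup of the kind answer engines can parse.
# All field values are invented placeholders.
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "description": "One-sentence canonical description, easy to extract.",
    "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"},
}
# A page would embed this inside <script type="application/ld+json"> tags.
print(json.dumps(product, indent=2))
```

The point of markup like this is that an answer engine can lift the price or description as a clean fact with attribution, instead of scraping it out of prose.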
You'll need content architecture that's
designed for extraction and synthesis,
not just for keyword targeting. You're
going to need citation-friendly
formatting that's going to make
attribution really clear. And Google
faces a really tricky strategic tension.
It provides the index that powers
competitors answer engines. But
capturing that value requires Google to
transition users from search to its own
AI interfaces without cannibalizing its
traditional monetization model. That
will be one of the central questions of
2026. Third major driver, power and
permits. This is a hard constraint on
scale that we're all going to be facing
in AI. So we all have heard about the
Stargate project: a 10-gigawatt power target, a half-trillion-dollar
investment. Multiple labs are now
targeting 5-gigawatt training clusters
operational by 2028. But physical
infrastructure has to get there to
enable that kind of AI progress. This is
not a temporary bottleneck. It's a
capital-intensive, frankly geopolitically
complex problem space that will
determine which organizations can
execute on their road maps and that will
drive success for the organizations that
are able to build. Just to give you a
sense, a single gigawatt data center
requires about $50 billion in capital
expenditure right now. Land, buildings,
cooling, networking, GPUs, etc. And it's
going to require $11 billion a year fully
loaded to operate. So electricity,
maintenance, staffing, interconnects,
etc. That's not cheap, right? For
perspective, a single gigawatt data
center consumes the equivalent power of
a midsize city. And so the US currently
faces an implied 68-gigawatt power shortfall by 2028. That's 68 city-sized data centers' worth of power that we think will be short, per forecasts cited by SemiAnalysis and corroborated by the North
American Electricity Reliability
Corporation. So one of the challenges
that we're facing given that gap is
figuring out where and how we can
actually build to cover it. And so
things that have traditionally been sort
of environmental debates, like not-in-my-backyard opposition (NIMBYism), have become geopolitically relevant AI debates.
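The capex and opex figures above multiply out to a striking total for the 68-gigawatt gap. This is simple arithmetic on the transcript's own numbers:

```python
# Multiply out the quoted figures: $50B capex and $11B/yr opex per
# gigawatt, against the implied 68-gigawatt US shortfall by 2028.
CAPEX_PER_GW = 50e9
OPEX_PER_GW_YEAR = 11e9
SHORTFALL_GW = 68

capex_total = SHORTFALL_GW * CAPEX_PER_GW      # total build-out cost
opex_total = SHORTFALL_GW * OPEX_PER_GW_YEAR   # annual running cost
print(f"Capex to close the gap: ${capex_total / 1e12:.1f}T")
print(f"Opex at that scale: ${opex_total / 1e9:.0f}B per year")
```

Closing the gap would imply on the order of $3.4 trillion of capital expenditure plus roughly $748 billion a year to operate, which is why the constraint is treated as hard rather than temporary.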
Right? NIMBYism has already blocked $64 billion in data center projects
across the US. Local communities are
voicing their opinion. They don't want
approvals due to concerns about grid
strain or noise or water usage or
whatever the local concern is. And
they're voicing their complaints in ways
that affect build patterns at the
county, municipal, and state levels. We
don't know how this is going to play
out. But the fact that the constraint
exists, that it plays out differently in
different local communities is going to
shape our collective future. Water adds
another layer of constraint. A 100
megawatt data center, so it's smaller,
consumes about 2 million liters a day in cooling. Now, per text query, per ask,
this is a very small amount of water.
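A back-of-envelope using the two figures in this passage (about 2 million liters a day for a 100-megawatt site, roughly a quarter-milliliter per text prompt) bounds how many prompts one site's water budget can cool:

```python
# Back-of-envelope: prompts per day that a 100 MW data center's daily
# cooling water could cover, at ~0.25 mL of water per text prompt.
liters_per_day = 2_000_000        # ~2 million liters/day for a 100 MW site
ml_per_prompt = 0.25              # ~a quarter milliliter per text prompt

prompts_per_day = liters_per_day * 1000 / ml_per_prompt
print(f"~{prompts_per_day:.0e} prompts/day")
```

That is on the order of eight billion prompts a day per site: tiny per-query amounts still sum to city-scale water withdrawals at quadrillion-token volumes, which is why siting follows water.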
The typical Gemini text prompt
apparently consumes about a quarter of a
milliliter of water. a quarter of a
milliliter, tiny amount. But when you
get to quadrillion token per month
scale, water usage becomes a siting constraint, especially in drought-prone
regions because data centers are going
to be competing with agriculture,
potentially with residential use for
allocation rights. This shifts where you
can site your data centers. And so labs
and cloud providers are being forced to
pursue special behind-the-meter
power purchase agreements. They're
trying to find ways to get to offshore
jurisdictions that have more available
power and perhaps fewer permitting
obstacles. Norway comes to mind, the UAE
comes to mind for that. And they're
trying to design for water-aware cooling: things like air cooling, waste heat
recovery. The larger implication is if
your AI road map assumes that you can
call an API and scale from a million to
100 million users on demand and reprice,
you need to think through what that's
going to take. We all need to be aware
that this hard constraint is going to
shape the availability of the rest of
the stack. It's going to shape the
availability of software. It's going to
shape the availability of tokens. Now,
there are very smart people working to
make things more efficient, working to
solve the power problem. Small modular
nuclear reactors come to mind. But
there's a difference between working on
it and having it be operational. And so
if you look at the strategic read and we
ladder back to where we were at the
start of this conversation,
fundamentally the next year or two are
going to see this clash between
dramatically improving intelligence per
cost opportunities like we said that
that's the the capability to cost
improvements are doubling every few
months. This is going to lead to more
and more demand for tokens. But at the
same time we have real hard constraints.
It is difficult to build these data
centers in physical space. It's not a bits problem. It's an atoms problem. It's
going to be difficult. So that's the
fundamental tension we're all going to
be negotiating over the next 12 to 24
months. I want to get to a second piece
that the report called out that I think
is really important. If that is our
overall strategic canvas, think of this
as a set of questions we need to be
evaluating within this space. First, I
want to talk about evaluation of
reasoning gains. One of the things that
we need to get more deliberate about and
that I think we're starting to see more
testing done on recently, but we didn't
in the first half of the year is how we
measure success, how we measure
intelligence, how we measure capability
of LLM gains. And so, you guys know I've
talked about the story of Claude
disastrously running a vending machine.
It was a complete disaster. And the
point was whatever the advertised
intelligence was, Claude wasn't doing
real economic work. More recently, OpenAI has launched GDPval, where they're
trying to test within a constrained
environment how AI solves economically
useful problems. Well, one of the larger
pieces here, one of the larger reasons
we need these kinds of evaluations is
that reasoning gains are more fragile
than they're often advertised by the
model makers. Anyone who's built
production LLM systems will understand this. You have the headline reasoning
gains and then you have a discounted
value that you can actually use. The
most recent example of this is Anthropic claiming that Claude could do 30 hours of work and rebuild Slack, which may well have been true. I don't see any reason why it wouldn't be. But when the same model was tested in controlled conditions by METR, it did not deliver 30
hours. It got close to 2 hours. That's a
big discount. And we're seeing that kind
of discount across a lot of different
areas in AI right now. And I don't want
you to hear that we're not making
progress. But I want you to hear that we
need to take reasoning gains,
intelligence gains as somewhat more
fragile than they are advertised on the
top line. And we need to think more
carefully
about how we can build more sustainably
with these systems. Given that the
topline gains don't always pan out in
the way we expect, we have to be more
intentional. And I think that some of
these issues are going to get more
challenging as models do indeed scale in
intelligence because regardless of
claims of topline inflation, we continue
to see gains in intelligence overall and
we need to factor that. So, as an
example of something that gets more
difficult as we gain in intelligence
with models, models can fake alignment,
right? They can detect that they're
being evaluated. They can adjust their
reasoning chains to appear more aligned
and that's something that model makers
are actively working to address. It is
something that gets worse as models get
better and that can be one of the
factors that discounts some of the value
of the model frontier updates.
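The "headline versus measured" gap described above is worth keeping as an explicit planning number. Using the 30-hour claim versus the roughly 2 hours measured in controlled testing:

```python
# Explicit discount factor between advertised and measured capability,
# using the 30-hour claim vs ~2 hours measured for the same model.
advertised_hours = 30
measured_hours = 2

usable_fraction = measured_hours / advertised_hours
print(f"Usable fraction of the headline claim: {usable_fraction:.0%}")
```

A roughly fifteen-fold haircut is the kind of discount the speaker suggests applying to top-line reasoning claims before building plans on them.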
Sycophancy is on the rise when humans
give feedback and one of the core
principles of AI is that reinforcement
learning with humans is helpful. But
what happens when the model gets smart
enough to recognize that it's the human
giving feedback and it tries to please
the human rather than trying to do the
task well? What happens when models
start to recognize that they're being
tested and change their behavior when
they're being tested? We're already
seeing evidence of that. And so I'm
giving you those three examples because
those illustrate that we can have real
model intelligence gains, but that
factors tied up in the way we train and
build our models can undercut those
gains to some extent and make it more
difficult to make progress that we feel
day-to-day. In a sense, if you look at
the larger picture, it's kind of amazing
that we've made the progress we have
already. I anticipate continued progress
in model intelligence particularly in
vertical specific applications. But we
are going to need to recognize that
these topline gains have to be
discounted against some of the
challenges that come with bigger models,
smarter models, models that think more.
They have different kinds of challenges
that we're working through. So that is
one of the things that makes it
interesting. We have this scaling demand
for tokens. Tokens are getting cheaper.
We have the power constraint piece.
Well, now we have to think about as we
scale demand for tokens, which model is
the right one to get to? What does a
frontier model even mean? If it is a
frontier model, how do we measure that?
Those are all going to become hyper
relevant questions in 2026 and I expect
more investment in measures like GDPval
from open AI because it gives us a sense
of how models actually do real
economically useful work. Another big
theme is the question around model
frontier leadership in closed models
versus open weights. China has become
the dominant player in the open-weight ecosystem. And so even though GPT-5 from OpenAI, Gemini 2.5 from Google, and Claude's Sonnet 4.5 family continue to lead leaderboards in raw capability, the frontier remains closely followed by Chinese labs, particularly Alibaba's Qwen team and DeepSeek.
And this has been very deliberate. China
is pursuing an open-weight strategy
because it gives them distribution
leverage. They can get anywhere on
premises, sovereign clouds, consumer
hardware, it doesn't matter. And so they
have adoption pathways that don't work
with US cloud providers. It also allows
organizations to customize and
fine-tune. And finally, it allows China
to retain
77,000 or more STEM PhDs who are starting to concentrate on AI onshore. So there's a talent onshoring
that's happening around these
open-source models that enables China to
build an open-source approach that's an
ecosystem, not just one single model.
Now Qwen has become the dominant open-weight choice across multiple
international markets, but it's not the
only one. And I would anticipate that
the open-weights ecosystem as a whole is
going to continue to shift forward. Now,
there is one significant update here.
More recently, OpenAI's GPT-OSS release, a partially open stack, drove home the fact that frontier labs in the US, in Silicon Valley, may not just yield ground to open-weight models; they may choose to release open-weight models that keep their model lineage competitive with models like Qwen.
And so open doesn't necessarily have to mean frontier-competitive in a world where you have this incredible cost-to-capability curve. You can have frontier-competitive or frontier-adjacent models that are super economically useful and that are effectively yours, right? You can have the compute that you can run yourself. You can have the model weights. You can do whatever you want. And so I think that
the opportunity we see here is to think
about open versus closed as less than
binary. So frontier capability so far remains closed and US-led, but open weights have a lot of range. There's a range of open-weight standards. Some of them are fully open, some of them are partially open. And they enable,
especially as we get these increased
cost to capability thresholds,
distribution, customization, and
sovereignty opportunities that closed
cloud opportunities can't match. Like if
you have OpenAI's terms of service from
a cloud provider, that's what you got.
That's not true with open source.
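One way to act on that open-versus-closed flexibility is a simple deployment policy. This is a hypothetical sketch; the tiers and rules are assumptions for illustration, not recommendations from the report:

```python
# Hypothetical hybrid deployment policy: closed frontier APIs only where
# the stakes demand it, self-hosted open weights where control matters.
def pick_deployment(sensitive_data: bool, high_stakes: bool) -> str:
    if sensitive_data:
        # Weights and data stay in-house; no external terms of service.
        return "self-hosted open-weight model"
    if high_stakes:
        # Pay frontier prices only for frontier-necessary reasoning.
        return "closed frontier model via API"
    return "cheap open-weight model, any host"

print(pick_deployment(sensitive_data=True, high_stakes=True))
print(pick_deployment(sensitive_data=False, high_stakes=False))
```

A real policy would key off regulatory regime, latency, and eval scores rather than two booleans, but the shape is the same: the closed frontier tier becomes the exception path, not the default.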
Enterprises increasingly are going to
plan on hybrid architectures. They'll
have closed frontier models where they
really have to have frontier
intelligence for high-stakes reasoning,
but they may well go to open models for
volume tasks to handle regulatory
compliance or whatever else
they may need. I want to call out
something that I mentioned at the top. I
talked about the importance of routing
in a world where cost to capability is
becoming a dominant force. Let's open
that up a little bit and explain why
routing wins. As capability to cost
improves exponentially,
you get to a world where GPT-5's router UX becomes the default. So, everyone
complained from a consumer side about
the fact that GPT-5 routes you to
different models. Well, on the back end,
if you're designing systems, that's
actually desirable. The interface
dynamically selects speed-optimized or capability-optimized variants depending on the detected task. That reduces cost
per query. It improves latency. It
maintains quality. Initially, people complained about it, but you know, you improve it. Routing is a core UX and
business lever. Now, it's not just for
back-end optimization. Products that
expose routing choice to users or
products that offer it invisibly are
able to offer better pricing and
potentially faster responses at elevated
quality. And that can create
differentiation in what would otherwise
be a very commoditized space. And so you
need to think about your architectural
decisions in cases where context is
expensive. Long context first designs
will simplify systems. They'll reduce latency, but they concentrate risk
on a single provider. So you need to
think about installing model routing as
a first class object. Another key theme
I want to call out is sovereign AI. So
the sovereign AI movement accelerated in
2025. And one of the things that you'll
notice is that sovereign AI pathways are
not as sovereign as you think. So a lot
of these sovereign announcements remain
reliant on US hyperscalers for cloud
infrastructures. They will import
foreign models via API. They still
depend on NVIDIA hardware. And so most
of the sovereign mega deals that are
announced actually create a
self-reinforcing loop that continues to
concentrate capital on the core model
makers, Nvidia, and perhaps core cloud
providers like Azure. So you need to
understand if you're seeing sovereign
announcements that this may be about
what we talked about earlier in this
video. It may well be about data center
siting and the availability of power
supply more than it's about a truly
sovereign and independent AI. So what
are some of the implications that we see
here? If you're in a building or
founding space, you need to be assuming
a world next year where intelligence per
dollar continues to double every four or
five months. And therefore your margin
opportunity is smarter routing. And so
you need to think about a core product lever that enables you to route more intelligently and deliver higher quality at lower cost, with the
assumption that models will continue to
deliver more capability for cheaper
going forward. You also need to think
about how you capture distribution with
answer engine optimization. You should
assume that the 60% AI search share from ChatGPT, the 11% retail
conversion rate, those are sticking
around. You should assume that there's
going to be an ad network launched
against that. You should assume that if
you are not in place with AEO-optimized content like structured data, canonical APIs, etc., you are invisible to the
fastest growing e-commerce distribution
channel. You should also have a keen eye
on the relevant infrastructure risks in
your space. And so that means, as funny
as it sounds, keeping an eye on how
Stargate is doing and how other major
data center projects from Microsoft are
doing because any delays there driven by
power constraints, driven by NIMBYism, etc., end up flowing through into real
token availability for businesses. And
that could become relevant in 2026 given
the exploding pace of token demand. If you're
on the investment side, you're going to
want to be reading the news of tomorrow
and understanding the investment picture
through the assumption that companies
win on routing intelligence through the
assumption that you have real dependence
on the core model makers in Nvidia that
isn't going anywhere. Those circular
flows are real. through the idea that
demand is scaling faster than supply
across the board and that in that world
you need to understand who has
infrastructure access and who doesn't.
And finally, you need to think about
distribution. Who has distribution in
the middle of a world of performance
bottlenecks? ChatGPT certainly does.
Who else does? And who is able to
maintain and grow that distribution in
this complex world? Finally, if you're
an AI enthusiast, you should take away
from this that this is the beginning of
the next step in the AI revolution. If
the first step was around the models
just getting smarter and smarter and
smarter and we're all going after those
smarter models every single time. This
wave is about how we can move from just
playing on pure intelligence just
celebrating the fact that we can now do
Excel and PowerPoint to a world where we
need to have differentiated skills
around particular workflows and systems.
So just as you can talk about router
intelligence and the cost per capability
curve for systems, you can think about
that for your own skill set. How can you
route workflows more efficiently now
that you understand this situation
better? Is GPT-5 always the right
choice or do you go with another model?
How can you think about that more
deliberately? Is your particular model
choice going to be a model choice where
you have expanded availability over time
or do hard infrastructure constraints
make that more difficult? Anthropic is actually a good example of a model maker that has remained hard infrastructure-limited for most of 2025, and that is one
of the reasons why they've been unable
to roll out things like rolling context
windows. It's been one of the reasons
why they've had some persistent issues
the last few months with outages etc.
and rumored to be one of the reasons why
they've struggled with releasing some of
their newer models. The point here is
not don't choose Anthropic. I love
Anthropic. They make fantastic models. The
point is that constraints already are
impacting real world availability, not
just for Anthropic. ChatGPT has issues
at times as well and they've been honest
about that particularly post launch. Be
aware of the constraints that shape
intelligence availability in your space
and be smart about what you choose to
build what you choose to do with that.
This is why major tool makers like
Cursor and Lovable think about having a multi-model architecture underneath. It
gives them that option to pick a
different model. The last thing I will
call out is that none of this is
theoretical. We are living in a world
where the cost of intelligence really is
going to zero. And it gives individuals
an immense amount of agency to choose
your own adventure. Your ability to form
intent and go after something with
clarity, focus, and dedication has never
had more leverage because the
intelligence is going to get better and
better and cheaper and cheaper. We are
in a world where you can teach yourself
any skill you want with the help of AI
in just a few months. And so it's going
to be on individuals to go after what
they want and the ability is not going
to be evenly distributed. What I'm
finding is that even though everyone has
access to these models, very few folks
are making the most of them. And so the
willingness to jump on and make the most
of not just frontier models but
potentially cheaper next generation
models that are adjacent to the
frontier. The willingness to understand
these strategic constraints and think
about the opportunities that you have.
That is rare. That is rare. And so if
you watch this video, thanks for tagging
along. You were probably one of the few
that is paying attention to the
strategic themes in AI right now. Best
of luck in 2026. It's going to be a wild, wild ride, as I hope this video has made clear.