# AI’s Exponential Rise Defies Bubble Narrative

**Source:** [https://www.youtube.com/watch?v=SW1s22kJ15g](https://www.youtube.com/watch?v=SW1s22kJ15g)
**Duration:** 00:20:14

## Summary

- Humans consistently misjudge exponential growth, so we tend to dismiss rapid AI advances (just as we downplayed COVID's spread) because day-to-day changes feel normal.
- Julian Schrittwieser (formerly of AlphaGo and MuZero, now at Anthropic) argues that internal data shows AI productivity could increase ten-fold within 18 months, with frontier labs seeing no sign of a slowdown, making "bubble" claims essentially bogus.
- Expert forecasts often miss exponential curves (e.g., solar-panel installations) because intuition is anchored to linear trends, leading credentialed skeptics to underestimate transformative technologies.
- There's a substantial information gap: AI researchers inside companies observe fast, detailed progress, while outsiders see only surface-level imperfections and assume development is lagging.
- Highlighting Julian's podcast and essay matters because it offers a rare view into these internal metrics and counters the prevailing narrative that AI's growth is stagnant.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=SW1s22kJ15g&t=0s) **Exponential Blindness and the AI Bubble** - The speaker contends that, as with COVID-19, people misjudge AI's trajectory by fixating on present imperfections rather than its exponential growth, citing Julian Schrittwieser's claim of a 10x productivity surge within 18 months and declaring the "bubble" narrative misguided.
- [00:04:58](https://www.youtube.com/watch?v=SW1s22kJ15g&t=298s) **Beyond Hype: An AI Progress Metric** - The speaker argues that, contrary to bubble fears, concrete evidence of AI advancement lies in the increasing number of work hours AI systems can reliably operate autonomously, a metric Julian identifies as economically meaningful and difficult to game.
- [00:10:17](https://www.youtube.com/watch?v=SW1s22kJ15g&t=617s) **Beyond Leaderboards: Measuring Real AI Capability** - The speaker criticizes optimizing for public benchmarks, citing Goodhart's law, and argues that evaluation should prioritize agents' ability to perform useful, real-world work over leaderboard scores, while noting that reinforcement learning and pre-training on massive amounts of human text remain viable growth paths.
- [00:15:38](https://www.youtube.com/watch?v=SW1s22kJ15g&t=938s) **Julian's AI Timeline & Deployment** - Julian predicts AI will work full eight-hour days by mid-2026, match human experts by the end of that year, and surpass them by 2027, urging firms to deploy AI strategically on tasks where humans are weakest, as illustrated by solo founders leveraging AI to fill their gaps.

## Full Transcript
What if we're making the same mistake today with AI that we made with COVID in 2020: looking at today's imperfections instead of at exponential growth rates? Julian Schrittwieser, of AlphaGo and MuZero fame and now at Anthropic, says the data shows we have 18 months until 10x productivity, and the frontier labs are not seeing any slowdown whatsoever. In other words, the bubble is fake news. Let's get into what he said in his hour-long podcast and the even better essay he wrote last month. I'm going to give it to you very quickly, in just a few minutes, and I'm going to give you my take as well. Let's jump right in. First, why is the bubble narrative backwards? Julian's argument is that humans are very bad at
understanding exponential growth. When
something doubles regularly, humans
consistently fail to grasp what's coming
because today does not feel any
different from yesterday. And so bubble skeptics will see AI continuing to make mistakes on tasks and will conclude it will never work. Or, more recently, they will see multi-hour work from AI agents (inconceivable at the beginning of 2025) and say, "We're not making fast enough progress." This is the same cognitive
error, Julian argues, that led people to
dismiss COVID as just a flu, even as
case counts were doubling every week. We
humans tend to focus on extending the
patterns we see and are very bad at
seeing exponentials. And this includes
smart people, right? During any time of
rapid transformation, simple math will
often beat expert intuition. The joke is
the straight line on the graph will beat
expert intuition, because experts tend to anchor on how fast things have changed in the past. A great example of this,
one of my personal favorites, is the rate of solar installation. If you look at the graph of solar installs over time, it is an exponential curve; it goes straight up. But if you look at the graph of expert projections for solar, they miss every time, because experts just cannot get an exponential right. This
leads to widespread skepticism from
credentialed researchers whenever we're
in an exponential situation because they
struggle to understand that we really
are seeing extremely rapid progress and
all of their expertise is calibrated to
a different curve, a more normal curve,
a more gradual rate of adoption, which Julian argues is just not what we're seeing with AI. I would agree. The last
piece I think is the most important for
his thesis. Julian agrees that on the
outside it may look like slower
progress, but from his position inside
Anthropic and as a longtime frontier AI
researcher, it does not look like that.
And so there's this tremendous
information gap between what's visible
from the inside and what's visible from
the outside. And that's why I wanted to do this summary for you: I don't think Julian's interview and post are getting enough attention. We need to understand, in much more detail, how AI companies measure progress internally. This creates a strange situation: the people building these systems talk about explosive growth in the metrics that matter, while outsiders are still debating whether it's true, when it will be deployed, and so on.
And so one example of this, Julian
mentions it. I think it's also very
important is how long AI agents are able
to work without supervision. You'll
remember when Anthropic, the company
Julian works at, launched Sonnet 4.5,
they launched it with a claim that it built Slack in 30 hours, right? I talked about that. People tend to wave their hands and ask what the official test results say, whether it's extending the horizon, and so on. But I think the point is that you continue to see anecdotes like that, very long, multi-day runs, that you simply weren't seeing even a few months ago. And so the point isn't any given model-specific measure on a given test. The point is
whether the AI tide is coming in, how
rapidly it's coming in. And Julian's
argument is the tide is coming in really
fast. It's silly to argue about whether
we've hit a wall. In his view, we keep seeing stories, from the inside and (if we're looking) from the outside, of better and better performance, particularly on things that matter a lot, like whether AI can operate autonomously,
and that is the root of his perspective: internal researchers are seeing things in ways that external journalists and news anchors just don't. I think that's a really important take, and if we talk about why the AI bubble narrative is backwards, that's the root of it. If you are in San Francisco, if you
visit San Francisco, if you talk to
people who are on the cutting edge of AI
research, none of them are saying that
we are hitting a wall. And it's not just the people with gigantic piles of equity, like Sam Altman, who have that position and would have the incentive, right? It is people who are just
researchers like Julian who are just
trying to understand how to make systems
that work better. They're also saying
we're not hitting a wall. And it's very
odd for someone like me to be in a
position where I talk a lot to people
who are on the outside and who don't
understand that and who ask me
frequently, is there an AI bubble, Nate?
I got to say, I see the same thing
Julian sees. I don't see evidence of a
bubble. And I'm really glad he did this
podcast. I'm glad he did this blog post
because I think he articulates it really
well. Let's dive in though to what isn't
hype. What is the evidence that suggests
that we are not in a bubble and that AI
continues to progress? I think that question often gets lost in arguments over how much we're spending on data centers, what that spending is as a percentage of capital expenditure at big companies, how much electricity we'll need, and so on. Let's leave that to the side for now and look at evidence of
progress in AI models. Julian's argument
here is that what I referred to
previously, how long AI can get work
done, is actually the core metric for
measuring whether we're making
meaningful progress. I think that's
really interesting for two reasons. One,
it means that maybe the major model
makers are converging on a metric that
is not easily gamed. And that would be
great news, because we've had a lot of metrics where you get up to 90 or 95% on this test or that test and everyone sort of rolls their eyes. SWE-bench is an example of that. People are arguing over ARC-AGI all the time and how to make it harder so you can't game it as easily, math tests, and so on. So the
breakthrough insight is really that
there is a correlation between what
matters economically and the number of
hours AI can work. If AI can work
longer, Julian argues, then we will get
more value from AI. And so it's not just
whether you can delegate work, it's
whether you can delegate work and
effectively get an answer not just for
quick responses but for long-term work.
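Since the argument here turns on compounding, a quick numeric sketch may help. This is illustrative arithmetic only, assuming the figures quoted in this talk (a 15-minute task horizon that doubles roughly every seven months); it is a sketch of the compounding math, not METR's actual methodology.

```python
# Illustrative arithmetic: project a task horizon that doubles every
# `doubling_months` months. The 15-minute starting point and 7-month
# doubling period are the figures quoted in the talk; everything else
# is made up for illustration.

def task_horizon_minutes(start_minutes: float, months_elapsed: float,
                         doubling_months: float = 7.0) -> float:
    """Exponential projection: the horizon doubles every `doubling_months`."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)

if __name__ == "__main__":
    for months in (0, 7, 14, 21, 28):
        h = task_horizon_minutes(15, months)
        print(f"after {months:2d} months: ~{h:6.0f} min (~{h / 60:5.1f} h)")
```

Run as-is, this prints 15, 30, 60, 120, and 240 minutes: three doublings take a 15-minute horizon to two hours, which is why each month-over-month change feels unremarkable while the cumulative change is dramatic.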
METR, the organization Julian cites as the bar for measuring this, has tracked that we have gone from handling 15-minute tasks to 2-hour tasks in just seven months. That's how fast things are
moving. And yes, is the 30-hour Slack-rebuilding task I described an outlier compared to two hours? It absolutely is. But the overall tide is coming in, and we've moved from 15 minutes to 2 hours very
quickly. And so Julian's argument here is essentially that the number of hours AI can work autonomously is so tightly correlated with economically useful work that we should view them as the same metric. Now, he doesn't go so far as to say we shouldn't use other metrics; I'm the one going so far as to say I don't think a lot of other metrics are useful. And I
think that he's correct here. What
separates real progress from bubble hype
is that an independent organization, METR, can show that that duration has doubled every seven months for a while, and that we are on a seven-month doubling track going forward. This isn't after-the-fact spin; it's a forecast that has stood since the beginning of the year. I remember
talking with researchers and others in the AI space who
were saying back in January, we are
seeing a doubling curve every six or
seven months for AI on autonomous tasks
and we expect it to continue through
2025 into 2026. That is a falsifiable
prediction which is a big deal because
you could say, well, it didn't come true. In this case they made the call and it has come true: we are seeing exactly what they predicted, the ability of autonomous agents to do work extending and extending. And by the way,
this is not just about Claude. I know Julian works at Anthropic, but Codex is also capable of very long-running workloads. I saw a story on X where Codex was asked by a researcher to do a 60-hour task and ran autonomously for 60 hours. Again, that's just
one anecdote. I'm not saying it's going
to score 60 hours on the METR test,
which technically measures the human
equivalent amount of time, not just the
AI amount of time, but the point is sort
of the tide is coming in, right? We see
longer and longer autonomous tasks being
something that we can do. Now, a skeptic would argue: look, Nate, you just talked about SWE-bench. You talked about how you can game tests. Maybe AI labs have optimized for this test. But OpenAI just released a completely different evaluation called GDPval, which is 1,300-plus real work tasks across 44 different professions, graded by very experienced professionals, not by OpenAI, who could not tell whether they were rating human or AI work. It was a double-blind test, and the result was the same pattern of exponential improvement on an entirely new exam that did not exist when the models were trained. In other words, two independent measurement systems, one created entirely after the models it measured were trained, show the same doubling pattern. You're measuring
something real. And that's sort of
Julian's argument. Julian actually specifically praised OpenAI for publishing GDPval even though Anthropic's model is the one that did the best, right? And Julian rightly said that's a sign of integrity from the OpenAI team. They're not trying to trumpet their own model and say it did the best; they're saying that in this case Opus 4.1 did the best on that test. I'm sure, you know, wait a month and we'll see. Now, having heard me talk about benchmarks a lot, you might think I'm saying we should trust benchmarks, but I've said pretty clearly that benchmarks are not the thing to pay a lot of attention to. So what's the solution? The answer is that you want to pick benchmarks that are not easy to game, and that's why I've called out GDPval and METR.
And Julian calls both of those out as
well. It is absolutely possible to optimize for public leaderboards on unrealistic tests that are easy to game, and that is what we see in the real-world performance on GDPval of Grok 4 and Gemini 2.5 Pro, both of which have topped many public leaderboards and both of which perform poorly on GDPval's real-world tasks. It's an example of Goodhart's law: when you optimize for a metric, you game the metric instead of building the capability. And so what Julian is trying
to argue for is that we want to build
meaningful capabilities. We want to
measure them in a meaningful way and we
want to show they do real work. So if we
step back here and we look at the whole
measurement conversation, I think where
this leads is that we need to be having
much simpler conversations about
measurement. We need to be talking about
whether agents can do useful work much
more than we do. And we need to be
talking much less about whether a model
has scored a one or a two or a three or
four on some public leaderboard
somewhere, because it doesn't matter. And I'm really grateful to Julian for saying it, because it's really true. Overall, across the technical side
of things, Julian also doesn't see a
wall. And he talks about it in a fair
bit of detail. He talks about the idea
that you can pursue reinforcement
learning and continue to grow by looking
at massive amounts of human-written text. And that's true, and it's really important for forming good models. One of the things he calls out is that this body of excellent human text, like scientific papers or high-quality books, is something that enables Anthropic in particular to pursue pre-training with both efficiency and safety benefits. That's a fancy way of saying that if you start with a strong, clean corpus of human knowledge, you don't necessarily have the contamination issues you get if you, say, throw Reddit in there, which has actually largely been removed from a lot of the frontier models. People don't know this, but Reddit has mostly been purged out because it didn't end up being a high-quality source of data. It's not completely gone, but it's something like 1-2% of the total. So another technical
point that he made that I think is
really significant
is he went back to his own experience building machine learning systems in 2016. He was part of the AlphaGo project at Google, which built an AI system that could play Go, a game far harder for computers than chess.
And there's an infamous story, if you're
familiar with the history of AI, called
move 37. And Julian talks about it
because it's important for us today. So,
at the time, AlphaGo was trying to learn to beat the best human players, and it had not yet done so. Then, in a real game against a real Go master, it played a 37th move that every commentator considered a mistake. It violated basic strategy, and everyone thought it was a disaster. But as the game progressed, the masters realized that AlphaGo understood the game on a level they didn't: that move led to AlphaGo winning the game, and the masters then realized that move 37 was actually brilliant. And so Julian talks about
this and this idea of a move 37. And
what he suggests is that as we start to
extend the length of time agents can do
work, the amount of useful work they can
do, we are going to get to a point where there's a move 37 moment for AI. We don't know going in what that will be; we inherently can't. But it is something that we need to be prepared for. We need
to expect something like a Nobel Prize-level scientific breakthrough somewhere in 2027 or 2028, where AI can search a solution space faster and more effectively than human intuition, and we get something world-changing that would not have been possible at all without AI. The last technical piece
I want to talk about is this idea of an
implicit world model. And Julian talks
about that a fair bit. One of the things that I think is really significant is that modern LLMs do something similar to the AlphaGo system I described: they predict consequences well enough to plan multi-step solutions. One of the things Julian talks about in the podcast is what it means if you can plan multi-step solutions effectively enough, as early AI did with AlphaGo, and as the chess engines of the 1990s like Deep Blue did by planning moves ahead. The suggestion is that LLMs are doing the same thing. If they're able to plan multi-step solutions autonomously as agents and conduct work for many hours, at that point it's not really just next-token prediction. It's using predictions to search possible action sequences and then to plan strings of moves. I do think gaming as a
metaphor helps here because you can
actually see the strings of moves ahead
in AlphaGo or in chess. And it's a similar thing, right? Go and chess both depend on the right moves in the right sequence, and understanding those sequences helps you unlock strategy. That's what we learned as we built AI systems that can play those games. In a similar
way, we are now learning that LLMs get
good enough at next token prediction
that they can predict possible action
sequences as a whole. And that's a
significant shift that we are starting
to see that I think is going to really
come out in 2026.
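To make the "predictions as planning" idea concrete, here is a toy sketch. Everything in it is invented for illustration (the two-action game, the scoring function, the search depth); it is not how any LLM, AlphaGo, or Deep Blue actually works. The point is only that a model which scores single next steps becomes a planner once you search over sequences of steps, and that deeper search can prefer a locally bad move that is globally best, a miniature move 37.

```python
# Toy sketch: turning a one-step "predictor" into a planner by
# searching over action sequences. The game, scores, and search
# depth are all invented for illustration.

from itertools import product

ACTIONS = ("a", "b")

def step_value(history: tuple, action: str) -> float:
    """Stand-in for a learned next-step score: 'b' looks bad on its
    own, but sets up a big payoff for a following 'a'."""
    if action == "b":
        return -1.0
    return 4.0 if history and history[-1] == "b" else 1.0

def plan(depth: int):
    """Depth-limited exhaustive search over all action sequences."""
    best_seq, best_total = None, float("-inf")
    for seq in product(ACTIONS, repeat=depth):
        total, hist = 0.0, ()
        for act in seq:
            total += step_value(hist, act)
            hist += (act,)
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq, best_total

print(plan(1))  # one-step lookahead: ('a',), 1.0 -- never risks 'b'
print(plan(2))  # two-step lookahead: ('b', 'a'), 3.0 -- finds the setup move
```

With one step of lookahead, 'b' is never chosen because its immediate score is negative; with two steps, the search discovers that 'b' followed by 'a' beats any sequence of safe moves, exactly the shape of a move-37-style play.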
Okay,
let's look at what the timeline looks
like here. One of the things Julian is big on is falsifiable claims: he wants to make claims that can be proven wrong and put his own skin in the game. I'm just going to
say them back and let you sort of think
about them. He's thinking AI will be working full eight-hour days without human intervention by mid-2026, matching human expert performance across many industries by the end of 2026, and routinely exceeding human experts by the end of 2027. And so what he's saying is that in the
next 12 to 30 months we can prove him
wrong or we can prove him right. And
really all he's doing is drawing those
straight lines on a graph. And so if we
ladder up, what does this mean? Number
one, it means that we need to think very
carefully
about where we want to deploy AI
systems. I want to suggest that you think about where you are strong on a task versus where you are weak, because AI is very soon going to be at a point where it can pick up the things you are weak at. I already see this with solo founders: they're able to have AI cover their weak spots in ways that solo founders have traditionally just had to suffer through. And I think
that's going to be true for other
workers as well. Another point I think is important is that we need to start deciding how we want to talk and think about the idea that 10x more work can be done, because it's not at all clear that that extra work will only be valuable to a few people. We can choose to be 10x more productive for ourselves. We can choose to jump in where we want to and excel at our skill sets in ways that were not possible before, because we have AI assistance.
And so I think that one of the things
that is hard to do is to talk about what
it means as a society. But one of the
things that's easy to do that you and I
can do right now is talk about what it
means for us. Right? If you're a
business owner, what does it mean to
think about how you empower your
employees and help them to do 10x more?
If you are an employee, what does it mean to think about the ability to spread your wings, to have a mech suit, so to speak, and do a whole lot more than you could before, with confidence? Because at root we come back to the idea that you play to your strengths: you're going to have expertise that is really, really deep in a particular area. Julian argues this, and I agree: it is
the human-AI collaboration around that deep expertise that unlocks value.
And so I don't see a world of wholesale
replacement. And I think it's
interesting that this researcher at
Anthropic is not arguing for it either.
He sees a world of human AI
collaboration as well, but we need to
get ready for that, right? We need to
think about what that means. I say this
all the time, but the preparation window
for all of this is closing fast, right?
If you can draw straight lines on a chart and they're going up, we have to get ready for this now. And that's one of the things I talk about all the time on this podcast: you have to get ready now. There's not another time, and it will not get easier if you wait six months. If I can leave you with one
thing about this whole AI bubble
conversation, keep in mind that every
cloud provider out there, Microsoft,
Amazon, Google, all of them are doing
everything they can to get GPUs in the
door for demand for AI. The demand is
there. If the demand is that high from
businesses, it is hard to argue that we
are in a bubble. In fact, when Amazon laid off 30,000 workers on October 28th, there was a direct line from that, but not to AI and automation. People are going to claim it's all about AI automating your work away. No, it's a direct line to freeing up cash to buy more GPUs for more cloud compute, because the demand for AI from other companies is so great that the capital expenditure ratio at these big cloud companies is getting out of control. And so they're trying to bring down their fixed costs, and that means salaries. So it's not
that AI is automating away roles; if anything, the people who remain are more stressed. It's that these companies need to free up cash to buy GPUs to finance the demand for AI from businesses. That's
not a bubble. It is a real hard time for
the 30,000 people that were cut at
Amazon. And there may be other stories
like that as well. But I want to get the
narrative really clear. That is not an
AI automation story. That is a story
about securing cloud compute to deal
with surging demand. People are just
making financial decisions to reduce
fixed costs for the public markets in
the next quarterly report. And that's
just what companies do. So summing all
of this up, Julian doesn't think we're
in a bubble. I don't think we're in a
bubble either.