# Karpathy vs McKinsey: AI Design War

**Source:** [https://www.youtube.com/watch?v=xZX4KHrqwhM](https://www.youtube.com/watch?v=xZX4KHrqwhM)
**Duration:** 00:11:48

## Key Points

- An emerging conflict in AI pits business consultants, exemplified by McKinsey’s boardroom influence, against technical builders like Andrej Karpathy, highlighting divergent strategic visions.
- Karpathy’s “Software 3.0” talk at Y Combinator frames large language models (LLMs) as computers, utilities, and operating systems, arguing that the next programming language will be English.
- He introduces the term “people spirits” to describe LLMs as stochastic simulations of humans, emphasizing that designing software for these entities requires fundamentally new, human‑centric approaches.
- Karpathy cautions against hype about autonomous AI agents, stressing that realistic progress over the next 1‑2 years will depend on substantial human supervision, a stance he presents as more realistic than many industry voices.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=xZX4KHrqwhM&t=0s) **AI Consultant vs Builder Clash** - The speaker contrasts business‑focused AI consultants like McKinsey with hands‑on builders such as Andrej Karpathy, using Karpathy’s “Software 3.0” presentation to argue that his developer‑centric vision will likely dominate the AI landscape.
- [00:03:32](https://www.youtube.com/watch?v=xZX4KHrqwhM&t=212s) **Constraining AI Output for Practical Validation** - The speaker debates Karpathy’s proposal to limit LLM‑generated content, favoring a streamlined validation loop over overwhelming reviewers, while also contesting the notion that English will replace all programming languages, emphasizing the continued need for skilled engineers to manage increasingly complex hybrid systems.
- [00:06:38](https://www.youtube.com/watch?v=xZX4KHrqwhM&t=398s) **Skepticism Over Agentic Mesh Claims** - The speaker critiques the hype around a supposedly plug‑and‑play “agentic mesh,” noting its lack of empirical grounding, reliance on outdated models, and the broader disappointment with edge‑computing promises such as Apple’s recent bet.
- [00:11:30](https://www.youtube.com/watch?v=xZX4KHrqwhM&t=690s) **Pressuring Firms on AI Accountability** - The speaker thanks Karpathy, urges organizations such as McKinsey to adopt a stronger stance on AI, acknowledges they may ignore the request, yet still hopes for a better response to AI challenges.

## Full Transcript
There's a war at the heart of AI between
the business consultants and the
builders. And I want to outline how that
popped out in sharp relief this week
between Andrej Karpathy and McKinsey.
Both of them had major presentations,
major papers, this week. I want to talk
about how stark a contrast they laid out
and why Andrej's vision is more likely
to be correct. It's important to
understand both, though, because
McKinsey has tremendous influence in the
boardroom. Okay, first understand the
context for Karpathy's presentation.
He's speaking to a bunch of
entrepreneurs at Y Combinator Startup
School. His presentation is titled
Software 3.0, which he is uniquely
qualified to talk about because he
coined the term Software 2.0 a few
years ago, I believe, also at YC. So,
he's coming back and he's basically
saying there's a new paradigm. It's
shaped, obviously, by AI. And he spends
lot of time in the presentation which
I'm going to link encouraging you to
think about AI as a design problem that
is unique because of the qualities of
the large language model. And so he
talks about large language models as
computers, large language models as
utilities, large language models as
operating systems. And he describes in
detail how LLMs have qualities that
match these. For utilities, as an
example, we meter their usage in dollars
per token, the way we meter electricity.
For operating systems, we've already
heard other major figures in AI talk
about the fact that young people
especially are using AI like an
operating system. And just as you have
differences in preference for operating
systems, the Windows versus Mac wars,
you similarly have differences in
preference for Claude versus OpenAI. So
you have some of that same sort of
dichotomy playing out. But let's get to
the heart of Software 3.0. Software 3.0
is the idea that the next coding
language is English and that we are not
working with deterministic software.
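The "not deterministic" point can be made concrete: an LLM samples each next token from a probability distribution, so the same prompt can yield different outputs. A minimal sketch with a toy, made-up token distribution (not a real model):

```python
import random

# Toy illustration of why LLM output is stochastic: each next token is
# sampled from a probability distribution rather than chosen
# deterministically. The vocabulary and probabilities here are made up.
NEXT_TOKEN_PROBS = {"great": 0.5, "fine": 0.3, "jagged": 0.2}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# The same "prompt" sampled repeatedly gives varying continuations.
samples = [sample_next_token(NEXT_TOKEN_PROBS) for _ in range(5)]
print(samples)
```

Real models do this over a vocabulary of tens of thousands of tokens at every step, with settings like temperature shaping the distribution; that sampling is the root of the nondeterminism.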
Instead, we are working with what
Karpathy terms "people spirits":
stochastic simulations of people, which
is how he describes an LLM. I love that
phrase.
I'm going to keep it and share it a lot,
because it helps me explain why large
language models feel so human but
aren't. It explains why the intelligence
of large language models feels so jagged.
They are stochastic simulations of
people. They're people spirits. And so
if we're building software for this kind
of interaction for people spirits, we
have to think from the ground up how we
design our software. And this is where
Andrej's caution comes in. And I think
it's really needed in an age when we are
hyping up agents so much. It is really
important to think about our building
over the next six months, 18 months, two
years as building for people spirits
that need a fair bit of human
supervision to go anywhere. And Andrej
is more honest about this than most of
the other major figures in AI that I've
seen. He is not overhyping and saying
that AI agents will take over everything
and be autonomous. And this is where you
see an early conflict with McKinsey,
because what Andrej is saying is
essentially that people spirits, LLMs,
just don't have reliable execution. They
have too much jaggedness in their
intelligence to be good at enough of
everything to be trusted with high-level
tasks at this point. Instead, we
should be building our software for the
assumption that humans will need to be
validators in the loop: that AI can
generate and humans need to validate.
And we need to think about software as a
design problem from that perspective.
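That generate-and-validate design can be sketched in a few lines. Everything here is hypothetical scaffolding (the function names, the capacity constant, the stubbed model and reviewer), just to show the shape of the loop:

```python
import random

# Hypothetical sketch of a human-validator-in-the-loop design: the model
# proposes, a human approves or rejects, and generation is deliberately
# capped so reviewers are never asked to check more than they can handle.
REVIEWER_CAPACITY = 10  # assumed cap on drafts a human can actually review

def generate_drafts(prompt: str, n: int) -> list[str]:
    """Stub for an LLM call; real code would hit a model API."""
    return [f"{prompt} (variant {i})" for i in range(n)]

def human_validate(draft: str) -> bool:
    """Stub for the human review step (random approval here)."""
    return random.random() < 0.5

def generate_and_validate(prompt: str, requested: int) -> list[str]:
    # Cap generation to match reviewer capacity, so evaluators are
    # never overwhelmed by more output than they can check.
    n = min(requested, REVIEWER_CAPACITY)
    drafts = generate_drafts(prompt, n)
    return [d for d in drafts if human_validate(d)]

approved = generate_and_validate("ad copy", requested=100)
print(len(approved))  # never more than REVIEWER_CAPACITY
```

The design choice worth noting is the `min()` cap: the system refuses to generate more than the human side of the loop can absorb.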
And he suggests there are two ways to
make this easy. One is pretty obvious:
make the checking, the responsible
validation loop, as easy as you possibly
can. That's software 101. But the second
is a little more controversial. Andrej
suggests putting the LLM on a short
leash,
deliberately constraining AI generation
so that you don't have so much AI
generation that you overwhelm
evaluators. An example of this would be
the AI generating hundreds of different
ad variants, but the human only being
able to validate 10 of them. Well,
what's the point? You're just wasting
energy at that point. And I appreciate
his honesty on that front. Now, I do not
think that Andrej is entirely correct, or
at least I disagree with him, that
English will be the only programming
language of the future effectively. I
think in particular there will be a need
for strong technical engineers who
understand the construction of complex
systems because systems are about to
become more complex as we have
traditional software interacting with
this agentic augmented software that
he's talking about. This is not going to
be as clean as English driving code all
the way through. But I do understand
from his perspective, as someone who has
been deep in engineering from the
beginning and knows his code backward
and forward, that the transition to
English is a fundamental shift. And he
is, to his credit, honest about the
limitations of
the vibe coding revolution that he
kicked off a few months ago. He was the
one who said "vibe coding" and spawned a
thousand startups. And he talks really
honestly about the fact that vibe coding
right now is great for local
environments, but there are a lot of
other pieces in the deploy pipeline, in
CI/CD and integrations, that don't work
well with vibe coding right now. And I
appreciated that honesty as well. So
when you ladder all of that up, Andrej
is leaving us with this vision of
Software 3.0: building augmented Iron
Man suits for ourselves, where the
agents expand our span, our reach, our
control. But we have to design our data
systems
to accommodate how they interact with
data. We have to design our software so
it's agent-friendly. We have to think
about agent control systems so that you
can have agents interacting with data
and people validating it in a
sustainable loop. It's a really
interesting software design talk; it's
scalable, it's empirical, and because he
is a builder you can feel the fingertip
knowledge. That is the fundamental
distinction between Andrej's
presentation of Software 3.0 and
McKinsey's presentation, which is very,
very different. McKinsey is speaking to
CEOs. And look, I get that Mistral
blessed the McKinsey presentation. It's
all about the "agentic mesh"; that's the
theme. The CEO of Mistral has a nice
introduction at the beginning. This is
not an attack on Mistral. They do hard
work. They produce great software. But
McKinsey, because of the way they speak
to their audience, is not able to
successfully articulate anything that's
buildable for tech teams. And that is
the fundamental issue. I understand that
they want to communicate to CEOs in
their presentations, and I'll link to
this as well, that it is important to
think in terms of workflows. That's
true. It is important not to just think
in terms of LLMs automating tasks.
That's true. If you think about agents,
you have to think about autonomy. That's
true. The problem arises when they go
from general concepts to trying to
suggest a solution. The agentic mesh is
a word salad that has no empirical
grounding. It doesn't have the builder's
touch. And
that is what makes that presentation so
concerning because I've seen over and
over again as someone in sort of the
product engineering side of things when
you have a CEO come in fresh off a
report like that, and he's like, "This
should just work. The McKinsey guys say
that they can build an agentic mesh and
you can plug any model in without
additional work. Why don't we use
Mistral Small, or why don't we use
GPT-3.5 Turbo, because McKinsey
mentioned it?" Both of those are in the
presentation, by the way. And the tech
teams roll their eyes because they're
like, "These are ancient models. They're
tiny." It relies on an assumption of
edge computing that hasn't held up very
well, because larger models just show
sustained gains in intelligence that
smaller models aren't matching. One of
the big surprises of 2025 is that edge
computing for models is not working as
well as people thought it would yet.
And to his credit, Andrej
still thinks there's room for edge
computing. We will see. Apple made a big
bet on it early last year, and it really
hasn't paid off. It remains to be seen.
I don't want
to sort of rabbit hole us on edge
computing. That's probably a different
conversation. The point for McKinsey is
that they should be able to
recommend something that is actually
buildable. And if you recommend what is
effectively a theoretical substrate for
agents that allows them to plug in like
USB ports and any agent can plug in and
you can plug in any data, that is a
fiction for a CEO, one that makes a CEO
sleep well at night. It is not true. It
is not how you actually build things. I
understand because I've had to work with
boards that you do have to simplify
technical concepts into a business
narrative. I understand that. I
understand that you have to have
outcomes that you can talk about that
are easy for non-technical people to
understand. It is possible to take
Andrej Karpathy's Software 3.0, or a
similarly
clean technical vision and tell good
business stories. You do not have to
resort to the kind of sophistry, the
kind of word salad, that McKinsey uses
in order to communicate a clear business
narrative. And the fact is, they are
telling a story that isn't real at root:
you can't just plug agents in like USB,
you can't just plug them in without
modification from any source whatsoever
and stick them into data and expect it
all to magically work. It
does not work that way. And if you sell
that vision, what you are selling is the
reason why so many enterprise companies
are walking away from AI after an
investment and why so many enterprise AI
projects don't launch. It's because of
advice like this. And so part of why I
am punching up on McKinsey a little bit
is that I need people who have C-suite
and board ears to tell the truth about
building AI, to tell the truth about how
complex AI systems are. That yes, there
is a power law of payoffs. If you invest
and you get true agentic AI systems, and
you can implement them at the enterprise
level, there is big money on the table.
It matters, but it's
hard to get there and if you are just
starting out that may not be the place
you want to start out. You don't
necessarily want to start out with
automating your entire customer success
line, or automating all of your retail
orders and pickups, or what have you.
You get the idea. What you want to do is
focus
on a crawl, walk, run motion. Describe
the culture change you want and start
living into that. And that's the piece I
want to leave you with today. What is
the culture change that Andrej is
suggesting we need to create in our
organizations that enables us to think
in terms of software 3.0 that enables us
to think and relate to LLMs not as
people, not as programs, but as
stochastic simulations of people in a
probabilistic context. There's an
emergent psychology to LLMs that is
relevant to talk about, even if the
psychology isn't "real" because these
are "simulations". We can
still talk about it and understand it
and that may be a window for us to
understand how probabilistic agents
interact with our software
infrastructure. There's a lot to dig
into, but I would much rather us dig
into what is actually going on and tell
business stories that actually matter
than go to McKinsey's side of the fence
to pretend everything is easy and get
into a position where enterprise after
enterprise starts on AI and walks away
because they discover belatedly that
it's much harder than the board deck
says. It's just not true that you can
plug in agents anytime. It's just not
true that these tiny little edge models
will do whatever you want and won't get
eaten by the next large model that comes
along. We need to do a better job
telling truths up and down the stack.
And I appreciate Andrej for doing his
best to lay that out. And I'm asking
organizations like McKinsey to take a
stronger stance there. And look, who am
I kidding? They're not going to hear me.
They're not going to listen.
That's okay. I can still ask. I can
still expect a better response to the
challenge of AI. Cheers.