# Claude vs Codex: Agent Showdown

**Source:** [https://www.youtube.com/watch?v=EDcWcPueRSE](https://www.youtube.com/watch?v=EDcWcPueRSE)
**Duration:** 00:17:07

## Summary

- Claude and Codex are two leading command-line AI agents that embody contrasting strategies for how future agents should work, making them a useful benchmark for choosing the right tool for a given task.
- Claude's agent originated as an internal, general-purpose assistant at Anthropic (released publicly as "Claude Code"), used not just for programming but across marketing, legal, and other departments, reflecting Anthropic's vision of agents as flexible "tool-loop" helpers that can call external tools (e.g., Python libraries, Excel) on demand.
- Codex, by contrast, follows a more specialized, code-centric approach, positioning the agent primarily as a developer-focused companion rather than a universal workflow assistant.
- The discussion extends beyond the terminal to show how these divergent philosophies are shaping the broader ecosystem of non-command-line agents, bearing on the "most important AI question" of the mid-2020s: whether agents should be narrow, task-specific aides or adaptable, tool-integrating generalists.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=EDcWcPueRSE&t=0s) **Claude vs Codex: Agent Showdown** - The speaker compares the command-line AI agents Claude and Codex, explains their differing development philosophies and ideal tasks, and extrapolates how these contrasting visions shape the evolution of both terminal-based and non-terminal AI tools.
- [00:03:12](https://www.youtube.com/watch?v=EDcWcPueRSE&t=192s) **Agents Calling Tools via MCP** - The speaker explains how Claude agents use MCP-enabled tool calls (such as Python libraries, web search, or Figma) to autonomously perform tasks, collaborate with users, and even nest sub-agents for scalable assistance.
- [00:06:51](https://www.youtube.com/watch?v=EDcWcPueRSE&t=411s) **Task-Oriented Agent Builder Paradigm** - The speaker contrasts ChatGPT's perpetual, loop-like assistant style with a linear, structured-context approach that frames interactions as discrete, end-to-end tasks suited for building scalable enterprise agents.
- [00:09:57](https://www.youtube.com/watch?v=EDcWcPueRSE&t=597s) **Token-Efficient Specialized Coding Model** - The speaker explains that Codex runs on a purpose-built GPT-5 variant that minimizes token usage for simple tasks while also acting as a powerful, autonomous coding agent capable of tackling complex, high-impact problems in large enterprise codebases.
- [00:13:16](https://www.youtube.com/watch?v=EDcWcPueRSE&t=796s) **Linear vs Collaborative AI Agents** - The excerpt contrasts task-focused, single-run agents such as n8n and Lindy.ai, which complete a job and stop, with more interactive, always-on agents and AI companions that aim for continuous, collaborative engagement beyond discrete work tasks.

## Full Transcript
Claude versus Codex. Who wins? If you're
not familiar with it, Claude and Codex
are the premier agents from two of the
major model makers. They are in the
command line, so it's somewhat scary to
people, but they illustrate two
competing visions of where agents are
going. And I've seen them both, and I've
played with them both, and I have a
sense of what is actually unfolding in
terms of strategy. And I want to lay out
how different they are, give you a sense
of which one you might want to pick for
particular tasks. And then in addition,
if you're not a command line person or a
terminal person, you're like, "Oh my
god, that's scary." Fine. We're going to
talk about how those visions actually
play out into differing perspectives on
non-command line agents beyond Claude,
beyond Codex: what other tools are out there, and how are they evolving in line with these visions? Because as you'll see, these are fundamentally different approaches to the most important artificial intelligence question of 2026 and 2027, and actually of 2025. Here we are. So, let's start with Claude.
Claude was the first one that came on the scene. Claude's approach really evolved out of the roots of the Claude Code product. It was built as an internal enabler, an internal tool at Anthropic. And the goal was simply to give internal Anthropic employees access to a really useful general-purpose agent in the command line. And initially it was released to everyone in the rest of the world as Claude Code, because Anthropic made the smart inference that most people who sit there and type text into a command line are going to be familiar enough to not be scared of the word "code," and they might use it for coding, which lots of the technical team at Anthropic was doing anyway. So call it Claude Code.
It's a simplifier. But people don't know
that Claude was being used by the
marketing team at Anthropic, is being
used by the legal team at Anthropic, is
being used by lots of teams at Anthropic
as a general purpose agent. And that
gives us our first clue into the vision
that Claude has and what they see as
important for agents to do. And this is
important because we're all going to be
living with agents now through who knows
when, right? As long as we have this AI moment. Claude envisions agents as loops
that are smart with tools. And let me
get into that. Essentially, what Claude
thinks an agent should be, what
Anthropic thinks an agent should be, is
a tool that can go out, be a general
purpose agent, collect other tools
through the Model Context Protocol (MCP), which Anthropic also pioneered, and come back and do a task and check in with you. And so think of
it as Anthropic's vision is the agent is
going to go help you with writing, going
to go help you with Excel, going to go
help you with code, it kind of doesn't
care. It's designed to be general
purpose and it can call the tools it
needs to call smartly to get that done.
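That "loop with tools" pattern can be sketched in a few lines of Python. To be clear, this is a minimal illustration of the idea, not Anthropic's actual implementation or the real MCP API; every name here (`Tool`, `agent_loop`, `pick_action`, `toy_model`) is hypothetical:

```python
# Hedged sketch of an "agent as a loop with tools": the model repeatedly
# picks a tool, sees the result, and decides when the task is done.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]  # stand-in for e.g. an MCP server call

def agent_loop(task: str, tools: dict[str, Tool], pick_action) -> str:
    """Loop until the model decides the task is done.

    `pick_action` stands in for the model: given the transcript so far,
    it returns either ("call", tool_name, args) or ("done", answer).
    """
    transcript = [f"task: {task}"]
    while True:
        action = pick_action(transcript)
        if action[0] == "done":
            return action[1]
        _, tool_name, args = action
        result = tools[tool_name].run(args)
        transcript.append(f"{tool_name}({args}) -> {result}")

# Toy "model" that calls a calculator tool once, then finishes.
def toy_model(transcript):
    if len(transcript) == 1:
        return ("call", "calc", "2+2")
    return ("done", transcript[-1].split("-> ")[1])

tools = {"calc": Tool("calc", lambda expr: str(eval(expr)))}
print(agent_loop("add 2 and 2", tools, toy_model))  # prints 4
```

The key structural point is the `while True`: there is no fixed endpoint baked into the flow; the agent keeps cycling through tools until it judges the work complete.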
So if it needs to go get like a Python
library to do mathematical calculations
so it can get into Excel, it'll do that,
right? If it needs to go understand and remember the skill it's using to code a React component, it'll go do that. You get the idea. It calls
tools and increasingly those tool calls
are very transparent through this MCP
concept where you can call in a tool
through a special server. And so it can
go search the web through an MCP. It can
go call in your Figma designs through an
MCP. It can call in lots and lots of
things that will help you do work. But
the core of the agent loop is very
simple. It is you ask the agent to go do
a general purpose task for you. The
agent is smart enough to infer this is
what Nate wants and it goes off and it
calls the tool and it does the work and
it comes back. This makes working with
Claude feel very collaborative. It feels collegial, for lack of a better term. And what Anthropic wants is a vision of the future where we are working collaboratively with agents. We have our mini-mes, our agents. Those agents may have subagents, which is something Claude specifically enables now. So you can have a master Claude agent running mini-mes of itself with different context. If that gives you a headache,
don't worry about it. You don't have to
do it. But you can, right? And the
vision is that this general purpose
agent is sort of scalable and helps you to be more effective, helps you to do your job, and helps you to complete higher-quality work. This makes a lot of
sense when you think about the approach
to the releases that Anthropic has had so far, right? Like they've not just released Claude Code. They've focused on releases that further this general-purpose agent-in-a-loop-with-tools kind of vision. Excel comes to mind. The PowerPoint release comes to mind. I talked about those on this channel, and it's been super important to see that Anthropic has built not a special-purpose Excel agent, not a special-purpose code agent,
but a general purpose large language
model that is able to call the tools it
needs to get all of these different
tasks done. You're not really calling a
different claude for these different
things. And in fact, Anthropic has sort of belatedly realized that maybe they shouldn't have branded this Claude Code, and they've started to walk back the Claude Code branding just a touch; they're just calling it the Claude Agent SDK these days, and that's fine, right? Like people will still probably call it Claude Code, or whatever. The point is that the general-purposeness is intact, and Anthropic is recognizing how powerful it is and building against it.
Let's move over to Codex. Codex is such a different vision. Codex is a linear-flow vision of tasking. It's not just in Codex. It's also in the Agent Builder that OpenAI released. You see an agent fundamentally in a linear flow according to most of what OpenAI has been releasing, and it's structured. And
what's interesting is that that is also, not coincidentally, how people find ChatGPT works really well: when you're calling it in the API, when you're asking it to do things, even if you're prompting it in the chatbot, giving it the structure of "do this and then come back to me" really helps. I've talked a fair bit about how prompt-sensitive GPT-5 is. That's another way of saying it is dependent on you to structure an ask, and it will go and do a task and come back. And so you might think, oh, is that a loop, Nate? Are we saying the same thing? It's not. It's not, because a ChatGPT workflow, whether it's the Agent Builder or whether it's the way you think about Codex, is framed as beginning and end. It's framed as a line, not a circle. And I know that sounds silly, but it matters a lot, because Claude feels like and acts like it's always on, always in a loop being your general assistant. Codex, and even a lot of the GPT-5 conversations I have in the chatbot, is much more task-oriented. It's
in line with that Agent Builder vision that they outlined at Dev Day this week, where it needs structured context, a prompt, maybe an input like a document, and then it's going to go and do the task and finish it and get it done correctly. And when you are building agent flows at the enterprise level, that can be very helpful, right? You can go out and have confidence that ChatGPT and the API will do exactly what you tell it to do with the context, and come back, and the task will be done, the ticket will be triaged, whatever you're tackling. So if
you think about that, look at how that
changes how we work with agents. That
vision is so different, right? We need
to take the load of managing the
context. We need to make sure it's crisp
and clear. It is implicitly, I would
argue, a vision of developers building
meaningful scaled agents that do
specific work tasks at the big company,
enterprise, maybe mid-sized company scale. And if you want to pull it down and you start to look at the ChatGPT in-app store experience, they're also suggesting that we consumers will have that implicitly by having these apps that do things for us. And it will also feel like: I want to go check my QuickBooks as a small business owner, and I can do that; or, you know, I want to go look at how my Spotify streams are doing as a creator, and then you can go and do that. Right? Fine. The
the point is that the task is what
matters and accomplishing the task is
the definition of success for the agent.
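That linear, task-with-an-endpoint framing can be sketched as a single-pass function. Again, this is a hedged illustration of the pattern, not OpenAI's actual Agent Builder or API; all the names here (`run_task`, `TaskResult`, `grade`, `default_model`) are hypothetical:

```python
# Hedged sketch of a "linear flow" agent: structured context in,
# one run, a gradable result out, with an explicit endpoint.
from dataclasses import dataclass

@dataclass
class TaskResult:
    output: str
    done: bool  # linear flows terminate; there is no ongoing loop

def default_model(structured_input: dict) -> str:
    # Stand-in for the LLM call; a real system would invoke a model here.
    return f"triaged:{structured_input['ticket_id']}"

def run_task(prompt: str, context: dict, model=default_model) -> TaskResult:
    """One pass: the caller supplies all context up front,
    the agent runs to completion and stops."""
    structured_input = {"prompt": prompt, **context}
    return TaskResult(output=model(structured_input), done=True)

def grade(result: TaskResult, expected: str) -> bool:
    """Because the flow is a line, success is checkable per run."""
    return result.done and result.output == expected

result = run_task("triage this ticket", {"ticket_id": "T-123"})
print(grade(result, "triaged:T-123"))  # prints True
```

The contrast with the loop style is that `run_task` returns exactly once, which is what makes per-run grading (the "did it get it done correctly?" question) possible.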
Whereas at Anthropic and with Claude Code, it feels more like working together and getting stuff done over time is the purpose, and we are building a collaborative relationship with Claude Code, and Claude Code does many things for us. I think that shows up a bit in the way responses come through. I have given identical asks to Claude Code and to Codex just to see what would happen.
And what I find is that Claude Code tries to take that general-agent-in-a-loop-with-tools approach, where it will throw multiple tools at the problem. It will come back with a thorough analysis of the problem (I intentionally made it a very open-ended analysis prompt) and give me a really full readout. And Codex is going to take the approach of "this is a task I need to get done," and it will come back quickly, token-efficiently, and give me a pretty much correct but extremely short and succinct analysis. This is because Codex is leaning on some custom modeling of GPT-5 for that particular command-line interface. If you're typing in the terminal to Codex, it's not vanilla GPT-5. It's a special model, one designed to be token-efficient with specific problems. And so in this case, when I gave Codex the analysis, it came back and it was like, oh, I don't need a lot of tokens for this, I'm just going to give you exactly what you need. And it was like 15 lines, right? And Claude Code came back, and it was like eight pages. Think about that from a token-consumption perspective, and that can be very helpful. Right? If
you're doing this hundreds of times, as Codex is often envisioned (you know that they envision this working at big-company scales), that's super efficient, right? I get exactly what I need. I don't need anything more. And then the other side of Codex kicks in. If you have a very difficult problem, if you have something where you need Codex to work independently for a long period of time on a very hard problem, especially a coding problem (it's a specialized agent for coding), it will go against that task and it will solve it. It's sort of like it was designed specifically to target high-impact coding problems in large codebases at enterprise scale. But it all keeps coming back to Codex being designed, along with the rest of the ChatGPT agent vision, around this: we give the LLM a task, it goes and does it, it accomplishes it, and there's an endpoint to that. I think this is really
important. I think it's important not just for whether you pick Claude or Codex, which we'll get into. It's also important for how we think about how we want to collaborate with AI going forward, because we have to kind of pick a vision that we want to sign up for, and it leaks into the rest of the agent ecosystem. So if you want to pick Claude, you're sort of voting for a future where you want to collaborate with AI almost like a peer, and you want to be able to give your Claude a number of different tasks to go after during the day, and it will pick up the tools it needs and it will go get them done and come back. That is, very broadly, the vision that we are starting to see come together from Anthropic. On the other
hand, if you are someone who's like: no, really, I need to build production systems; I need to make sure I have exactly what I need for this agent; it needs to be done correctly every time; I cannot fool around with "did the Claude Code run come back with the right eight pages of analysis or not"; it must be correct every time; I want, for lack of a better term, more deterministic intelligence. Fine. That's fine. At that point, you need to sign up with Codex. And not just with Codex, but with something that implies an Agent Builder world, right? Where you are deterministically saying, this is exactly what I want and I want to get it done. I'm not here to tell you which is better or worse. I'm actually here to give you a sense of the underlying agent vision so you can look at it and say for yourself, this is what I need for my toolset. The agent
market is so big. It is absolutely
possible we have multiple winners here,
right? We have Codex winning maybe for enterprise workflows and for highly deterministic workflows where you need to get it right every single time, or for workflows that are extremely complex and accomplishment looks like solving this really tricky bug in this really particular place. And then you may have a
winner with Claude where Claude is a
general purpose agent that works really
effectively for you and different people
will have different opinions on that.
The rest of the agent ecosystem is falling into one of those camps. If you look at n8n: n8n is a drag-and-drop agent builder, or you can use JSON to program it. I've talked about it before. It's really cool. One of the things that's nice after the Agent Builder launch is that n8n doesn't lock you into the OpenAI ecosystem, and people appreciate that. Well, it still has the same vision of the agent that OpenAI has. They go and they do a task and they come back. They go and they do a task and it's accomplished, right? It's not an ongoing conversation. It's not an ongoing general-purpose piece of work. They go and get it done, and that's it. Right? You can grade whether the agent got it done correctly or not with n8n and with any of OpenAI's agents. And there are others too. Lindy.ai comes to mind. Very consumer-focused, but the agents do things, and they either do them correctly or incorrectly, and then it's done. It's a linear flow. Whereas
there are others who are building much
more collaborative agents. And what's
interesting is that they're not just for
work. Like, what's interesting about Anthropic is they're going after the working world, but some of the people building agents outside the working world are adopting a very similar approach. So as an example, the AI companion category has taken off in the last couple of months. It's not a working thing. It's an always-on AI companion you can talk to, and it feels very much like a sort of general-purpose conversational agent, except in this case the agent's job is not to operate tools and build you Excel files. The agent's job is actually to call upon its internal resources and be an interesting conversational partner. And so Anthropic
has picked the harder version, right?
Like they have to get the tool calls
right. They have to sort of build the
outputs and come back to you. But the
loop is there. It's always on. It
listens. It comes back. The loop is
there. We are going to see a great split in 2026 between people and enterprises who want general-purpose conversational agents that run sort of always-on, calling tools and able to accomplish multiple tasks for you, like Claude Code, and a specific vision of the future that is all about deterministically, intelligently solving hard tasks and being able to say "done": a linear vision of the agentic future. And that matters a lot, because it shapes your and my day, right? Like, is my day going to be working with Claude Code and sort of collaborating together? Is my day going to be defining a bunch of very specific accomplishments that need to get done for Codex? We're all going to find out together. Enterprises are going to pick different sides. People are going to have strong preferences, but
at least we should understand what the
stakes are, right? Like we should ladder back and not just look at what happened with Agent Builder, not just look at "is Codex good or bad" on its own, but actually understand the competing visions of our agentic AI future, and which one we want to sign up for. So I'm going to do more of a deep dive on Codex soon. I've done some on Claude Code. I wrote a guide on how Claude Code can help you as a non-technical person. But if you are curious about which to choose, I would invite you to ask yourself: how do you want to spend your day? Do you want to evolve the task together with the AI, more of a Claude Code approach, and work organically back and forth? Or do you really need the precision, and you're willing to do the structure, that gets you a Codex-like experience? I think
those are underweighted reasons to pick
one or the other. I think most people
are asking which is the better coding
agent per se, but I think that's kind of
the wrong question because it's going to
go back and forth and you'll have to
look at SWE-bench scores and argue over which one is better for which programming language, etc. It's more interesting to say: these trajectories are fundamentally different. Which one do I want to sign up for? What's your agentic vision of the future?