Codex Upgrade Boosts Coding Precision
Key Points
- On September 15, OpenAI released a Codex upgrade: a specialized “GPT‑5 for coding” model designed to improve the engineering platform’s performance.
- The new model addresses two major pain points: making precise, low‑token “surgical” code edits and executing long, agentic coding tasks with far higher correctness.
- The speaker attributes the improvements to a stronger reasoning component tailored to code execution and prompt comprehension, which lets the model allocate tokens efficiently: few for small edits, many for extensive tasks.
- Unlike earlier “sticky” GPT‑5 behavior that required elaborate prompting to steer output, the Codex flavor understands straightforward engineering prompts out of the box, reducing the need for complex prompt engineering.
- Because developers naturally write concrete, specific instructions, this enhanced prompt awareness translates into more reliable, usable code assistance across a range of AI‑driven development workflows.
Sections
- Codex Upgrade Improves Coding AI - The new Codex upgrade, a GPT‑5 variant optimized for programming, makes surgical edits easier, improves correctness on long, agentic coding tasks, and lessens the need for complex prompting workarounds.
- Claude's Shift Toward Content Creation - The speaker critiques Claude's new data‑connector and document‑generation features as a strategic pivot, suggesting it signals an arms race with competitors, and argues that engineering quality is what makes a platform sticky.
- Codex: Precise AI Editing Breakthrough - The speaker stresses that the new Codex release is a significant, data‑backed advancement for developers, promising targeted, surgical code edits rather than wholesale refactoring, and urges the audience to see past hype fatigue and take the change seriously.
Full Transcript
Source: https://www.youtube.com/watch?v=7oIkPW217AY
Duration: 00:09:51
Section timestamps: 00:00:00 Codex Upgrade Improves Coding AI · 00:04:16 Claude's Shift Toward Content Creation · 00:09:00 Codex: Precise AI Editing Breakthrough
On Monday, September 15th, OpenAI launched an upgrade to Codex. Codex, of course, is the engineering platform that OpenAI has been building out. In this case, the upgrade is really a new flavor of GPT-5 optimized specifically for coding. It fixes two things that have been really frustrating to most of us who are building with Codex, with Claude Code, or with any AI tool. Namely, it is really, really hard to get them to stop and just fix one thing: surgical edits have been really tough.
It has also been difficult to get them to do long agentic tasks with a high degree of correctness. That last phrase is important, because if you use these tools, you know they do long agentic tasks very easily, but not always correctly. Now, I've talked in the past about how you address this with prompting, with data chunking, with how you handle your codebase and feed it in as context, and with keeping track of the decisions you've made in Markdown files. There are all kinds of tricks that people are using. Those tricks are probably still helpful, but it sure does help if the base model is actually better at those core tasks. So if you ask yourself how or why it suddenly works, I think what you'll find when you peel the onion is that they've improved the quality of the reasoner specifically around code execution tasks and understanding coding-related prompts.
That is the only way you can get a model flavor that is simultaneously much, much stingier with tokens when making a surgical edit and much, much more generous with tokens when running a long agentic task. It must understand what you want better, which is a big deal when you think about it, because one of the things that has been really hard about GPT-5 as a whole is that it feels sticky. It feels like it's in a rut. No matter what prompt you give it, you get this hyperactive speedboat of a model that says, "Here are all the action items, and this is what we're going to do." You have to lean on the prompt heavily to get it to do anything else.
I've talked a ton about how to lean on the prompt, and I'm going to have another video about it soon.
But in this case, the update flags something different that I don't want you to miss: the model is getting better at understanding your prompt without you having to prompt in fancy ways, and that is a really big deal. Granted, it's for code. Code is probably the easiest use case for prompt parsing because, frankly, engineers tend to be very specific and concrete, and they tend to refer to real, specific code actions. So getting a model a little better at understanding that is probably easy mode if you're trying to improve how it parses prompts. But it's still a big step for this model, because GPT-5 as a whole has not made it easier for people to prompt. I know multiple people who have thrown up their hands and given up on working with advanced models, given up on prompting, because GPT-5 has been such a difficult model to prompt. I get it. It's not me; I love this stuff. But I understand why they gave up. It shouldn't be this hard. That seems to be what the team was thinking about when they made this update: it should be easier.
And yes, it posts a somewhat higher score on SWE-bench and so on, but the real takeaway is that this team at OpenAI continues to ship really, really fast. Whatever you think about the whole brouhaha around the Reddit thread on Claude Code, and how many of the Redditors over there saying they're moving to Codex are real, the momentum shift toward Codex is real. There has been a massive momentum swing toward Codex, and it has shifted the strategic battleground. For a long time now, it has been a truism that OpenAI has the best general market position given its consumer base, and Claude has the best specialist position given its beachhead on code.
That is changing, and you can see it changing even Claude's strategy, because Claude is now emphasizing other things: "Hey, we have these data connectors. Hey, we launched this amazing PDF creation, this amazing PowerPoint creation, this great Excel file creation." I made a video on that, and it's really, really good.
It's a different strategy. To me, the fact that they chose to release that and not something for Claude Code feels a little bit desperate. If they had something to release that could compete with where Codex is going and how fast Codex is shipping, they would. Now, I say that on Monday, September 15th, aware that Dario has a big speech coming up this week and that there are rumors Opus 4.5 is coming out. So we may be talking at the end of the week about the big move they made, and that is just how these games go. It's an arms race; it's back and forth. But regardless of what launches this week, you should be aware that the strategic landscape has shifted, and that launches like this reinforce a quality of engineering effort that makes the experience sticky. If you have the choice between more power at your fingertips that is correct and more power at your fingertips that is incorrect or likely to lead to bad pull requests, you choose quality every single time, because it means less rework.
Ten out of ten engineers will choose that, and they're right. That goes for other parts of work, too. Ironically, part of what made Claude's connectors release with Excel powerful is that it got more of Excel right than anything I'd seen previously from OpenAI. Similarly with PowerPoint, it was easier to make a good deck than it had ever been before. I even got good results out of the PDF creation; I haven't done the video on that yet, but I will. The point is this: you need to prioritize the models that give you quality work, and you need to expect that shifts between models will be real, but rarer than you think. This is sort of a fine-grained point, but if you think about it, Claude Code has been the best overall coding ecosystem for over a year now, and only now are we starting to see a shift toward Codex. Because these shifts are sticky, because the changes being made reinforce quality, and because the team is shipping really fast relative to Claude, I expect this shift to be sticky.
Now, am I willing to declare that anything is a permanent advantage in AI? Absolutely not. You should always be thinking multi-model over the long term. But there's a difference between thinking about multi-model use cases when you're building production pipelines and thinking about positions in the ecosystem. Positions in the ecosystem are stickier. In this case, Codex is starting to nudge Claude out of the coding position in the ecosystem.
That's a very powerful spot to be in because of everything else code lets you unlock and get leverage on. The fact that more code is flowing to OpenAI for reinforcement learning is a non-trivial benefit they are acquiring right now directly from another player in the ecosystem: Claude. So I expect Codex will stick around. I'm going to do a much longer video on Codex; this was just my intro, the breaking-news update.
If you step back and look at where we are on this exponential curve we're all living through, one thing that comes to mind for me is that we are bored by the hype, and we have forgotten how tremendous some of this news is, because we've gotten so used to all these updates. Humans can get used to anything, and we have gotten used to a tremendous stream of news over the last two and a half years. If Codex had dropped out of a blue sky in 2022, it would have been on the front pages of all kinds of newspapers, even though it's a coding tool, because it's such an intelligent model. But now it's just another Monday in September. We've gotten used to it.
I want to challenge you on that, especially as these models get better and more agentic. The graph is literally there: they're able to do much more for you if you prompt them well, because they're more agentic, and that's exactly what Codex can do. If you're an engineer, the rewards are going to be disproportionate if you don't get bored by the hype, if you stay focused, if you know what you want out of AI, and if you're able to take advantage of it and build the way you want to build. Not everybody builds with code; some people build with words, some build with math, and so on. But you have to decide what you care about, latch on to that stream, follow it, take it seriously, and upgrade your tool sets a lot. The learning curve is going to be real, because we're all going through this exponential curve together.
Don't get fooled by everyone else saying it's just another Monday. It's not just another Monday. The news is going to keep coming, and there will be more big releases even this week, but this was a big deal. I hope you have fun building something with Codex. As someone who has built with code before, I really hope this promise of Codex being better bears out for all of us. They did quantitative analysis, right? It's not that they're just promising; they're actually looking at pull requests and so on. It would be really nice to have an AI that doesn't have this obsession with refactoring the entire codebase at the drop of a hat. It would be nice to have surgical edits. So here's to surgical edits. Here's to exponential change. Here's to seeing through the hype and recognizing that it's not hype to get bored by. It really is a big deal. It's not just another Monday. Have fun with Codex.