OpenAI’s Delayed Multimodal Release Strategy
Key Points
- OpenAI is reverting to an old product‑release playbook, deliberately delaying launches of ready‑to‑ship features to position themselves as “second‑movers” for PR impact rather than serving customers immediately.
- Google’s recently released Gemini Flash Experimental model is truly multimodal, delivering a distinct image‑generation engine that leans toward photorealism and interprets localized edit prompts more accurately than OpenAI’s counterpart.
- OpenAI’s newly upgraded 4o model, while more creative and artistic, misinterprets color‑edit instructions and often applies changes globally, highlighting complementary strengths and weaknesses between the two systems.
- The narrator stresses that developers should run their own side‑by‑side testing of both models to determine which fits their specific use case, since neither solution is universally superior.
- Overall, the speaker argues that a genuinely consumer‑focused company should release polished functionality as soon as it’s ready, rather than staging releases around competitor moves.
Sections
- OpenAI vs Google Multimodal Rollouts - The speaker critiques OpenAI’s repetitive product strategy while highlighting Gemini Flash Experimental’s capabilities and comparing it to OpenAI’s newly upgraded, truly multimodal 4o model, noting distinct quality differences.
- OpenAI's Competitor-Driven Release Strategy - The speaker argues OpenAI times new models like ChatGPT 5 to match rivals rather than prioritizing consumer value, urging a shift toward user‑focused product releases.
- Balancing Praise with Product Critique - The speaker lauds the team's impressive model, cautiously critiques its product strategy, promises a deeper write‑up, and solicits opinions and a comparison between the 4o model and Gemini.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=msHq7IpMh1o](https://www.youtube.com/watch?v=msHq7IpMh1o) · **Duration:** 00:07:29
- [00:00:00](https://www.youtube.com/watch?v=msHq7IpMh1o&t=0s) OpenAI vs Google Multimodal Rollouts
- [00:03:38](https://www.youtube.com/watch?v=msHq7IpMh1o&t=218s) OpenAI's Competitor-Driven Release Strategy
- [00:07:08](https://www.youtube.com/watch?v=msHq7IpMh1o&t=428s) Balancing Praise with Product Critique
OpenAI is back to their old ways. They
are using their old product strategy
playbook to drive releases, which means
that they're releasing second, even when
they have the feature almost certainly
in the can already. So, if you remember back, Gemini released just last week a new multimodal art model called Gemini Flash Experimental. They've labeled it as images in AI Studio there for Gemini users. It's very, very good. People have been using it to do
product photo shoots where they pull the
product out of someone's hand and they
mood-light it. People have been using it to do outfit try-ons. People have been
using it to literally edit an image with
text. Like you can change the background
of a wall. You can change an object in
the image and it can be photorealistic.
Fantastic model. Well, OpenAI actually
talked about true multimodal where the
model will take text as a first-class input and images as a first-class input and output. And they did that months
ago. And so to have Google beat them is
not something they wanted the public to
think about. And so a week later they
drop their multimodal model. They call
it 4o, which you already have. 4o is 4o. But they've upgraded 4o under the surface with a really, really different
image model. It is natively multimodal
now. You can feel the difference. And
what's interesting is these are not
equivalent models. So I asked the exact
same prompt of both of them and I got
really interesting quality differences.
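If you want to replicate that kind of same-prompt comparison yourself, a tiny harness helps keep the test fair. The sketch below is a minimal Python illustration; the backend names and stub outputs are placeholders I've assumed, not real Gemini or OpenAI API calls, so you'd plug your own wrappers around each provider's image endpoint into the `backends` dict.

```python
from typing import Callable, Dict

def run_side_by_side(prompt: str, backends: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the exact same prompt to every backend and collect outputs by name."""
    return {name: generate(prompt) for name, generate in backends.items()}

# Placeholder stubs: swap in real wrappers around the Gemini and OpenAI
# image APIs here (the names and outputs below are illustrative only).
backends = {
    "gemini-flash-exp": lambda p: f"[gemini result for: {p}]",
    "gpt-4o": lambda p: f"[openai result for: {p}]",
}

results = run_side_by_side("recolor only the wall behind the subject", backends)
for name, output in results.items():
    print(f"{name}: {output}")
```

Running every prompt through the same harness makes the kinds of differences described here (photorealism vs. artfulness, local vs. global edits) easier to judge honestly.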
I got a better lean toward photorealism
in Google Gemini. I got a lean toward
creativity and artfulness in
OpenAI. And OpenAI was worse at
interpreting a color edit suggestion
than Gemini. Gemini correctly understood
I was only referring to an area of the
image, whereas OpenAI assumed I was
referring to the entire background. But
Google Gemini critically misunderstood
the actual composition of what I was
asking for. And OpenAI did not; OpenAI understood what I was actually asking it to create. Neither of them is perfect. I'm
using that example to show that you have
to do your own testing and you'll
probably have to try both to see what
you want. I want to take us back to that
strategy piece though. At the end of the
day, it is frustrating to me that
OpenAI, a consumer company, a company
that Sam Altman gave an entire interview to Stratechery last week emphasizing how much of a consumer company they are, isn't releasing the great stuff they have when it's ready for
customers. They're sitting there staring
at competitors and going second so
that they can try and claim a PR
victory. That's not customer obsessed. I
don't think they should be doing that. I
think if you have it ready, you should
release it. And I think that great
customer-facing companies historically
have done that. When they have the right
product ready, they release it on their
time. And this is a real established
pattern with OpenAI where it's almost
like they're playing inside baseball
with the other model makers. They talk
about being a consumer product and they
are. They have 400 million active users
per month. But in a lot of ways, they
don't act like a grown-up consumer-facing company yet. They don't have the
consumer obsession that defines
companies like Apple, that defines
companies like Amazon, even Netflix.
Instead, they tend to look at the other
model makers and they can get into a bit
of a standoff. In fact, I strongly
suspect the exact timing of ChatGPT 5 and Claude 3 will be interrelated. Or at the very least, ChatGPT 5's release timing will be tied to another model release from Google, from
Meta, from Anthropic, maybe even from
DeepSeek. And if that's the case, it's
yet again going to be confirmation that
OpenAI really is looking not to the
consumer to drive their release cadence,
but to other model makers. It's not a
mature product company motion. It's something that probably is going to need to shift, because ultimately the average person doesn't care who released the image model first.
My grandmother doesn't care whether
Google released it last week or OpenAI
released it this week. She's going to care whether it makes the photo on her phone good or not. And OpenAI has a
huge advantage there. They have the
product surface that most people
familiar with AI understand to use for AI. Everybody knows ChatGPT, and the 4o model is the baseline model. So far so
good. That makes sense. So why worry
about exactly when Google releases
stuff? Why worry about exactly when
Anthropic releases stuff? Why not just release what you've got to your consumers? That's my hot take. I think
OpenAI is cueing their product strategy off other competitors incorrectly, and
I think they should be focused more on
consumers. I think that's to their
interest as a company long term. And I
think the things they're worried about
sort of losing the PR battle for a cycle
or two are not that big a deal. Like, if you imagine the reverse, where OpenAI released this when it was ready, maybe a couple of weeks back, maybe a month back, and then Google comes along and they release theirs later on, it's not really a PR difference. It doesn't really make a difference. And
so I think that we really need to see
some grown-up consumer focused behavior
from some of these model makers now that
we are seeing a much larger consumer
footprint. If we expect AI to be in
people's houses, if we expect it to be
on people's phones, if we expect it to
be a daily, hourly touchpoint for people, we
have to act like that when we build and
release products. And right now it's
feeling much more like a Y Combinator, who-releases-first, very Silicon Valley insider kind of thing. It's not super
helpful to customers. So that's my
hot take. But that shouldn't detract
from the fact that this is a great
model. Great people worked on it. The
teams at both of these companies at
Gemini and at OpenAI are fantastic.
They're shipping great stuff, and they should be proud. Like, these are
really hard challenges that they're
solving with multimodal. And I think
that we've taken a massive leap forward
on image generation. Again, like in my video yesterday, it's hard to describe that unless you're really clear and specific. Like, if you can say: before, you could put a Coke can in someone's hand in a drawing, but you couldn't move it around; the Coke can was frozen to the hand. Now you can just edit it and move the Coke can over here with just your written text. That makes the light bulbs go off. Now people
understand. So, I do think we need to
get better at talking about this stuff.
I don't want my critique of the product
strategy to come off as critiquing the
individuals who did the hard work
because it's an amazing model.
Like, it's incredible work that they've
done. So, I'll probably write up more on
it later on, but I wanted to throw it
out there. What do you think? Have you
tried the 4o model? How does it compare
to Gemini?