OpenAI’s Delayed Multimodal Release Strategy
Key Points
- OpenAI is reverting to an old product‑release playbook, deliberately delaying launches of ready‑to‑ship features to position themselves as “second‑movers” for PR impact rather than serving customers immediately.
- Google’s recently released Gemini Flash Experimental model is truly multimodal, delivering a distinct image‑generation engine that leans toward photorealism and interprets localized edit prompts more accurately than OpenAI’s counterpart.
- OpenAI’s newly upgraded 4o model, while more creative and artistic, misinterprets color‑edit instructions and often applies changes globally, highlighting complementary strengths and weaknesses between the two systems.
- The narrator stresses that developers should run their own side‑by‑side testing of both models to determine which fits their specific use case, since neither solution is universally superior.
- Overall, the speaker argues that a genuinely consumer‑focused company should release polished functionality as soon as it’s ready, rather than staging releases around competitor moves.
Sections
- OpenAI vs Google Multimodal Rollouts - The speaker critiques OpenAI’s repetitive product strategy while highlighting Gemini Flash Experimental’s capabilities and comparing it to OpenAI’s newly upgraded, truly multimodal 4o model, noting distinct quality differences.
- OpenAI's Competitor-Driven Release Strategy - The speaker argues OpenAI times new models like ChatGPT 5 to match rivals rather than prioritizing consumer value, urging a shift toward user‑focused product releases.
- Balancing Praise with Product Critique - The speaker lauds the team's impressive model, cautiously critiques its product strategy, promises a deeper write‑up, and solicits opinions and a comparison between the 4o model and Gemini.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=msHq7IpMh1o](https://www.youtube.com/watch?v=msHq7IpMh1o) · **Duration:** 00:07:29
- [00:00:00](https://www.youtube.com/watch?v=msHq7IpMh1o&t=0s) OpenAI vs Google Multimodal Rollouts
- [00:03:38](https://www.youtube.com/watch?v=msHq7IpMh1o&t=218s) OpenAI's Competitor-Driven Release Strategy
- [00:07:08](https://www.youtube.com/watch?v=msHq7IpMh1o&t=428s) Balancing Praise with Product Critique
OpenAI is back to their old ways. They
are using their old product strategy
playbook to drive releases, which means
that they're releasing second, even when
they have the feature almost certainly
in the can already. So, if you remember back, Gemini released just last week a new multimodal art model called Gemini Flash Experimental. They've labeled it as images in AI Studio there for Gemini users. It's very, very good. People have been using it to do
product photo shoots where they pull the
product out of someone's hand and they
mood-light it. People have been using it to do outfit try-ons. People have been
using it to literally edit an image with
text. Like you can change the background
of a wall. You can change an object in
the image and it can be photorealistic.
Fantastic model. Well, OpenAI actually
talked about true multimodal where the
model will take text as a first-class input and images as a first-class input and output. And they did that months
ago. And so to have Google beat them is
not something they wanted the public to
think about. And so a week later they
drop their multimodal model. They call
it 4o, which you already have. 4o is 4o. But they've upgraded 4o under the surface with a really, really different
image model. It is natively multimodal
now. You can feel the difference. And
what's interesting is these are not
equivalent models. So I asked the exact
same prompt of both of them and I got
really interesting quality differences.
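If you want to replicate that kind of same-prompt comparison yourself, a tiny harness helps keep the test fair. The sketch below is a minimal Python illustration; the backend names and stub outputs are placeholders I've assumed, not real Gemini or OpenAI API calls, so you'd plug your own wrappers around each provider's image endpoint into the `backends` dict.

```python
from typing import Callable, Dict

def run_side_by_side(prompt: str, backends: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the exact same prompt to every backend and collect outputs by name."""
    return {name: generate(prompt) for name, generate in backends.items()}

# Placeholder stubs: swap in real wrappers around the Gemini and OpenAI
# image APIs here (the names and outputs below are illustrative only).
backends = {
    "gemini-flash-exp": lambda p: f"[gemini result for: {p}]",
    "gpt-4o": lambda p: f"[openai result for: {p}]",
}

results = run_side_by_side("recolor only the wall behind the subject", backends)
for name, output in results.items():
    print(f"{name}: {output}")
```

Running every prompt through the same harness makes the kinds of differences described here (photorealism vs. artfulness, local vs. global edits) easier to judge honestly.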
I got a better lean toward photorealism
in Google Gemini. I got a lean toward
creativity and artfulness in
OpenAI. And OpenAI was worse at
interpreting a color edit suggestion
than Gemini. Gemini correctly understood
I was only referring to an area of the
image, whereas OpenAI assumed I was
referring to the entire background. But
Google Gemini critically misunderstood
the actual composition of what I was
asking for. And OpenAI did not; OpenAI understood what I was actually asking it to create. Neither of them is perfect. I'm
using that example to show that you have
to do your own testing and you'll
probably have to try both to see what
you want. I want to take us back to that
strategy piece though. At the end of the
day, it is frustrating to me that
OpenAI, a consumer company, a company
that Sam Altman gave an entire interview to Stratechery last week emphasizing how much of a consumer company they are, isn't releasing the great stuff they have when it's ready for
customers. They're sitting there staring
at competitors and going second so
that they can try and claim a PR
victory. That's not customer obsessed. I
don't think they should be doing that. I
think if you have it ready, you should
release it. And I think that great
customer-facing companies historically
have done that. When they have the right
product ready, they release it on their
time. And this is a real established
pattern with OpenAI where it's almost
like they're playing inside baseball
with the other model makers. They talk
about being a consumer product and they
are. They have 400 million active users
per month. But in a lot of ways, they
don't act like a grown-up consumer-facing company yet. They don't have the
consumer obsession that defines
companies like Apple, that defines
companies like Amazon, even Netflix.
Instead, they tend to look at the other
model makers and they can get into a bit
of a standoff. In fact, I strongly
suspect the exact timing of ChatGPT 5 and Claude 3 will be interrelated. Or at the very least, ChatGPT 5's release timing will be tied to another model release from Google, from
Meta, from Anthropic, maybe even from
DeepSeek. And if that's the case, it's
yet again going to be confirmation that
OpenAI really is looking not to the
consumer to drive their release cadence,
but to other model makers. It's not a
mature product company motion. It's something that probably is going to need to shift, because ultimately the average person doesn't care who released the image model first.
My grandmother doesn't care whether
Google released it last week or OpenAI
released it this week. She's going to care whether it makes the photo on her phone good or not. And OpenAI has a
huge advantage there. They have the
product surface that most people
familiar with AI understand to use for AI. Everybody knows ChatGPT, and the 4o model is the baseline model. So far so
good. That makes sense. So why worry
about exactly when Google releases
stuff? Why worry about exactly when
Anthropic releases stuff? Why not just release what you've got to your consumers? That's my hot take. I think
OpenAI is cueing their product strategy off other competitors incorrectly, and
I think they should be focused more on
consumers. I think that's to their
interest as a company long term. And I
think the things they're worried about
sort of losing the PR battle for a cycle
or two are not that big a deal. Like, if you imagine the reverse, where OpenAI released this when it was ready, maybe a couple of weeks back, maybe a month back, and then Google comes along and they release theirs later on, it's not really a PR difference. It doesn't really make a difference. And
so I think that we really need to see
some grown-up consumer focused behavior
from some of these model makers now that
we are seeing a much larger consumer
footprint. If we expect AI to be in
people's houses, if we expect it to be
on people's phones, if we expect it to
be a daily, hourly touchpoint for people, we
have to act like that when we build and
release products. And right now it's
feeling much more like a Y Combinator, who-releases-first, very Silicon Valley insider kind of thing. It's not super
helpful to customers. So that's my
hot take. But that shouldn't detract
from the fact that this is a great
model. Great people worked on it. The
teams at both of these companies at
Gemini and at OpenAI are fantastic.
They're shipping great stuff, and they should be proud. Like, these are
really hard challenges that they're
solving with multimodal. And I think
that we've taken a massive leap forward
on image generation. Again, like in my video yesterday, it's hard to describe that unless you're really clear and specific. Like, if you can say: before, you could put a Coke can in someone's hand in a drawing, but you couldn't move it around; the Coke can was frozen to the hand. Now you can just edit it and move the Coke can over here with just your written text. That makes the light bulbs go off. Now people
understand. So, I do think we need to
get better at talking about this stuff.
I don't want my critique of the product
strategy to come off as critiquing the
individuals who did the hard work
because it's an amazing model.
Like, it's incredible work that they've
done. So, I'll probably write up more on
it later on, but I wanted to throw it
out there. What do you think? Have you
tried the 4o model? How does it compare
to Gemini?