GPT-5 Launch Sparks Debate

Key Points

  • The rapid growth of tool‑calling will lead to thousands or even tens of thousands of tools, creating huge opportunities for continuous ecosystem improvements beyond pure model performance.
  • The episode was recorded early in the week and released ahead of schedule to stay timely after the surprise Thursday launch of GPT‑5.
  • Both guests, Chris Hay and Mihai Criveti, agreed that while GPT‑5 is impressive, it has not yet supplanted Claude as their primary daily development tool.
  • OpenAI’s livestream announced the GPT‑5 suite—including a core model plus “mini” and “nano” variants—and introduced new “Thinking” and “Pro” modes across various free and paid tiers with differing rate limits.
  • The hosts framed the launch as another dramatic moment in the AI industry, setting the stage for ongoing debates about the impact of these new capabilities.


# GPT-5 Launch Sparks Debate

**Source:** [https://www.youtube.com/watch?v=A30mVgbG-OQ](https://www.youtube.com/watch?v=A30mVgbG-OQ)
**Duration:** 00:33:54

## Sections

- [00:00:00](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=0s) **GPT‑5 Release Sparks Tool Surge** - A panel of experts debates how GPT‑5's debut is driving a rapid proliferation of AI tools and could challenge Claude as developers' primary daily assistant.
- [00:03:07](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=187s) **OpenAI's GPT‑5 Release Highlights** - The speaker outlines three key aspects of the new GPT‑5 rollout: a unified model router to simplify model selection, modest benchmark improvements, and a notable boost in reliability through reduced hallucinations.
- [00:06:11](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=371s) **Nano Model Beats Larger Counterparts** - The speaker praises the API's nano model (and GPT‑5) for surpassing big models in tasks like function calling and game‑based reasoning, noting its strong performance in demos such as the "Murdle" detective game.
- [00:09:30](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=570s) **Expectations vs Reality for GPT-5** - The speaker notes that the new model's focus aligns with earlier design cues and resolves prior free‑model limitations, making its direction unsurprising.
- [00:12:37](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=757s) **Scaling AI While Managing User Expectations** - The speaker highlights how a newly‑released, high‑traffic AI model alleviates many pain points and outpaces niche competitors, yet stresses the gap between lofty fantasies of instant world‑creation and the practical necessity of delivering a simple, mass‑market‑friendly experience.
- [00:15:55](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=955s) **Embedding Tools in Model Analysis** - The speaker explains that the model's internal analysis channel conducts token prediction and can silently invoke tools such as Python for calculations, a strategy intended to enhance answer accuracy and reduce hallucinations.
- [00:18:57](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=1137s) **Scaling Inference and UI Innovation** - The speaker highlights how lower inference costs, faster and smaller models, massive parallel tool calls, and superior Claude‑style interfaces are accelerating the ecosystem toward more powerful, user‑friendly AI systems on the path to AGI.
- [00:22:02](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=1322s) **Parallel AI Model Workflow Comparison** - The speaker describes using multiple AI models side‑by‑side for coding tasks, especially unit‑test generation, and notes that while ChatGPT struggles, Opus consistently delivers better results in their integrated workflow.
- [00:25:06](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=1506s) **Internet‑Synced Blinking Donkey Demo** - The speaker walks the audience through a live demo of a code‑generated donkey that blinks in sync with an internet clock, noting improvements over previous versions and contrasting it with other AI models.
- [00:28:11](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=1691s) **ChatGPT-5 Canvas Coding Struggles** - The speaker describes repeatedly prompting ChatGPT‑5 to create a blinking‑donkey canvas script, battling truncated outputs and lengthy copy‑paste fixes, highlighting the inefficiency versus Claude's smoother handling.
- [00:31:13](https://www.youtube.com/watch?v=A30mVgbG-OQ&t=1873s) **Balancing Cost, Experience, and Model Choice** - Panelists discuss the benefits of AI competition, improvements in GPT‑5's front‑end, frustrations with high subscription fees, and the appeal of combining Claude, GPT‑5, and open‑source models in their workflows.

## Full Transcript
0:00With tool calling improving, 0:02as Chris was saying, we're not going to see hundreds of tools. 0:04We're going to see thousands or tens of thousands, 0:07and there's a lot of opportunity for continuous improvement in just the ecosystem alone. 0:12And I think we're going to see a lot more substantial 0:15improvements in all these areas 0:18outside of just the model performance 0:20that are going to get us closer to that wow, 0:22AGI moment. 0:29Hello everyone. 0:30Welcome to Mixture of Experts. 0:31I am Bryan Casey, your default host for bonus episodes. Um, 0:35as some of you may have noticed, 0:37it's been a big week in AI this week. 0:38We actually recorded earlier this week covering 0:41some of the big announcements around gpt-oss and Genie 3. 0:46And then it became very clear, after we had recorded that, 0:49that GPT-5 was going to drop on Thursday. 0:51So we made the decision just to release that episode on Wednesday. Um, 0:55so it was still timely. 0:56And then come back to you today with a discussion 0:59around reactions and thoughts on the GPT-5 release. 1:02So I'm joined today by 1:05Chris Hay, CTO of Customer Transformation, 1:09and Mihai Criveti, Distinguished 1:11Engineer of Agentic AI. 1:13And we are going to get into a discussion of the GPT-5 1:16release, early reactions to it, thoughts on coding. 1:20And maybe that's actually a good place to start 1:22with the opening question today. 1:27One of the big questions 1:29coming into this release was, is GPT-5 1:33going to replace Claude as the daily driver 1:37for developers all over the world? 1:39And so while we can't speak for everybody, 1:41we can share our own opinions on that. 1:43And so maybe I'll start with you, Chris. 1:45Early reactions of like, do you think GPT-5 is going to be replacing 1:49Claude for you as a daily driver? 1:52No. 1:53Sadly, I had high hopes. 1:55But no. Mihai? Hi. 1:57You know, it's kind of funny.
1:59I was refreshing my chat 2:01all night, and somewhere around 1 or 2 a.m., 2:04I got access to it, so I got out of bed. 2:06I rushed to my machine, and I was working all night. 2:09Trying out prompts. Trying it with different tools, trying it with MCP. 2:13And in the morning I had to start my day job 2:15and I said, all right, back to Claude. 2:16And I found myself using Claude Code and Opus 4.1 again. 2:20I think it's an amazing model, 2:22but right now it's not replacing it yet for me. 2:25All right. Well, I think that sets the stage 2:27really nicely for, uh, some of the drama. Um, 2:30because there's always drama in the AI industry. So, 2:32um, with that, we'll get into today's episode 2:35and I'll start by maybe just doing a quick recap. 2:37I'm sure many of you have seen the news, um, 2:40already, but I'll just go through for those of you 2:42who might not have. Uh, on Thursday afternoon, 2:45OpenAI did a livestream where they introduced 2:48the new GPT-5 series. Um, 2:50there are three models 2:53that were part of that, which was the core kind of GPT-5 2:55model, a mini, and a nano. 2:57There are Thinking and Pro modes available across a number of the various 3:01free and paid pricing tiers with different sort of rate limits 3:05associated with them. 3:07And going through the release, 3:09there were three main things that I think stuck out to me 3:12and stuck out to most of the market. 3:15Um, so first of all, 3:17was the introduction of this model router. 3:19I think one of the memes on the internet has been 3:21how complicated it has been to, like, go into the model selector in ChatGPT 3:24and just, like, figure out what model you're supposed to use 3:27for anything; that had gotten incredibly complex.
3:29And OpenAI has been talking about needing 3:31to solve that and making that simpler 3:33for months at this point, and consolidating everything 3:37to the one brand family around GPT-5 with a model router 3:40in front of it, was their way of delivering against that. 3:43Um, the second piece was that 3:46there are improvements in benchmarks, so it does look like it's a smarter model. 3:49It is not, like, Earth-shatteringly more intelligent than the other models 3:53that are on the market, and what makes it a little bit different, 3:55I think, from other model releases of this nature, 4:00is that the improvements in the benchmarks 4:02are maybe not the highlight, but there are improvements there. 4:07But where we're seeing even more improvements is actually in reliability. 4:10And some of the most important benchmarks that they introduced were actually around, 4:14um, reductions in hallucinations. 4:16So the idea that you can actually trust these models 4:18more for work that you're doing day to day. 4:21And then finally, which I think was surprising, 4:24uh, in the same way that reliability was almost a surprising theme, 4:27price, I think, was a very surprising theme, 4:29um, as part of this. Like, typically 4:31when you think about state-of-the-art models, 4:33or state-of-the-art technology, 4:36you typically associate that with pricing power. 4:38But actually one of the big takeaways, for me at least, was accessibility. 4:41And the reaction in the market, particularly when we're looking at the API pricing: um, 4:45these models, all three of them, are very competitively priced 4:48and, you know, in some ways are 4:50starting to deliver on that, like, "too cheap to meter" 4:53slogan that I think we've heard. 4:55So those were some of the big highlights. 4:57And then also this consolidates 4:59many of the existing models that exist, 5:01starting with ChatGPT, um, and kind of the UI.
5:04And I think over time that will come, 5:06more will come to the API as well. 5:08Um, but that's kind of a quick summary of the release. 5:11Maybe, um, just to start with other general reactions, 5:14we'll get specifically into the coding piece here in a minute. 5:17But, um, maybe I'll start with Mihai, 5:20beyond just the kind of like the quick highlights. 5:22I'm curious, like, what else stuck out to you about this release? What's important? 5:26Um, and just any kind of general reactions you have. 5:29I think what stuck with me this release is 5:31how good this model seems to be at tool invocation and calling for AI agents. 5:35I think you can really see the results of fine-tuning 5:38to specific things like MCP and tool invocation 5:40and function calling and structured outputs, 5:42and I think it's a lot more reliable than previous models we've used, 5:46at least compared to, you know, GPT-4 5:48or even some of their previous reasoning models. 5:51And it does so at a cost which seems sustainable 5:55for this type of, I would say, agentic workload. 5:58I think I'm tempted to go into the ChatGPT interface 6:03and talk about that for a second. 6:04But actually, if I think about the APIs for a second, 6:08I kind of like what they've done there. Right. 6:11So back to your point, we want to talk about the big models. 6:14But those smaller models are killer, to, uh, to Mihai's point. Right. 6:19So especially the nano model, 6:22that little nano model on the API 6:24outperforms most of the large models in the market, 6:28and especially for agentic, 6:31as Mihai was saying, function calling, it just gets it right. 6:34So most of the time you'd be like, 6:36oh, I'm going to go to a mini model, 6:38or I'm going to go to the full fat model. 6:40I don't know if you want to call it that. 6:42Should we size things like Coca-Cola? 6:44But um, but you know, but um, but diet GPT-5 6:48is just killing it.
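Mihai's point about function calling and structured outputs is easiest to see in code. The sketch below is a minimal, API-free illustration of the pattern: a tool definition in the JSON-Schema style that function-calling APIs accept, plus a dispatcher that routes a model's (stubbed) tool call to a local implementation. The tool name, arguments, and exact field layout are illustrative assumptions, not any provider's verbatim wire format.

```python
import json

# A tool definition in the JSON-Schema style used by function-calling APIs.
# The exact field layout varies by provider; treat this shape as illustrative.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> dict:
    # Stub implementation; a real tool would hit a weather service.
    return {"city": city, "temp_c": 21}

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> dict:
    """Execute the local function named in a model's tool call."""
    fn = REGISTRY[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # models emit arguments as a JSON string
    return fn(**args)

# A tool call shaped the way a model might emit it (stubbed, no API involved).
print(dispatch({"name": "get_weather", "arguments": '{"city": "Dublin"}'}))
```

The reliability the panel is praising lives in exactly this loop: the model has to emit the right tool name and well-formed JSON arguments every time for the dispatch to succeed.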
6:51And GPT-5, like, they are just absolutely killing it. 6:53So no, I'm really impressed with the 6:56nano model. And the other one, 6:58and maybe I will do the demo a little bit later, 7:00is the browser 7:03control. It's really good, and its logic 7:05and reasoning. Back to the sort of ChatGPT interface: 7:08one of the things I like to do with models 7:10is taunt them with games. 7:12It's, uh, it's one of my fun things. 7:13And, and definitely 7:16none of the earlier versions of GPT 7:18was ever able to solve the Murdle game. 7:21Uh, I don't know if you've played Murdle before. 7:23It's, um, it's kind of like you're this detective guy, 7:27and you've got to figure out who committed the murder. Uh, 7:29who killed the person, who's the murderer, 7:31with what weapon and in what location? Um, 7:34and and it never worked. 7:37It never got it right. It was always getting things wrong. 7:39And so today, I sort of played 7:41it against the Murdle game on the agent browser. 7:44It took 20 minutes to play the game, but it got it. 7:48It solved the murder. 7:49And no earlier version was able to do that. 7:52So I think they've really focused 7:54on the planning, the logic, the reasoning. 7:57So there's been a huge emphasis on that as well. 8:00And I think I appreciate that 8:02the, um, it does have the ability to cheat. 8:06The second version was great. 8:08The first time it played Murdle, it played for ten minutes 8:11and then it looked up the answer on the internet. 8:13So my second prompt was like, don't look up the answers, don't cheat. 8:17But, um, but, you know, fabulous, fabulous model. 8:20I'm actually glad you mentioned that 8:2320-minute time span, because one of the other things, 8:26um, that I just saw 8:28people most encouraged about on the internet, 8:30was some of these charts that just show, 8:33uh, just the consistent curve of being able to 8:35successfully complete longer-time-horizon, um, tasks.
8:39And that is, you know, I don't want to, 8:41I don't know if I want to lump that 8:42into the same sort of space as reliability, 8:44because it does feel like a little bit adjacent to that. 8:47But, uh, that was another aspect 8:49I think people were pretty excited about. 8:52Um, I'm curious, like, if this is what you both, like, 8:55expected from this release. It gets kind of into the realm of predictions. 8:59But I think one of the interesting things, just gauging the market's response to 9:02this, is either the night before the release or two days before that, 9:06Sam Altman posted the Death Star on Twitter, 9:10which I think, like, sent everyone 9:12into kind of like a little bit of a frenzy. 9:14Like, OpenAI is kind of known for the sort of vague 9:16posting thing that they do on Twitter where they, 9:19you know, really hype these model releases. 9:21And then what's interesting, it kind of came out and it's just this 9:24like very, um, 9:26strong focus on just like straight utility, um, in a lot of ways. 9:30And, um, there had been some early kind of rumors and reports 9:34that the focus was going to be on things like hallucinations, 9:36but like, when you think about what you were looking for or 9:38expecting OpenAI to do around GPT-5, 9:41is this kind of like in line with what you were expecting in terms 9:44of like the trajectory in this space, or was this in any way 9:46kind of surprising in terms of where they ended up focusing? 9:49I didn't find it very surprising just because they've released gpt-oss before, 9:53and I've been playing around with that, and this feels very much in line 9:56with the same kind of design, the same kind of style. 9:58And I expected more of a GPT-5 10:01kind of a release 10:03just because it was time. Also, 10:05from my own personal experience 10:07of using some of the o3 models or o3 Pro, 10:09I didn't find them to be that useful for general purpose tasks.
10:13I found that they took way too long to accomplish a task. 10:16They were prone to overthinking. 10:17They were prone to coming in with very strange formatting, 10:20and even formatting issues. 10:22And the way I'm seeing at least this model release is 10:25as a fix to that, with a unified architecture 10:28that kind of gives you, again, the core capability, 10:30that wow moment we all had when we first picked up GPT-4. 10:34Oh, yeah, I sort of agree with you, 10:38but I think maybe we're a little too used to playing with different models. 10:44Anyhow, in my opinion, right, 10:46we'll, you know, play with Gemini, we'll play with big models, 10:49small models, o3, o4-minis, the Pro versions, 10:52we'll play with Claude, etc. 10:54You know, we switch between a lot of models. 10:57I think if you're not in that world, 11:01I think this is going to feel like an incredible model, right? 11:04Because, you know, let's be honest. 11:07I mean, we'll come back to coding for a second, 11:09you know, the front-end capability, 11:11so they've got the ability to create good user 11:14interfaces with React code, 11:15is significantly better than the earlier versions. 11:19Now we would argue and say, well, actually 11:22Claude has been doing that fine. 11:24But the reality is, 11:26you know, the GPT models 11:28didn't generate good user interfaces, etc. 11:31They weren't great at generating end-to-end applications. 11:34And I think there's been a huge focus on that. 11:36So I think there's probably less of a surprise there. 11:41But if you can imagine the average user 11:44for a second, you know, this is, to your point, 11:46super cheap. 11:47It's like the $200 version. 11:49You can just generate whatever you want. 11:51This is, I think this is a game changer 11:53for most people.
11:55We're probably just being a little critical in that sense, 11:59but I think there are probably a few other things there, 12:03you know, to compare that to the gpt-oss model. 12:06I mean, the reality is this one is multimodal. 12:09It is doing audio, it is doing images, etc. 12:13We're not selecting different models. 12:16I mean, come back to the agentic capabilities. 12:19The agentic capabilities are great. 12:22And then to your point, I mean, coming back to the APIs. 12:25So some of the things I appreciate in the back 12:28end is how they've handled the grammar and stuff. 12:31So I can actually start providing my own grammar for function calls 12:34and be able to guide the structure that I want back. 12:37So I actually think some of the things 12:39that they've done there solve a lot of our kind of pain points. 12:42So, um, would I have wanted the 12:45model where I just say, I am thinking this, I'm going to sleep, 12:48and 20 minutes later you're going to have created the world? 12:51Of course, we all want that. 12:53But is that a reality? 12:55Probably not. 12:56But does this just level up everybody? Right. 13:00Remember, 700 million users on this, 13:03compared to the number of people on Claude. 13:04It's a huge level up. 13:06So I think we're just maybe too in the weeds. 13:10It's funny, because when I looked at, 13:13there's a few things on Twitter where people were having 13:15just like what felt like small séances, saying goodbye 13:18to some of the old models that they no longer get to talk to anymore. 13:22There was like a feeling of a little loss, 13:23but it's like, it is such a specific community that feels that sort of way. 13:27And like 99% of the world is just totally overwhelmed by all that complexity. 13:33So I think, you know, delivering something that, um, is much more 13:37accessible to the masses 13:40um, makes a ton of sense.
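Chris's remark about providing his own grammar for function calls refers to constraining the model so its output must match a structure you declare. The constrained decoding itself happens server-side, but the client-side half, checking that a reply matches the declared shape, can be sketched with a toy recursive validator. The schema and reply below are hypothetical, and real APIs accept full JSON Schema or a context-free grammar rather than this subset.

```python
# Toy structural check in the spirit of JSON Schema (covers only a subset).
def matches(schema: dict, value) -> bool:
    t = schema["type"]
    if t == "object":
        return isinstance(value, dict) and all(
            k in value and matches(s, value[k])
            for k, s in schema["properties"].items()
        )
    if t == "array":
        return isinstance(value, list) and all(matches(schema["items"], v) for v in value)
    # Leaf types map straight onto Python types.
    return isinstance(value, {"string": str, "number": (int, float), "boolean": bool}[t])

# Hypothetical structure we want the model to guarantee in its reply.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {"verdict": {"type": "string"}, "confidence": {"type": "number"}},
}

reply = {"verdict": "guilty", "confidence": 0.87}  # pretend model output
print(matches(VERDICT_SCHEMA, reply))  # True
```

When the server enforces the grammar during decoding, a malformed reply simply cannot be produced, which is the pain point Chris says this solves.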
13:46I'm just still thinking 13:47about kind of the dichotomy in reactions in the market 13:50where, um, your point of just like, I would love 13:53to tell the model to do something 13:54and wake up 20 minutes later and like, you know, it's 13:56created a world-changing sort of application. 13:58One of the reactions I saw in the market 14:01is that this kind of confirmed what people believed: 14:04that the march to the intelligence explosion, AGI, ASI, 14:08whatever you want to say, 14:10is going to be more of a slog. 14:12You're not just going to wake up one day 14:14and there's going to be, you know, we're magically there. 14:16And it could also be the case that, um, 14:19you know, it's actually not going to be maybe just one giant, uh, 14:23model to rule them all. 14:25But like, when I think about this compared to, like, Genie 3, 14:27it's like, those feel like pretty different, um, you know, 14:30approaches. Even, I think, the discussion we'll get to around coding, 14:32like, it still feels like there's plenty of room for Claude in this. 14:36And so, you know, maybe as the last question before 14:38we just dive into the code: did this 14:40release update any of your thinking at all 14:43around the trajectory to some of, like, the 14:46bigger topics in the industry around 14:48AGI, ASI, intelligence explosion? 14:52You know, some people seem to think it was 14:53a little bit of a wet blanket, um, 14:55on that; other people felt like we were kind of right on target. 14:58But, you know, I'm curious where you guys 15:00come down on this, and maybe start with Chris. 15:02I think, actually, 15:05probably, I know this is going to sound odd, 15:08but I think the gpt-oss model probably 15:11hit me more with that, 15:13because we got to see some of the underlying architecture more, 15:16and then I can kind of project forward what's going on.
15:19And, uh, you know, in the larger models, 15:23I think there are a few things that are key. 15:26So the first one is, 15:28I know I say the word agents all the time, 15:30but I really think agents is the big way through this. 15:34And one of the things that I noticed on the gpt-oss models 15:37is the analysis channel. 15:40So if you think of the thinking mode for a second, and 15:44the Responses API, when it is doing thinking, 15:49you know, basically the reasoning, 15:52you know, the analysis channel is the first token, 15:55and then basically all the tokens go into the analysis channel. 15:58That's where it does the thinking. And then it will create a new token 16:01to say this is the final response. 16:02So that's kind of what happens in the back 16:04end from a kind of next-token prediction. Right. 16:06So all of that thinking happens in this channel. 16:09But if you look at some of the system prompts in the gpt-oss, um, 16:13code base, you will see things 16:16like: when you're in the analysis channel, um, 16:18if you're going to do math or things like that, 16:21go use your Python tool, 16:23but don't tell the user, right. 16:25Just do the calculation 16:28and the thinking, but you don't need to tell us about it. 16:30So what we're seeing here is tool use. 16:34So there's two types of tool use that's going on. 16:37There is tool use where you are going to say, here's my function call, 16:40go call this thing on the outside. 16:42But there's a set of tools that are going to be used 16:44by the models themselves to get around things like, um, 16:48uh, you know, being able to do math calls, etc. 16:51So there's a whole level of experimentation 16:54that I've been going through, like, um, you know, what's this 16:57multiplied by this, and then I'm like, whatever you do, don't, 17:01um, use a tool in the analysis channel, 17:03to try and see how much it's doing that.
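Chris's description of the analysis channel, thinking tokens in one stream and a final answer in another, with tools invoked silently from the thinking stream, boils down to a parsing problem on the model's raw output. The sketch below imitates the channel-tagged layout gpt-oss emits; treat the exact delimiter strings and the sample transcript as assumptions rather than the real tokens.

```python
import re

# A made-up raw completion in a channel-tagged layout modeled on gpt-oss's
# output format (the delimiter strings here are assumptions, not the real tokens).
RAW = (
    "<|channel|>analysis<|message|>User asked 37*41. Use the python tool "
    "silently: 37*41=1517.<|end|>"
    "<|channel|>final<|message|>37 x 41 = 1,517.<|end|>"
)

def split_channels(raw: str) -> dict:
    """Return {channel_name: message} for each channel-tagged segment."""
    pattern = r"<\|channel\|>(\w+)<\|message\|>(.*?)<\|end\|>"
    return {name: msg for name, msg in re.findall(pattern, raw)}

out = split_channels(RAW)
print(out["final"])  # only the "final" channel is shown to the user
```

The silent tool call Chris probes for would happen inside the analysis segment: the user only ever sees the final channel, which is why the system prompt can say "use your Python tool, but don't tell the user."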
17:05And I think this sort of embedding of tools into the thinking in the model, 17:10to be able to pin down little things 17:13and, uh, you know, reduce the hallucinations and have more accurate answers, 17:16I can see that expanding out. 17:18I can see that expanding out into, 17:20you know, hundreds or thousands of tools in the future. Right. 17:23So I think that's one direction that I'm probably seeing. 17:26Then the other thing is, you know, and it's great that we're on this podcast, 17:29is the whole mixture-of-experts thing, right? 17:31So when we look at the gpt-oss model, 17:35what you're seeing is a lot of experts, right? 17:40So maybe there's four active experts, 17:41but the number of total experts in these models is huge. 17:45It was like 32 experts; 17:46I think the larger one was like 100-odd experts or something like that. 17:50So what they're doing here is just expanding out 17:53the number of experts that are part of this model, 17:56um, with a much, much smaller 17:58number of parameters per expert, which makes sense, 18:01because then you can really speed 18:04tokens through the model, because guess what's important to us? 18:06We're not prepared to wait for the model to come back with answers. 18:10And when it's a big model, you're 18:12having to wait for it to go 18:13through the layers to churn out those tokens. 18:15And we as human beings are like, 18:17no, no, no, no, I'm not waiting. 18:19So I think there's this push towards 18:22much smaller models, which are much more distributed. 18:26And I think that's going to continue to AGI. 18:28And then, I imagine, what's probably happening in the GPT-5-era 18:31models is 18:33they've probably still got those smaller partitions there. 18:37But I imagine for some of the hard thinking, 18:39they've just gone to larger parameters on some of those bigger models. 18:42So I think 18:44there's a lot of clues of how we're going to get there.
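The routing Chris describes, many experts but only a few active per token, is a small computation. Below is a minimal sketch of a top-k gate; the 32-experts, 4-active numbers follow the figures mentioned in the episode, while real routers use learned gate weights per token (random scores stand in here purely for illustration).

```python
import math
import random

def route(gate_scores: list[float], k: int = 4) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    m = max(gate_scores)
    exps = [math.exp(s - m) for s in gate_scores]      # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]            # only these experts run

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(32)]   # one token's gate logits
active = route(scores)
print(active)  # 4 (expert_index, weight) pairs, weights summing to 1
```

Because only 4 of the 32 expert MLPs execute for each token, per-token compute is a fraction of a dense model with the same total parameter count, which is exactly the speed argument Chris is making.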
Yeah. 18:46And I think this can also pull the AGI 18:48timeline forward by speeding experimentation, 18:51especially with the gpt-oss release. 18:53I think there's a lot of dimensions in which these models can improve, 18:56not just raw performance. 18:57I think one I'm really thrilled about is the inference cost, 19:00and that gives you the opportunity to make things work in different ways. 19:04So you can just hit hundreds of these requests at different tools 19:08and then summarize those results. 19:10There's also inference speed. 19:12And this is also kind of dictated by hardware, 19:14but also by having smaller models and more efficient models. 19:18If you have inference speed, 19:20and I'm seeing this with gpt-oss, it's running at 180 19:23tokens a second on my single GPU, and that's impressive, 19:26you can really hit hundreds of these requests in parallel. 19:29And with tool calling improving, as Chris was saying, we're not going to see 19:33hundreds of tools, we're going to see thousands or tens of thousands, 19:36and there's a lot of opportunity for continuous improvement 19:39in just the ecosystem alone, and even the user interfaces. 19:43Most of the consumers, 19:45what was it, the 700 million, are using ChatGPT, the ChatGPT UI. 19:49And one of the reasons I love Claude 19:51is because I find its 19:53UI to be superior 19:55in handling things like artifacts, 19:58handling things like projects, 20:00and how it's using its "canvas" 20:02versus the way ChatGPT was using it. 20:05Now this is leading to improvements 20:07in the user interface aspect as well. 20:10And I think we're going to see a lot more substantial 20:13improvements in all these areas 20:15outside of just the raw model performance 20:18that are going to get us closer to that wow AGI moment.
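Mihai's "hundreds of requests in parallel, then summarize" pattern has a simple fan-out/fan-in shape. The tools below are local stubs standing in for real MCP or HTTP tool endpoints (all names are hypothetical); the point is that cheap, fast inference makes this wide fan-out economical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub tools standing in for real MCP/HTTP endpoints (hypothetical names).
def search_docs(query: str) -> str:
    return f"docs hit for {query}"

def search_code(query: str) -> str:
    return f"code hit for {query}"

def search_issues(query: str) -> str:
    return f"issues hit for {query}"

TOOLS = [search_docs, search_code, search_issues]

def fan_out(query: str) -> str:
    """Call every tool concurrently, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(TOOLS)) as pool:
        results = list(pool.map(lambda tool: tool(query), TOOLS))
    # In a real agent a model would summarize the results; here we just join them.
    return " | ".join(results)

print(fan_out("gateway config"))
```

With thousands of tools the same shape holds; the executor's worker count is the only knob that changes.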
20:21And I think that's actually pretty consistent with some of, 20:24I feel like, like I mentioned, there was, 20:26I think, a fair amount of folks who were like, 20:29I don't know, maybe they're at AGI now. 20:31And this was like, oh, this is not AGI. 20:33But, like, all of these underlying things, 20:36like the reliability, the tool calling, all of these feel like prereqs 20:39to actually getting there, and then making progress on, 20:43um, I think those dimensions will end up being, like, 20:46very obviously a major part of the story. 20:52All right. 20:52For our next segment, as we talk about code, we're actually going to, 20:56for what is one of the first times on the show, 20:57actually do some live demos. 20:59Uh, and so if you're listening along on audio and you actually want to see, 21:02um, some of what's going on on the screen, 21:04and I promise you, for at least one of these segments, 21:07you're going to want to see some of the beautiful artwork that's on display, 21:11head over to the IBM Technology 21:12YouTube channel where you can see some of the stuff live. 21:14You know, with that, I want to get to our last segment, 21:17which is, I think, for as impressive and exciting 21:19as some of these announcements are, for as much as I think 21:22it'll make an impact on hundreds of millions of people who use these tools, 21:26one of the big themes that's in the blog post, 21:29it's like one of the most talked-about, 21:32um, you know, parts of the release, 21:34uh, basically on every forum, is, 21:37you know, for as much as OpenAI has done in this space, 21:39like, Claude and Anthropic have, you know, 21:42continued to just be the leader in kind of the coding area. 21:45And there was the question of, like, will this be the release 21:47that puts them over the top and, um, and gets them there? 21:51And based on the initial question, 21:53um, it sounded like for both of you, the answer to that, 21:55at least right now, is not quite yet.
21:57And so, you know, maybe, uh, 22:00starting with Mihai on this one. 22:02But, you know, for you, like, how far did it get? 22:05Like, did it get close, and, like, why didn't it get all the way? 22:08What are still the difference makers? 22:10You said you're like, okay, time to go back to my day job. 22:12I'm going back to, you 22:14know, Claude at this point. You know, why? 22:16What were the big differences for you that you still feel like, 22:19you know, Anthropic has an advantage in this space? 22:22I think first, I want to define how I typically work with these models. 22:26I don't work with just one model. 22:27I actually use them at the same time, in parallel. 22:30While Claude is busy doing something useful for me, 22:33I might fire up, 22:35I would say, ChatGPT, or I might fire up 22:37some other model, or Gemini, to do some deep research, and I kind of 22:41have them all working in parallel while one is busy doing something. 22:45Um, but I give them different tasks. 22:47And I was hoping that for my typical day-to-day workflow, 22:50where I'm using things like Claude Code, for example, 22:52or projects in Claude with Opus, 22:55to give it hard problems to solve, to ask 22:57for unit test case creation 22:59in a way which is consistent with my code base, 23:02that the new models and the new experience I have with ChatGPT 23:05would be able to pick that up seamlessly. 23:08And I still find that, at least for me, um, 23:12ChatGPT struggles while Opus is still able to deliver those use cases. 23:16So let me jump into a very quick screen share 23:19to show one of these workflows, uh, 23:21live. 23:22So here you'll see that I've got four windows side by side, 23:26which kind of replicates my real working environment. I've got different things. 23:29I've got, uh, Continue and I've got Cline, but I use multiple models. 23:33So here you can see, for example, I've got gpt-oss running on my machine.
23:37 It thought for 25 seconds and came back with a complex Mermaid diagram. I'm going to pick it up and paste it into Mermaid. As you can see, it actually worked: first time, one shot, it gave me the results I need. I'm going to do the same thing live with GPT-5, and I've tried the same thing with Claude. As you've seen before, Claude also gave me a great diagram the first time, and I'm getting similar things even from the oss model.

24:05 With GPT-5... well, let me try that again. It's actually not answering. Maybe it got slightly intimidated. I'll try a new chat. Let's create a complex diagram. Yeah, I think I've intimidated it with a benchmark.

24:20 But the experience I've had is that I have to try maybe two or three times: take the error I get from the Mermaid renderer, give it back to the ChatGPT UI, and it'll try again and again before I get one of those things. Claude Code, and Claude, seem to handle those things a lot more gracefully behind the scenes, and are able to give me that experience the first time, and are able to deal with, for example, continuing large codebases in a way that ChatGPT does not, or in a way that OpenAI models don't seem to. At this point, again, the difference is minuscule. But if you're working with very large code bases, or if your code files exceed, you know, a thousand lines, where models tend to struggle, I find Opus still to be the superior model.

25:04 Chris, your thoughts? I would agree. In fact, I'm going to jump to my demo, which is way more enterprise than, uh, that nice one. But I think it will help back up what I was saying, in a sense. So I am going to apologize right now for what I am going to show our wonderful audience.
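The manual loop Mihai describes, pasting the Mermaid renderer's error back into the chat until the diagram compiles, can be automated. A minimal Python sketch, where `generate` and `validate` are hypothetical stand-ins for a model call and a Mermaid parse check (neither is a real API):

```python
def retry_with_feedback(generate, validate, max_attempts=3):
    """Ask a model for output, validate it (e.g. with a Mermaid renderer),
    and feed any error message back into the next prompt."""
    prompt = "Create a complex Mermaid diagram of a CI/CD pipeline."
    for attempt in range(1, max_attempts + 1):
        output = generate(prompt)
        error = validate(output)  # None means it rendered cleanly
        if error is None:
            return output, attempt
        # Feed the renderer's error back, as Mihai does by hand in the UI
        prompt = (f"This Mermaid diagram failed to render:\n{output}\n"
                  f"Error: {error}\nPlease fix it.")
    raise RuntimeError("still failing after feedback")
```

This is the same two-or-three-round dance he does manually; the difference he's pointing at is that Claude Code appears to run a loop like this behind the scenes, so the user only ever sees the final working diagram.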
25:27 So I'm going to share my screen, and here is my best test in the world. It's my test for everything: I like to create donkeys. We've got a donkey thing going on the Mixture of Experts podcast, and these are donkeys. What I want the donkey to do is blink in time with an internet clock, once a second. And hey, it blinks! It blinks. It should blink in time with the clock, and it should be internet-synced and then fall back.

26:04 What I actually really appreciate about this: this is the code that generated it, and I think the code is pretty good, actually. That's probably quite a bit of a change from before, when it used to generate terrible code. But actually, I think the code is pretty good. Let's just run that one more time. And this is quite nice in their Canvas area: it will go off to the internet, whereas Claude etc. doesn't.

26:29 Now, we're all feeling good at the moment. We've got my blinking donkey. But I'm going to show you the problems in a second. Before we do that, this is Claude Sonnet. It's a blinking donkey. I don't know what's going on with the tail there, to be honest, but it's fine, it's blinking, etc. And this is Opus 4.1. I quite like this one, though I still don't know what's going on there either. So, you know, we're probably feeling pretty good now.

27:00 Now, to show the frustration, and I think this highlights the difference here: in this case, I decided I was going to go to GPT-5 Pro. I wanted to get the best donkey that I could possibly get. It thought about it for seven minutes, seven seconds.
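Chris's brief, blink aligned to whole seconds from an internet time source with a fallback to the local clock, boils down to two small pieces of logic. A hedged Python sketch (the 150 ms blink window and the function names are assumptions for illustration, not the generated donkey code, which was HTML/JavaScript):

```python
import time

BLINK_MS = 150  # assumed: eyes closed for the first 150 ms of each second

def eyes_closed(epoch_ms: int) -> bool:
    """True while the donkey's eyes should be shut, aligned to whole seconds."""
    return epoch_ms % 1000 < BLINK_MS

def now_ms(fetch_internet_time=None) -> int:
    """Prefer an internet time source; fall back to the local clock."""
    if fetch_internet_time is not None:
        try:
            return fetch_internet_time()
        except Exception:
            pass  # network failed: fall back, as the prompt asked
    return int(time.time() * 1000)
```

Because the blink phase is derived from the absolute epoch time rather than a local animation timer, every synced client blinks at the same instant, which is presumably what "blink in time with an internet clock" is after.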
27:17 And there's all the plan, etc. associated with it. The problem is it didn't put it on the canvas. It said, save this as donkey.html. So even though I'm using Pro at this point, it refused to put it on a canvas. Now look at the size of this text, it's quite big, right? So we're all feeling pretty good, this is going to be a quality donkey. That's our donkey. And it's even got an ear wiggle in there; it does that every minute.

27:45 And I said, put it in the canvas, and I asked Pro to do it: "Just put this in the canvas." And it took eight minutes, and then it went, you got it, here's a pure canvas version. And you're like, huh? That's not what I wanted, I wanted it over here. So that didn't help.

28:06 So I then switched back to regular ChatGPT, changed the model to GPT-5 there, and said, put it in the canvas. And it went, I put your blinking donkey into canvas, blah blah blah. And if I were to show you the version that it created... it's down here somewhere. Let's see if we can find the code. You can see it was tiny. Here we go, "best on canvas." Uh, that's an earlier version. Anyway, I can't find it. The point is, it was about a tenth of the code.

28:44 So you will see, I then came back and said, no, no, no, no, do it properly. See, I said the words "don't omit." Because what actually happened in that particular version is it just went, "and here's the rest of the donkey." Do you know what I mean? I was like, no, I don't want "the rest of the donkey." It's the old trick, the problem that you have with ChatGPT.
29:07 It just starts cutting things out and says, "rest of your code goes here." So I wrote "don't omit," and then I said, here's the full version, give me this, ready to run. So I'm copying and pasting code back in, and then eventually it gave me this version, which runs. But you can see that took 25 to 30 minutes to produce this donkey, and most of that time was me going, don't omit, put it in the canvas over here, etc.

29:39 Now, that's probably not the best workflow in the world in that sense, but I'm trying to prove a point. And the point is that Claude just gets it with artifacts, right? It will remember what you did, it will update it, and it will just work. And it won't omit code. But ChatGPT still has this code-omission problem. Now, this is not a model problem; I think this is more of a user experience problem, and probably a kind of cost-optimization problem. But it's still frustrating enough for me that if it can't keep track of the artifacts, and it does the same kind of thing in the APIs as well, then it's just going to push me back to Claude, because I don't want it skipping stuff. So that, for me, was the major frustration. But code-wise and capability-wise, they are very, very close. Strong world model around donkeys.

30:36 So for our audio-only listeners out there, I will say that the last donkey we were looking at was, aesthetically, from my perspective, the best donkey in there. So I was impressed with that. And even the code, as you were mentioning, I think was solid code, but not quite there just from a workflow, ease-of-use, straight developer-productivity sort of vibe.
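The "rest of your code goes here" failure mode Chris hits can be guarded against mechanically: scan the model's output for elision markers and re-prompt instead of accepting a stub. A Python sketch with a hypothetical, deliberately non-exhaustive pattern list:

```python
import re

# Assumed patterns: common ways models elide code instead of emitting it all
ELISION_PATTERNS = [
    r"rest of (your|the) code( goes)? here",
    r"\.\.\. ?(existing|previous) code ?\.\.\.",
    r"rest of the \w+",            # e.g. "here's the rest of the donkey"
    r"unchanged from (above|before)",
]

def looks_elided(model_output: str) -> bool:
    """Reject a 'full file' that quietly cut content out, so a script can
    re-prompt with 'do not omit anything' instead of pasting a stub."""
    lowered = model_output.lower()
    return any(re.search(p, lowered) for p in ELISION_PATTERNS)
```

A wrapper that rejects elided output and re-prompts automatically would save most of the 25 to 30 minutes Chris spent typing "don't omit" by hand, though pattern matching like this will never catch every phrasing.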
31:02 And based on what I saw, there was a lot of that sort of reaction on the internet, which was: this is really impressive, I like what they're doing a lot, but still gotta use Claude, probably, on this end. But I do think it's always great that there's a ton of competition and innovation in this space, because it means we'll just continue to get better and better things to work with. Go ahead, Chris.

31:26 To your point, the front-end code and the experience and the gloss, I think, are better now. They're definitely better in GPT-5. They just need to sort out that developer experience and the token optimization. Just stop doing this! I pay my 200 bucks; I want to cancel my Claude subscription. I don't want to be paying 280 bucks, I want to come down to the 200. So just do that and save me some money. We're all going broke having to pay for ten of these things at the same time.

31:58 So, Chris, Mihai, thank you for joining us today. Any final thoughts before we let the audience go and send them on their way to, you know, wrestle with ten models simultaneously and figure out which one they like for what?

32:13 Go ahead and try these things, especially gpt-oss. Even with all these new fancy models, I'm still passionate about going off and playing with a gpt-oss model, running it on my machine, being able to use it in generic workflows, and being able to use it in a combination where I might use GPT-5 for my orchestration, Claude for some code-specific tasks, and even gpt-oss for tasks where it can perform reasonably well for its size. I think I'm going to go experiment.
32:41 Yeah, I would agree. Go experiment. And then I would say, go play with the agents. Honestly, I think the web-browsing capabilities now are incredible, and I really think the tool-calling capabilities are incredible. So go play with that. I think that's where it's outshining everything else.

33:01 And I will maybe also just close with a little shameless plug for some of the work that we're doing in watsonx. We've obviously always been a big supporter of the open source AI space, and if you'd like to use some of the gpt-oss capabilities, we have those in watsonx today. Also, with some of the new work we've been doing around model gateways, we're making it easier and easier to bring frontier models, and API keys that you already have, to our platform. So obviously, go try these tools, use them, and check out some of the ways that hopefully we're making it easier for you to consume them, and some of the stuff that we're doing.

33:37 So I'll just say again, Chris, Mihai, thank you for joining today. To our audience, thank you for listening. It's been another exciting week in AI. And, as the token podcast line goes, make sure to like and subscribe if you're a fan of the pod, and we will see you next time. Thanks everyone.