Learning Library


o3 Pro Beats Other AI Advisors

Key Points

  • The speaker evaluated several top AI models (Gemini 2.5 Pro, Claude 4, o3) and found that only o3 Pro consistently delivered insights that felt “resonant” and personally relevant.
  • In three benchmark tests—critiquing the Apple “illusion” paper, drafting a Datadog roadmap, and optimizing a Wordle algorithm—o3 Pro outperformed the baseline o3 and other models, even when its answers were shorter or less exhaustive.
  • o3 Pro’s edge came from its ability to recognize tool‑calling limits and deliberately stop or clarify rather than hallucinate data, which produced more trustworthy and actionable results.
  • Although not perfect, the speaker argues that o3 Pro is the first model capable of acting as a strategic advisor at the founder level without major caveats, highlighting its rapid development within just 48 days of the original o3 launch.
  • This progress signals a shift from AI being merely tactical to becoming a genuine strategic partner for complex, multi‑dimensional problems.

Full Transcript

# o3 Pro Beats Other AI Advisors

**Source:** [https://www.youtube.com/watch?v=5kWuXbiQ2zY](https://www.youtube.com/watch?v=5kWuXbiQ2zY)

**Duration:** 00:14:28

## Sections

- [00:00:00](https://www.youtube.com/watch?v=5kWuXbiQ2zY&t=0s) **o3 Pro Outshines Rival AI Models** - The speaker compared o3 Pro to Gemini 2.5 Pro, Claude 4, and others across three assessments (an Apple paper review, a Datadog roadmap, and a Wordle optimization) and found o3 Pro consistently provided more insightful, resonant answers, not merely longer ones.
- [00:03:28](https://www.youtube.com/watch?v=5kWuXbiQ2zY&t=208s) **Leveraging o3 Pro for Complex Strategy** - The speaker highlights o3 Pro’s rapid advancement and its ability to deliver deep, strategic insights on heavyweight, context‑rich problems, provided users feed it extensive background, set clear constraints, and let it autonomously gather needed information.
- [00:06:36](https://www.youtube.com/watch?v=5kWuXbiQ2zY&t=396s) **o3 Pro: The Ultimate AI Ferrari** - The speaker praises o3 Pro as the current top AI model, likening its power to a Ferrari while warning that it requires careful prompting and may struggle with simple tasks like document summarization.
- [00:10:23](https://www.youtube.com/watch?v=5kWuXbiQ2zY&t=623s) **Executive Strategy Model Insights** - The speaker explains how advanced AI can generate high‑level strategic plans but still requires careful prompting and cannot fully replace human nuance, context, or the desire for external expertise.
- [00:14:00](https://www.youtube.com/watch?v=5kWuXbiQ2zY&t=840s) **Seeking AI Insight for Happiness** - The speaker asks the AI what could make them 50% happier and recommends using the more advanced o3 Pro model for deeper, more useful advice.

## Full Transcript
[0:00] o3 Pro is out. I've been testing AI models for years now. They're helpful. They're tactical. They've recently become strategic (I'm looking at you, Gemini 2.5 Pro, Claude 4, o3), but they have not yet been resonant. What I mean by that is they haven't yet been so on the money consistently with their perspective that their words stick in my head and just live rent-free. That is what we are getting to with o3 Pro. I don't just mean they're good writers. What I mean is they're so insightful that I feel like I am profoundly known in the problems I am grappling with.

[0:47] So when I started to dig into o3 Pro, I wanted to give it, you know, an honest test. I wanted to give it something that would give me a sense of how it actually works. And so I picked three things where I felt like I could make an assessment.

[1:04] One was an assessment of that infamous Apple paper, and I wanted to stack it up against o3's assessment. One was a roadmap that I would share with the C-suite, and I wanted to pick a company I knew reasonably well. I picked Datadog. And one was an interesting algorithm optimization problem, and I picked Wordle optimization.

[1:33] Now, that's a fairly easy one if you've done optimization problems, but I wanted to see what it would do and how it would write it relative to what o3 would do.

[1:43] I looked at all three. In every single case, o3 Pro did better.

[1:51] And the reason why it did better surprised me. It was not that it was a longer answer. It was not even necessarily that it had all of the sections or was more complete. In fact, in one case, it was less complete than o3 and still won anyway. Do you know why? Because o3 went beyond its tool-calling capabilities, and o3 Pro knew when to stop and could explain why. That is a huge deal.
[2:20] This was a case where it was looking at Twitter mentions for the Apple "thinking is an illusion" paper, and it was looking at very specific criteria around retweets with a certain number of likes. And it said, "I can't get that out of my tool call right now. I'm just going to not mention it. Also, I'm near the word limit you imposed in the prompt, so it's not worth me going after." Both were correct.

[2:48] o3 phoned something in, in a table, and it looked plausible. It named real Twitter users who had really talked about the paper, but the table itself wasn't useful because it didn't specifically refer to the tweets, because underneath the hood, o3 couldn't get to them.

[3:04] Now, I am not here to tell you that this is a perfect model. I do think it is the first model that can operate as a strategic advisor at the founder level without any caveats. Does that mean that I think it's the best founder advisor in the world? Didn't say that, did I?

[3:27] But the fact that we're even talking about that 48 days after o3 itself launched is a big deal. That is how fast progress is going right now.

[3:37] o3 Pro is able to strategically understand very difficult, multi-dimensional, heavy-context problems, come out with strategic insights that are correct, and act as a sparring partner.

[3:52] This is a model that is hungry for context. I have made the mistake, even in the little bit of time I've been using it, of feeding it prompts where the context was too light. This is a model that seeks to understand, like, global thinking. It wants to think big. It will go get context. And if it goes and gets context that you didn't direct it to go and get, you're going to be surprised, perhaps unpleasantly, at what you find.
[4:21] And so my advice to you, which I've seen elsewhere around the web as people have played with o3 Pro, is that you should use this model for hard problems that you can give the model a lot of context on. That is what it shines at. If you have a truly strategic conundrum, something you're wrestling with, you should be able to come up with a lot of context, either from your own head or from the web. And you should be able to feed it to the model, tell the model where to go, give the model constraints and warnings, and set the model up for a really successful hard think. And I mean like a 15, 20 minute think. This is a get-a-sandwich-while-you-wait kind of model experience.

[5:03] And I got used to that with o1 Pro. But the difference with o1 Pro is that o1 Pro felt like a complete essayist. It would come back with a very well-written response, but o3 Pro comes back with the strategic insight that actually underlies that response.

[5:27] And is it more readable than o3? It actually is. One of the things I've noticed in the roughly 6 weeks I've been using o3 is that o3 is extremely technically intelligent and has real trouble dumbing that down into writing that is clear for non-technical audiences. And I say dumbing that down because I think o3 thinks of it that way. o3 has trouble simplifying prose into plain English a fair bit.

[5:58] o3 Pro is much better at it. If you ask it for a plain English summary of a very technical topic, you are likely to get a better result out of o3 Pro. Now, that is not a full measure of intelligence. There are other models out there that do that very well. Sonnet 4 is a phenomenal writer. It just is.
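As a rough illustration of the prompting advice in this passage (supply plenty of context, tell the model where to go, and give it constraints and warnings), here is a minimal Python sketch that assembles those pieces into a single prompt string. The `PromptBrief` class, its field names, and the section labels are hypothetical and chosen only for illustration; they are not from the video, and nothing here calls a real model API. The bracketed example values are placeholders to be filled in.

```python
from dataclasses import dataclass, field


@dataclass
class PromptBrief:
    """Hypothetical container for the parts of a context-heavy prompt."""

    question: str
    context: list[str] = field(default_factory=list)      # background from your own head or the web
    directions: list[str] = field(default_factory=list)   # where the model should go looking
    constraints: list[str] = field(default_factory=list)  # limits such as length or scope
    warnings: list[str] = field(default_factory=list)     # things the model should not do


def _section(title: str, items: list[str]) -> str:
    """Render one titled bullet list, flagging sections that were left empty."""
    body = "\n".join(f"- {item}" for item in items) if items else "- (none provided)"
    return f"{title}:\n{body}"


def build_prompt(brief: PromptBrief) -> str:
    """Join the question and the four supporting sections into one prompt string."""
    parts = [
        f"Question: {brief.question}",
        _section("Context", brief.context),
        _section("Where to look", brief.directions),
        _section("Constraints", brief.constraints),
        _section("Warnings", brief.warnings),
    ]
    return "\n\n".join(parts)


if __name__ == "__main__":
    brief = PromptBrief(
        question="What should the product roadmap prioritize next year?",
        context=["<paste background notes here>"],
        directions=["<name the sources the model should consult>"],
        constraints=["Keep the answer under 800 words."],
        warnings=["If a tool call cannot return a figure, say so instead of estimating."],
    )
    print(build_prompt(brief))
```

Printing "(none provided)" for an empty section is a deliberate choice in this sketch: it makes a too-light prompt visible before it is sent.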
[6:16] I've been playing with it [Sonnet 4] a bit and have been struck by how Opus and Sonnet really have a good one-two punch when it comes to thinking about hard problems with Opus and then writing well with Sonnet. And I guess that's as good a bridge as any to talking about model comparison.

[6:33] o3 Pro is unquestionably a model in a class of its own. I get asked a lot, "Is X the best model in the world?" Then people will throw out a name like Grok 3, Opus 4, o3, DeepSeek, whatever it is. Gemini 2.5 Pro.

[6:57] I feel very good, after playing with this, telling you that o3 Pro is unquestionably the biggest and best model on the planet right now, and it's not close. However, I do not think a lot of people will understand or appreciate it. Partly because they're releasing it only on the Pro and Team plans.

[7:19] And I think they'll bring it down, because the unit economics seem to be much more favorable with o3 Pro. They released o3 Pro for 87% less than o1 Pro.

[7:30] But even if they bring it down into the lower tiers, this is still a model that takes careful prompting. You need to be thoughtful about the problems you hand this model. It's like driving a Ferrari. If you drive it well, it's going to do a phenomenal job on amazing roads and you'll have a great time. If you take it to the grocery store, you'll regret it. And if you drive it on bad roads, you'll just blow it up.

[7:58] And I will say there are ways you can make this model, quote unquote, blow up. I don't mean actually malfunction, but I have found that when I have just attached a document and asked it to summarize the document, it doesn't do a super great job at that, because it is unable to restrain itself from being a global thinker and bringing in extra context.
[8:23] And by the way, people are probably going to call that habit hallucinations. And I think that is probably incorrect. And I'll explain why. Hallucinations, if we look at them the way we name weeds in gardens: a weed is just an undesired plant. A hallucination is just an undesired thought from a model, right? Whatever you want to call it. In this case, I think it's actually very intentional on OpenAI's part to launch a model that is a true global thinker, because they need that on the path to AGI. So that part makes sense.

[8:59] I think the challenge is, because it gathers context from across the web and it is difficult to understand what all the sources were that it got a hold of, it is hard to know at first glance whether the numbers it is giving you and the facts it is giving you in a response are absolutely correct, every single one of them, or whether some of them might be made up. And it is so persuasive and so clean in its prose and so insightful, you won't get the feeling intuitively that the numbers are made up. They won't stick out to you and, like, jump out and say, "Oh, this is a made-up number," the way they have in the past. You will have to do your checking.

[9:45] And so, I would think this is the first model where it is probably going to end up being malpractice not to check the model's response with another model before publication. If you're going to go to an executive, if you're going to go to the internet with a model's output, it is on you to use another model to help you check, because the number of things that it is looking at is kind of too high at this point for a human to fact-check individually, unless you have hours and hours and hours and hours.

[10:22] And part of what we're using these models for is that they save us time.
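One way to picture the checking step described above is to pull every sentence that contains a figure out of a draft and turn the result into a checklist for a second model (or a human) to verify. The Python sketch below does only that and rests on stated assumptions: the function names, the regular expression, and the wording of the checklist are all hypothetical, the sentence splitter is deliberately naive, and the sketch does not call any model API. It is an illustration, not the speaker's workflow.

```python
import re

# Matches figures such as 48, 2.5, 1,200, $20 or 87% (an illustrative pattern, not exhaustive).
NUMBER = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?")


def sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def numeric_claims(text: str) -> list[tuple[str, list[str]]]:
    """Return (sentence, figures) pairs for every sentence that contains a number."""
    claims = []
    for sentence in sentences(text):
        figures = NUMBER.findall(sentence)
        if figures:
            claims.append((sentence, figures))
    return claims


def verification_prompt(text: str) -> str:
    """Build a numbered checklist that could be handed to a second reviewer."""
    lines = [
        "Check each claim below against sources you can cite.",
        "Reply 'verified', 'wrong', or 'cannot verify' for each one.",
    ]
    for i, (sentence, figures) in enumerate(numeric_claims(text), start=1):
        lines.append(f"{i}. {sentence} [figures: {', '.join(figures)}]")
    return "\n".join(lines)


if __name__ == "__main__":
    draft = "The launch came 48 days later. Pricing dropped 87%. It felt fast."
    print(verification_prompt(draft))
```

This only surfaces numeric claims. Named sources, quotations, and other non-numeric facts would still need their own review.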
[10:25] Like, this is better than McKinsey strategy decks I've seen. Like, this is truly an executive-level strategic thinking model.

[10:39] And so I guess if you bring that back around to where I started, you have an executive strategic model in your pocket. It is picky about prompting. It is a global thinker. You have to expect it to be one.

[10:53] What are you going to do with it? How are you going to prompt it? What are the problems that you are going to give it? And I deliberately want to point out that just because this model is capable of this level of strategic thinking, that does not mean everyone is going to go out, use this model for strategic thinking, and make all consultants go away. Partly because no matter how good this model is, it is not going to be able to understand the hidden depths of quiet context, the vibe in your office, the way you do and the way a consultant does if they really sit down with you. And honestly, partly because people are kind of lazy and don't always actually use the capabilities that are in front of them.

[11:42] And so imagine this as like an incredibly powerful home cooking machine that has magically arrived in all of our homes, or soon will.

[11:54] We're still going to go out to eat at restaurants. We still want to order in shrimp lo mein or poke or sushi sometimes, even if the magical cooking machine can do a phenomenal job, because we're human.

[12:08] And that's actually a point that Sam Altman made. He intentionally published an essay today called "The Gentle Singularity," where he talked about the fact that a lot of what humans care about is going to continue to exist in the 2030s. And yet at the same time, this takeoff into intelligence is going to continue to happen.
[12:24] And his thesis is essentially that we will have much more abundance, etc., because we let the intelligence happen. We will see how all of that plays out. Sam has the ability to actually drive some of that. You and I don't. We're just along for the ride, and it's helpful to understand what's going on.

[12:43] In my view, there's two big things that stand out that I want you to take away. One, it's been 48 days. I said it before, it's been 48 days since o3. We are going fast. o4 is around the corner. o4 Pro is coming. GPT-5 is coming. That is all from one model maker this year alone. Plus the open source model they're going to release.

[13:05] There's other model makers right around the corner. I'm sure they're working late tonight on o3 Pro, so things are going fast.

[13:12] Number two, yes, this is a model that is worth getting to know. It is the best model in the world. It needs an excellent, excellent problem. Having talked to a lot of people in tech and outside of tech, we all grapple with problems that are really, really tough for us. It is worth it to have a strategic thinking partner. And I don't just mean for business, although I've spent most of this video talking about the business side.

[13:44] On the Substack that I wrote about this, which you can check out if you like, I actually include a very simple prompt for people who find this model scary, to get started. I'm going to go ahead and read it here as well.

[14:00] "Based on everything you know about me, what would make me 50% happier?" If you have been talking to your ChatGPT, give that to o3 Pro and see what a difference it makes.

[14:14] See what a difference it makes. It is a much more insightful model than o3. And I think that's where I'll leave it. Good luck becoming 50% happier with o3