
ChatGPT 5.1 vs Gemini 3 Prompting

Key Points

  • ChatGPT 5.1 and Gemini 3 are optimized for fundamentally different input types: 5.1 excels with clean, low‑entropy, well‑structured prompts for complex reasoning, coding, and narrative tasks, while Gemini 3 thrives on messy, high‑entropy data such as logs, PDFs, screenshots, and video that it can transform into structured information.
  • The key to productivity is selecting the right model for the right job rather than trying to force a single model to handle every use case; ask “which model fits this task?” instead of assuming one works for all.
  • Effective prompting for ChatGPT 5.1 continues to rely on classic habits—explicitly defining the model’s role, audience, tone, and desired output format (sections, headings, JSON schemas, bullet counts)—and differentiating between speed‑run instructions (minimal reasoning) and deeper reasoning prompts.
  • Prompting Gemini 3 should focus on feeding it large, uncurated, high‑entropy context and letting the model “eat” the data, then asking it to extract, summarize, or reorganize that information into the structure you need.
  • Adopt a “keep‑stop‑start” mindset when switching between models: for 5.1, keep role/audience/tone definitions, stop feeding it raw data dumps and noisy context, and start giving concise, purpose‑driven instructions; for Gemini 3, start using it as the destination for raw, messy data ingestion.


# ChatGPT 5.1 vs Gemini 3 Prompting

**Source:** [https://www.youtube.com/watch?v=11Bq5sxbP68](https://www.youtube.com/watch?v=11Bq5sxbP68)
**Duration:** 00:16:01

## Sections

- [00:00:00](https://www.youtube.com/watch?v=11Bq5sxbP68&t=0s) **Prompting Strategies for GPT‑5.1 vs Gemini‑3** - The speaker contrasts feeding clean, low‑entropy inputs to ChatGPT 5.1 for complex reasoning tasks with feeding messy, high‑entropy data (logs, PDFs, screenshots, video) to Gemini 3, emphasizing the need to match each model to the appropriate job.
- [00:03:36](https://www.youtube.com/watch?v=11Bq5sxbP68&t=216s) **Streamlining Prompts for 5.1** - The speaker advises using concise, task‑specific prompts that avoid large background dumps and bundled jobs, while leveraging explicit style instructions and treating 5.1 like an internal function library for better performance.
- [00:06:56](https://www.youtube.com/watch?v=11Bq5sxbP68&t=416s) **Optimizing Prompts for Gemini 3** - The speaker advises using clear structured outputs, step‑wise reasoning, placing extensive multimodal context before instructions, and avoiding ChatGPT‑style expectations to fully leverage Gemini 3’s unique capabilities.
- [00:10:14](https://www.youtube.com/watch?v=11Bq5sxbP68&t=614s) **Managing Multimodal Prompts and Entropy** - The speaker outlines how to deliberately name and index each input modality, tune reasoning controls, and differentiate between context entropy (messy, multimodal inputs) and task entropy (open‑ended tasks) when comparing Gemini 3 to GPT 5.1.
- [00:13:23](https://www.youtube.com/watch?v=11Bq5sxbP68&t=803s) **Choosing Between Gemini 3 and GPT 5.1** - The speaker contrasts Gemini 3’s strength in taming chaotic, high‑entropy multimodal inputs through output‑focused prompting with GPT 5.1’s advantage in handling well‑defined, structured tasks that require precise task definition and deep reasoning.

## Full Transcript
Most people talk about models, but very few people talk about the kind of mess you hand the model. This video is all about the differences in prompting between ChatGPT 5.1, which came out a week or so ago, and Gemini 3, which came out a couple of days ago. I'm going to get into the specifics. I'm going to explain how you prompt them differently, why it matters, and how your attention changes as a result. So we're going to get very specific and tactical, because I think that is going to be a huge driver for you to be productive with, frankly, both of these models. The goal here is not to have you pick a model. It is to have you use the right tool for the right job. So if I were to give you a summary of each of these after playing with them for the last few days: Gemini 3 is built to eat messy, high-entropy context (logs, PDFs, screenshots, video) and turn it into some kind of structure. ChatGPT 5.1 is built to take clean, relatively low-entropy, relatively organized inputs and do complex multi-step tasks with them: reasoning, coding, planning, narrative development. So this implies real shifts in your prompting habits, and you're better off asking which model to pick for which job than just assuming you can go with one or the other. So let's start and ground-set on how to think about ChatGPT 5.1, and then we'll get into Gemini 3 and the comparison. Your baseline mental model for ChatGPT 5.1 is that you should treat 5.1 as your operator / business writer / coder. It loves clear roles. It loves audiences. It loves specifics on tone. Remember, they tuned this model to follow instructions, and they partly did that specifically to address complaints around ChatGPT 4 on writing.
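The "roles, audiences, tone" habit above can be sketched as a small prompt builder. This is an illustrative example, not something from the video; the helper name and field wording are invented for the sketch.

```python
# A minimal sketch of the "role, audience, tone" habit as a prompt builder.
# The helper name and example field values are illustrative.
def build_prompt(role: str, audience: str, tone: str,
                 output_spec: str, task: str) -> str:
    # Each instruction gets its own labeled block, so none of them is
    # buried inside a wall of background text.
    return "\n\n".join([
        f"Role: {role}",
        f"Audience: {audience}",
        f"Tone: {tone}",
        f"Output format: {output_spec}",
        f"Task: {task}",
    ])

prompt = build_prompt(
    role="You are a senior product operations writer.",
    audience="VP-level readers with five minutes.",
    tone="Direct, plain language, no hype.",
    output_spec="Five to seven bullets under a one-line headline.",
    task="Summarize the Q3 launch readiness status.",
)
```

The point of the structure is that every lever the speaker names (role, audience, tone, output shape) is explicit and easy to vary independently.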
ChatGPT 5.1 performs best with curated, relevant context, not giant raw dumps. And from a mode perspective, you get benefits both with speed and with depth, and you have to use them intelligently and intentionally. So if you are doing a speed run with ChatGPT 5.1, you want to be thinking: what instruction set do I give the model that I don't want it to chew on and spend time thinking about? I just want it to follow exactly what I say and do it. Versus an instruction set where you do want those reasoning tokens. Everything else about ChatGPT 5.1 sits on top of that understanding of the model. So if we transition into prompting for 5.1 and think about it in classical engineering terms: keep, stop, start. What do we keep from a prompting perspective that we may have used already? What do we stop doing? What do we start doing? You want to keep defining role, audience, and tone. We've heard that advice for a long time, and it is still a high-leverage pattern in 5.1. You want to continue to be explicit about the structure of the output. Ask for sections, headings, bullet counts, JSON schemas. 5.1 is built to follow those structural instructions very reliably. You also want to keep using modes intentionally. If it's light edits or quick answers, you're going to go to Instant. If it's hard reasoning or refactors, you're going to use Thinking. Essentially, you want to keep letting 5.1 drive the narrative and use the context you give it to solve difficult tasks. This model likes to eat problems. So executive memos, product narratives, internal explainer docs: these are things it's going to do really well at. Now, you want to stop dumping huge unfiltered context windows into 5.1. I don't find that that is super relevant.
I think you pay more, and you tend to dilute the value of the model. You want to stop hiding the task inside a wall of background. I see that in a lot of so-called big prompts. You don't necessarily need that page of company lore from the wiki, right? Just ask specifically for what you want. And this aligns with ChatGPT 5.1's docs as well. You also want to stop packing four or five jobs into one prompt. You want to give the model the specific ask you're looking for, and then you can chain it into additional steps if you need to. Idea generation is a different sort of model task versus critique versus selection. Those feel like they should naturally be broken apart with 5.1 because of the way the model prefers clean inputs. I would also call out that now that we have a model that is willing to follow instructions on writing style, use it. Start to ask for different kinds of instructions. How do you ask for instruction on tone that matches marketing, versus instruction that matches the boardroom, versus instruction that matches an engineering team? 5.1 is still not quite as good as Claude at style, but it is much better at following instructions than it was. So, what do you start doing? We've done keep and stop for 5.1. You should start treating 5.1 almost like an internal function library. This is a little bit of engineering talk, but you want to be able to define reusable patterns and call back to them with stable formats as much as you can. So: hey, this is my stable pattern for drafting an internal memo to the team. I'm defining it explicitly.
I'm asking ChatGPT 5.1 to remember it, or maybe, if I use it a ton, I'm putting it into project instructions or system instructions, and then I'm going to go back and invoke it deliberately by saying, "draft an internal memo." You want to reuse those stable formats. You want to start giving step plans when you want deliberate thinking. So: first ask three clarifying questions, then propose three options, then choose one, then write the doc. That backfills the model into thinking carefully about the task. You also want to start being explicit about tools. I talked about this when 5.1 launched; it just continues to be important. Tell the model what tools are important. Give it those constraints. Is web search important? Say so. You also want to start constraining verbosity and register. If you say this has got to be within five to seven bullets and it's for a VP audience, that is super helpful to the model, because it constrains the register of language and helps the model know what to actually put out. That is 5.1. When you go to Gemini, it is a different world, because Gemini is built for different kinds of tasks. So if we run the same keep, stop, start with Gemini 3, you get some overlap. I'm not going to pretend these are entirely different beasts, because they are all large language models, but you get some important differences that I want to call out here. With Gemini, you want to continue to be precise. You want to continue to be unambiguous. Gemini 3 responds best to clear goals and output formats. That's somewhat similar to 5.1. Having some structure on the output is also still helpful. So JSON, tables, standardized tags, whatever you use for structured output. I am not going to be the guy that tells you JSON is magic, because it's not.
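A minimal sketch of the "internal function library" idea described above: stable, named prompt patterns that you invoke deliberately. The pattern names and wording are hypothetical examples; in practice these would live in project or system instructions rather than a Python dict.

```python
# Sketch of an "internal function library" of stable prompt patterns.
# Pattern names and text are illustrative examples, not from the video.
PROMPT_LIBRARY = {
    "internal_memo": (
        "Draft an internal memo to the team. Structure: one-line subject, "
        "a three-sentence summary, a 'Decisions needed' bullet list, and "
        "next steps. Five to seven bullets total, for a VP audience."
    ),
    "step_plan": (
        "First ask three clarifying questions, then propose three options, "
        "then choose one, then write the doc."
    ),
}

def invoke(pattern: str, details: str) -> str:
    """Attach task-specific details to a stable, reusable pattern."""
    return f"{PROMPT_LIBRARY[pattern]}\n\nDetails:\n{details}"
```

Invoking `invoke("internal_memo", "The launch slips one week.")` reproduces the same stable format every time, which is the point: the structure stays fixed while only the details vary.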
Just have clear structure. You also want to keep using step-wise reasoning when tasks are complicated. So if you're saying step one, step two, and step three, like I talked about with 5.1, that's still useful. What do you need to stop doing? Number one, and this is so important: please stop treating Gemini 3 like it is ChatGPT from Google. It has different characteristics. Its real edge, as I called out, is being multimodal: ingesting video, images, text, and very lengthy context as first-class objects. If you only ever send it very short text prompts, you're not really using what differentiates it. The other thing you need to stop doing is putting all your detailed instructions at the top when you use huge context. If you are using that million-token context window, and Google's docs say this very specifically, you want to put the context first and the instructions at the end. So with long docs, with codebases, with videos, the better pattern is to put all of that at the top and then put the instructions at the bottom, anchored to the information above: "based on the information above, XYZ." Those are your instructions. Stop assuming Gemini 3 will be verbose or chatty by default. This is very different from ChatGPT 5.1. Gemini 3 is tuned to be concise. If you want a longer or more narrative answer, you are going to need to say so. I have wrestled with this already with the model. It loves to cover everything, but it loves to be concise. Stop referring to your multimodal inputs vaguely. Saying "screenshot above" is weak. Instead, say: "Use image one, the funnel dashboard, for XYZ. Use image two, the checkout screen, for ABC. Compare them by doing one, two, three."
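The context-first, name-your-modalities pattern just described can be sketched as a simple prompt assembler. The block labels and placeholder contents are illustrative; the shape (labeled context blocks up front, a single anchored instruction at the end) is the pattern the speaker attributes to Google's long-context guidance.

```python
# Sketch of the Gemini-style long-context pattern: labeled context blocks
# first, instructions last, anchored to "the information above".
# Block names and placeholder contents are illustrative.
def build_long_context_prompt(blocks: list[tuple[str, str]],
                              instructions: str) -> str:
    # Name and index every input so the instructions can point at it.
    labeled = [f"[{name}]\n{content}" for name, content in blocks]
    return ("\n\n".join(labeled)
            + "\n\nBased on the information above: " + instructions)

prompt = build_long_context_prompt(
    [
        ("image 1: funnel dashboard", "<image attached here>"),
        ("image 2: checkout screen", "<image attached here>"),
        ("log.txt", "...raw logs..."),
    ],
    "compare image 1 and image 2, then list three hypotheses "
    "from log.txt as a JSON array of strings.",
)
```

Because each block carries its own label, the closing instruction can refer to "image 1" or "log.txt" precisely instead of "the screenshot above."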
You want to be as specific as you can, because you have to assume the model needs that context to know which part of this lengthy material you're pointing at. If it's sorting through multiple videos, and maybe sorting through screenshots and images, help it find what you're talking about in the instructions. Gemini 3: what can you start doing? We've done keep and stop for Gemini 3; the last section is start. Start using Gemini 3 as your entropy eater. Give it those giant messy bundles: the logs, the PDFs, the transcripts, and so on. Ask it to output structured, grounded artifacts for you. Issue lists, timelines, hypotheses, tables. You also want to start anchoring your long-context prompts really explicitly. One example of a pattern would be having role and global constraints up front, big context blocks in the middle of the prompt (most of the prompt), and at the very end: "based on the information above, do X in Y schema." This helps you anchor those long-context prompts so the model knows what to do once it's read the context. You also want to start specifying the verbosity and the persona every time you prompt with Gemini 3. "Use a conversational tone." "I need 800 to 1,000 words here." "Return a terse bullet list." Whatever it is, don't assume that the concise response from Gemini 3 is just going to be fine. Decide what you want. You should also start naming and indexing every single modality. That sounds complicated, but it doesn't have to be. "Image one": that's naming a modality, naming one of the things you put in. "Video two, from minute 1:30 to minute 2:00." "CSV, columns 1 through 4."
Tell it what you want it to use when you're giving the task, because you have to assume it needs to search the pile, and you get better retrieval if you're more precise. Also, start using those reasoning controls on purpose when available. You want to raise the thinking level only when you truly need cross-document synthesis. You can keep it low if you're just doing labeling, or pure retrieval and extraction. Tune that on purpose. Just like with ChatGPT 5.1, you want to be deliberate about when you use thinking. So if we step back, I've done a deep comparison here of 5.1 versus Gemini 3. What do we see overall? The deep difference is not just Google versus OpenAI. It is what kind of entropy each model is best at handling. You are looking at a world of context entropy versus task entropy. Context entropy is how messy, large, and multimodal your inputs can be. They could have lots of irrelevant details. They can have mixed formats. They can have timelines, screenshots, logs, videos. Sound like Gemini 3? It is. Task entropy is how open-ended and multi-step the job is: vague objectives, competing constraints, multiple stakeholders, tool calls, planning and writing and coding. ChatGPT 5.1 is a little bit better there, but I want to get into the nuance. You get the best results when you align the model to the entropy it's dealing with. Gemini 3 does well with high context entropy. I think it does okay with task entropy, but I would grade it about moderate. "Here's everything; find the signal and structure" is a great use for Gemini 3. ChatGPT 5.1 is very low to moderate on context entropy. You have to give it really clean signal, but then you can give it a complex task.
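One way to make the entropy framing above concrete is a small decision rule. The labels and routing choices here are a rough, hedged reading of the comparison in this video, not guidance from either vendor.

```python
# A rough sketch of the context-entropy vs task-entropy framing as a
# decision rule. Labels, model names, and thresholds are illustrative.
def pick_model(context_entropy: str, task_entropy: str) -> tuple[str, str]:
    """Map ('low'|'high', 'low'|'high') to a (model, thinking_level) pair."""
    if context_entropy == "high":
        # Messy multimodal pile: let Gemini 3 eat it. Raise thinking only
        # for cross-document synthesis, not plain labeling or extraction.
        return ("gemini-3", "high" if task_entropy == "high" else "low")
    # Clean, curated inputs: 5.1, with Thinking reserved for hard reasoning
    # and Instant for light edits and quick answers.
    return ("chatgpt-5.1", "high" if task_entropy == "high" else "low")
```

For example, "here's everything, find the signal" maps to `("gemini-3", "low")`, while a carefully curated brief with an open-ended strategy question maps to `("chatgpt-5.1", "high")`.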
And I want to be precise about this, because ChatGPT 5.1's docs do call out that if you give it instructions that are ambiguous and try to cancel each other out, like "be descriptive" and "be concise," GPT 5.1 doesn't like that. It will burn tokens trying to resolve that ambiguity. I have seen it push back on me when it feels my prompts are inaccurate, which I love. Assuming you have clear prompts, though, I do think it can handle a very high-complexity task and think it through. It is sort of like a brain in a jar in that regard. If you can give it a really clean input, it can process it, even for quite a complex task, and it will come back really thoughtfully. I want to emphasize here that the differences I'm talking about are differences on top of very capable baseline LLM capacity. These models are all good at lots and lots of everyday tasks. They're good at writing emails. They're good at synthesizing PRDs. They're good at writing engineering requirements. The things I'm calling out are the nuances that help you make the most of these models. So prompting shifts in line with this insight around entropy. With Gemini 3 prompts, you are spending your effort on output structure, on task constraints, on how you anchor phrases and name and define which part of the context you're retrieving. So you need to get comfortable feeding high-entropy multimodal context, which I normally shy away from. But you have to define what good synthesis and good analysis look like across that context: schemas, ranking criteria, what you retrieve, and so on. Whereas with GPT 5.1, you're spending more of your time on the task definition. Is it really clean?
Is it unambiguous? You're making sure you insist on the tone you want. And you may pre-process the material so that well-structured context is available to the model, so that it can think deeply and not wade through junk. If you want all of this in one line: use Gemini 3 to tame the chaos of your inputs, and use ChatGPT 5.1 when you're tackling hard thinking and communication around more structured inputs. Once that chaos is structured, you can do some of both with both models. But that is the takeaway I am starting to come to. I think both are very strong. With Gemini 3, we are still at the beginning of exploring those capabilities. I know it has capabilities on the coding side that I didn't discuss a lot in this video; I'll probably do a separate one on how it codes. I think a lot of the power you see in some of these general-purpose exams and tests it's run, around visualization, around understanding how code is structured, and around building things usefully in one go, comes down to the ability to deeply understand multimodal inputs and write clear, coherent responses to those inputs. And that's why I focused a lot of this prompting guide for Gemini 3 there. There will be other insights we come to in the future, but I wanted this initial guide to focus on where I see stable differences between the models, so that we can build our understanding from there. I hope this has been useful. Again, both models are great. It's about understanding the nuances, and that's why I'm giving you this sort of prompting 201 master class on GPT 5.1 versus Gemini 3. Good luck.