ChatGPT 5.1 vs Gemini 3 Prompting
Key Points
- ChatGPT 5.1 and Gemini 3 are optimized for fundamentally different input types: 5.1 excels with clean, low‑entropy, well‑structured prompts for complex reasoning, coding, and narrative tasks, while Gemini 3 thrives on messy, high‑entropy data such as logs, PDFs, screenshots, and video that it can transform into structured information.
- The key to productivity is selecting the right model for the right job rather than trying to force a single model to handle every use case; ask “which model fits this task?” instead of assuming one works for all.
- Effective prompting for ChatGPT 5.1 continues to rely on classic habits—explicitly defining the model’s role, audience, tone, and desired output format (sections, headings, JSON schemas, bullet counts)—and differentiating between speed‑run instructions (minimal reasoning) and deeper reasoning prompts.
- Prompting Gemini 3 should focus on feeding it large, uncurated, high‑entropy context and letting the model “eat” the data, then asking it to extract, summarize, or reorganize that information into the structure you need.
- Adopt a “keep‑stop‑start” mindset when switching between models: keep role/audience/tone definitions for 5.1, stop providing raw dumps to it, and start using Gemini 3 for raw data ingestion, while for 5.1 stop over‑loading it with noise and start giving concise, purpose‑driven instructions.
Sections
- Prompting Strategies for GPT‑5.1 vs Gemini‑3 - The speaker contrasts how to feed clean, low‑entropy inputs to ChatGPT 5.1 for complex reasoning tasks versus feeding messy, high‑entropy data (logs, PDFs, screenshots, video) to Gemini 3, emphasizing the need to match each model to the appropriate job.
- Streamlining Prompts for 5.1 - The speaker advises using concise, task‑specific prompts—avoiding large background dumps and bundling multiple jobs—while leveraging explicit style instructions to treat the 5.1 model like an internal function library for better performance.
- Optimizing Prompts for Gemini 3 - The speaker advises using clear structured outputs, step‑wise reasoning, placing extensive multimodal context before instructions, and avoiding ChatGPT‑style expectations to fully leverage Gemini 3’s unique capabilities.
- Managing Multimodal Prompts and Entropy - The speaker outlines how to deliberately name and index each input modality, tune reasoning controls, and differentiate between context entropy (messy, multimodal inputs) and task entropy (open‑ended tasks) when comparing Gemini 3 to GPT 5.1.
- Choosing Between Gemini 3 and GPT 5.1 - The speaker contrasts Gemini 3’s strength in taming chaotic, high‑entropy multimodal inputs through output‑focused prompting with GPT 5.1’s advantage in handling well‑defined, structured tasks that require precise task definition and deep reasoning.
Full Transcript
# ChatGPT 5.1 vs Gemini 3 Prompting **Source:** [https://www.youtube.com/watch?v=11Bq5sxbP68](https://www.youtube.com/watch?v=11Bq5sxbP68) **Duration:** 00:16:01 ## Summary - ChatGPT 5.1 and Gemini 3 are optimized for fundamentally different input types: 5.1 excels with clean, low‑entropy, well‑structured prompts for complex reasoning, coding, and narrative tasks, while Gemini 3 thrives on messy, high‑entropy data such as logs, PDFs, screenshots, and video that it can transform into structured information. - The key to productivity is selecting the right model for the right job rather than trying to force a single model to handle every use case; ask “which model fits this task?” instead of assuming one works for all. - Effective prompting for ChatGPT 5.1 continues to rely on classic habits—explicitly defining the model’s role, audience, tone, and desired output format (sections, headings, JSON schemas, bullet counts)—and differentiating between speed‑run instructions (minimal reasoning) and deeper reasoning prompts. - Prompting Gemini 3 should focus on feeding it large, uncurated, high‑entropy context and letting the model “eat” the data, then asking it to extract, summarize, or reorganize that information into the structure you need. - Adopt a “keep‑stop‑start” mindset when switching between models: keep role/audience/tone definitions for 5.1, stop providing raw dumps to it, and start using Gemini 3 for raw data ingestion, while for 5.1 stop over‑loading it with noise and start giving concise, purpose‑driven instructions. ## Sections - [00:00:00](https://www.youtube.com/watch?v=11Bq5sxbP68&t=0s) **Prompting Strategies for GPT‑5.1 vs Gemini‑3** - The speaker contrasts how to feed clean, low‑entropy inputs to ChatGPT 5.1 for complex reasoning tasks versus feeding messy, high‑entropy data (logs, PDFs, screenshots, video) to Gemini 3, emphasizing the need to match each model to the appropriate job. - [00:03:36](https://www.youtube.com/watch?v=11Bq5sxbP68&t=216s) **Streamlining Prompts for 5.1** - The speaker advises using concise, task‑specific prompts—avoiding large background dumps and bundling multiple jobs—while leveraging explicit style instructions to treat the 5.1 model like an internal function library for better performance. - [00:06:56](https://www.youtube.com/watch?v=11Bq5sxbP68&t=416s) **Optimizing Prompts for Gemini 3** - The speaker advises using clear structured outputs, step‑wise reasoning, placing extensive multimodal context before instructions, and avoiding ChatGPT‑style expectations to fully leverage Gemini 3’s unique capabilities. - [00:10:14](https://www.youtube.com/watch?v=11Bq5sxbP68&t=614s) **Managing Multimodal Prompts and Entropy** - The speaker outlines how to deliberately name and index each input modality, tune reasoning controls, and differentiate between context entropy (messy, multimodal inputs) and task entropy (open‑ended tasks) when comparing Gemini 3 to GPT 5.1. - [00:13:23](https://www.youtube.com/watch?v=11Bq5sxbP68&t=803s) **Choosing Between Gemini 3 and GPT 5.1** - The speaker contrasts Gemini 3’s strength in taming chaotic, high‑entropy multimodal inputs through output‑focused prompting with GPT 5.1’s advantage in handling well‑defined, structured tasks that require precise task definition and deep reasoning. ## Full Transcript
Most people talk about models, but very
few people talk about the kind of mess
you hand the model. This video is all
about the differences in prompting
between Chat GPT 5.1, which came out a
week or so ago, and Gemini 3, which came
out a couple of days ago. I'm going to
get into the specifics. I'm going to
explain how you prompt them differently
and why it matters and how your
attention changes as a result. So, we're
going to get very specific and tactical
because I think that is going to be a
huge driver for you to be productive
with, frankly, both of these models
because the goal here is not to have you
pick a model. It is to have you use the
right tool for the right job. So if I
were to give you a summary of each of
these after playing with them for the
last few days, Mi3 is built to eat messy
high entropy context, logs, PDF,
screenshots, video, and turn it into
some kind of structure. Chat GPT 5.1 is
built to take clean, relatively low
entropy inputs, relatively organized
inputs, and do complex multi-step tasks
with them. reasoning, coding, planning,
narrative development. So, this implies
real shifts in your prompting habits and
you're better off asking which model do
I do with which job, which model do I
pick with which job than just assuming
that you can go with one or the other.
So, let's start and ground set on how to
think about chat GPT 5.1 and then we'll
get into Gemini 3 and the comparison. So
your baseline mental model for chat GPT
5.1 is that you should treat 5.1 as your
operator slash businesswriter/coder.
It loves clear roles. It loves
audiences. It loves specifics on tone.
Remember they tuned this model to follow
instructions and they partly did that
specifically to address complaints
around chat GPT4 on writing. Chat GPC
5.1 performs best with curated relevant
context, not just giant raw dumps. And
from a mode perspective, you get
benefits both with speed and with depth.
And you have to use them intelligently
and intentionally. And so if you are
doing a speedrun with Chad GPT 5.1, you
want to be thinking of it as what
instruction set do I give the model that
I don't want it to chew and spend time
thinking about it. I just want it to
follow exactly what I say and do it
versus an instruction set to give the
model where you want those reasoning
tokens. Everything else about chat GPT
5.1 sits on top of that understanding of
the model. And so if we transition into
prompting for 5.1 and think about you
sort of in classical engineering terms,
keep stop start, right? Like what do we
keep from a prompting perspective that
we may have used already? What do we
stop doing? What do we start doing? You
want to keep defining role, audience,
and tone. We've heard that advice for a
long time. This is still a high lever
pattern in 5.1. You want to continue to
be explicit about structure of output.
Ask about sections, headings, bullet
count, JSON, schemas. 5.1 is built to
follow those structural instructions
very reliably. You also want to keep
using modes intentionally. If it's light
edits, if it's quick answers, you're
going to go to instant. If it's hard
reasoning, if it's refactors, you're
going to use thinking. Essentially, you
want to keep letting one drive the
narrative and use the context you give
it to solve difficult tasks. This model
likes to eat problems. So, executive
memos, product narratives, internal
explainer docs. These are things that
like it's going to do really well at.
Now, you want to stop dumping huge
unfiltered context windows into 5.1. I
don't find that that is super relevant.
I think you pay more and you tend to
dilute the value of the model. You want
to stop hiding the task inside a wall of
background. I see that in a lot of
so-called big prompts. You don't
necessarily need that page of company
lore from the wiki page, right? Just ask
specifically for what you want. And this
aligns with chat GP2 5.1's docs as well.
You also want to stop packing four or
five jobs into one prompt. You want to
give the model the specific ask you're
looking for and then you can chain it
into additional steps if you need to.
And so idea generation is a different
sort of model task versus critique
versus selection. Those to me feel like
they should naturally be broken with 5.1
because of the way the model prefers
clean inputs. I would also call out that
now that we have a model that is willing
to follow instructions on writing style,
use it. Start to ask for different kinds
of instructions. How do you ask for
instruction on tone that matches
marketing versus instruction that
matches boardroom versus instruction
that matches engineering team? 5.1 is
still not quite as good as Claude at
style, but it is much better at
following instructions than it was. So,
what do you start doing? We've done keep
and stop for 5.1. You should start
treating 5.1 almost like an internal
function library. This is sort of a
little bit of engineering talk, but you
want to be able to define reusable
patterns and call back to them with
stable formats as much as you can. So,
hey, this is my stable pattern for
drafting an internal memo to the team.
I'm defining it explicitly. I'm asking
Chad JPT 5.1 to remember it or maybe if
I use it a ton, I'm putting it into
project instructions or system
instructions and then I'm going to go
back and I'm going to invoke it
deliberate deliberately by saying draft
an internal memo. You want to reuse
those table formats. You want to start
giving step plans when you want
deliberate thinking. So first ask three
clarifying questions, then propose three
options, then choose one, then write the
dock. Right? That backfills the model
into thinking carefully about the task.
You also want to start to be explicit
about tools. I talked about this when
5.1 launched. It just continues to be
important. Tell the model what tools are
important. Give it those constraints. Is
web search important? Say so. You also
want to start constraining verbosity and
register. If you say this has got to be
within five to seven bullets and it's
for a VP audience, that is super helpful
to the model because it constrains the
register of language and helps the model
know what to actually put out. That is
5.1.
When you go to Gemini, it is a different
world because Gemini is built for
different kinds of tasks. So if we run
the same keep, stop, start with Gemini
3, you get some overlap. I'm not going
to pretend like these are entirely
different beasts because they are all
large language models, but you get some
important differences that I want to
call out here. So with Gemini, you want
to continue to be precise. You want to
continue to be unambiguous. So Gemini 3
responds best to clear goals and output
formats. That's somewhat similar to 5.1.
Having some structure on the output is
also still helpful. So JSON table,
standardized tags, whatever you use for
structured output. I am not going to be
the guy that tells you JSON is magic
because it's not. Just have clear
structure. You also want to keep using
step-wise reasoning when tasks are
complicated. And so if you're saying
step one, step two, and step three, like
I talked about with 5.1, that's still
useful. What you need to stop doing.
Number one, this is so important. Please
stop treating Gemini 3 like it is chat
GPT from Google. It has different
characteristics. Its real edge as I
called out is being multimodal.
Ingesting video, ingesting images,
ingesting text, ingesting very lengthy
context as first class objects. If you
only ever send it very short text
prompts, you're not really using it for
what it's differentiated against. The
other thing you need to stop doing is
putting all your detailed instructions
at the top when you use huge context. If
you are using that million token context
window and Google's docs say this very
specifically you want to put the context
first and the instructions at the end.
So with long docs, with code bases, with
videos, the better pattern is to put all
of that at the top and then put the
instructions at the bottom and say
anchored to the information above based
on the information above XYZ. This is
your instructions. Stop assuming Gemini
3 will be verbose or chatty by default.
This is very different from chat GPT
5.1, right? Gemini 3 is tuned to be
concise. If you want a longer or more
narrative answer, you are going to need
to say so. I have wrestled with this
already with the model. It loves covers
everything, but it loves to be concise.
Stop referring to your multimodal inputs
vaguely. Seeing screenshot above is
weak. Instead, say, "Use image one,
funnel dashboard, for XYZ. Use image
two, checkout screen for ABC. Compare
them by doing one, two, three." You want
to be as specific as you can because you
have to assume that the model needs that
context to know what of this lengthy
context you're giving it. If it's
sorting through multiple videos and
maybe sorting through screenshots and
sorting through images, help it find
what you're talking about in the
instructions. Gemini 3, what can you
start doing? We've done keep and stop
for Gemini 3. The last section is start.
Start using Gemini 3 as your entropy
eater. Give it those giant messy
bundles, right? The logs, the PDFs, the
transcripts, etc. Ask it to output
structured grounded artifacts for you.
Issues lists, timelines, hypotheses,
tables. You also want to start anchoring
your long context prompts really
explicitly. So one example of a pattern
would be having roll and global
constraints having big context blocks in
the middle of the prompt for most of the
prompt and at the very end based on the
information above do x in y schema. This
helps you to anchor those long context
prompts so the model knows what to do
once it's read the context. You also
want to start specifying the verbosity
and the persona every time you prompt
with Gemini 3. Use a conversational
tone. I need 800 to a,000 words here.
return a tur bullet list, whatever it
is, don't assume that the concise
response from Gemini 3 is just going to
be fine. Decide what you want. You
should also start naming and indexing
every single modality. And that sounds
complicated, but it doesn't have to be,
right? Image one, that's naming a
modality, right? That's naming one of
the things you put in. Video two, from a
minute 30 to 2 minutes. CSV, columns 1
through 4. tell it what you want it to
use when you're giving the task because
you have to assume it needs to search
the pile and so you get better retrieval
if you're more precise. Also start using
those reasoning controls on purpose when
available. You want to raise the
thinking level only when you truly need
cross document synthesis. You can keep
it low if you're just doing labeling. If
you're just doing pure retrieval and
extraction, tune that on purpose. Just
like Chad GPT 5.1, you want to be
deliberate about when you use thinking.
So if we step back, I've done a deep
comparison here of 5.1 versus Gemini 3.
What do we see overall? The deep
difference is not just Google versus
OpenAI. It is what kind of entropy each
model is best at handling. You are
looking at a world of context entropy
versus task entropy. So context entropy
is how messy, large, and multimodal your
inputs can be. They could have lots of
irrelevant details. They can have mixed
formats. They can have timelines,
screenshots, logs, videos. Sound like
Gemini 3? It is. Task entropy is how
open-ended and multi-step the job is,
right? Vague objectives, competing
constraints, multiple stakeholders, tool
calls, planning and writing and coding.
Chat GPT 5.1 is a little bit better
there, but I want to get into the
nuance, right? So you get the best
results when you align the model to the
entropy it's dealing with. Gemini 3 does
well with high context entropy. I think
it does okay with task entropy, but I
would grade it about moderate. Here's
everything. Find the signal and
structure. It is a great use for Gemini
3. Chad GPT5
or 5.1 is very low to moderate on
context entropy. You have to give it
really clean signal, but then you can
give it a complex task. And I want to be
precise about this because Chad GPT
5.1's docs do call out that if your
context window is uh competing if you
were giving it instructions that are
ambiguous and try and cancel each other
out, like be descriptive and concise,
GBT 5.1 doesn't like that. It will burn
tokens trying to fix that ambiguity. I
have seen it push back on me when it
feels my prompts are inaccurate, which I
love. Assuming you have clear prompts
though, I do think it can handle a very
high complexity task and think it
through. It is sort of like a brain in a
jar in that regard. If you can give it a
really clean input, it can process it
and it can be quite a complex task and
it will come back really thoughtfully. I
want to emphasize here the differences
I'm talking about are differences on top
of very capable baseline LLM capacity.
These models are all good at lots and
lots of everyday tasks. They're good at
writing emails. They're good at
synthesizing PRDs. They're good at
writing engineering requirements. The
things I'm calling out are the nuances
that help you make the most of these
models. So, prompting shifts in line
with this insight around entropy. Right?
So, Gemini 3 prompts, you actually are
spending your effort on output
structure. where you're spending your
effort on task constraints on how you
anchor phrases and name and define which
part of the context you're retrieving.
So you need to get comfortable feeding
high entropy multimodal context which I
normally shy away from. But you have to
define what good synthesis looks like
and good analysis looks like across that
context. So schemas, ranking criteria,
what you retrieve, etc. Whereas with G
GPT 5.1, you're spending more of your
time defining the task definition. Is
that really clean? Is it unambiguous?
You're making sure you insist on the
tone you want. And you may pre-process
it so that well ststructured context is
available to the model so that it can
think deeply and not wade through jump.
If you want all of this in one line, use
Gemini 3 to tame the chaos of your
inputs and use chat GPT 5.1 when you're
tackling hard thinking and communication
around more structured inputs. Once that
chaos is structured, you can do some of
both with both models. But that is the
takeaway I am starting to come to. I
think both are very strong. I think
Gemini 3, we are still at the beginning
of exploring those capabilities. I know
that it has capabilities on the coding
side that I didn't discuss a lot in this
video. Probably do a separate one on how
it codes. I think a lot of the power
that you see in some of these general
purpose exams and tests that it's run
around visualization and around
understanding how code is structured and
around building things usefully in one
go. It comes down to the ability to
deeply understand multiodal inputs and
write clear coherent responses to those
inputs. And that's why I focused a lot
of this prompting guide for Gemini 3
there. There will be other insights we
come to in the future, but I wanted this
initial guide to focus on where I see
stable differences between the models so
that we can build our understanding from
there. I hope this has been useful.
Again, both models are great. It's about
understanding the nuances and that's why
I'm giving you this sort of prompting
2011 master class on GPT 5.1 versus
Gemini 3. Good luck.