Four Core Moves for Prompting
Key Points
- The speaker is consolidating a year’s worth of prompt guides into a structured course that offers a beginner‑friendly pathway, an advanced track, and a “jump‑in” option for experienced users.
- Prompting is framed as briefing a contractor: you must clearly define the desired deliverable’s shape, format, and constraints to get consistent, useful results.
- The first core move is to specify the output shape (e.g., word count, bullet points, tables, checklists) so the AI knows exactly what form the answer should take and avoids unwanted filler or style quirks.
- The second move (introduced but not detailed) emphasizes providing just enough context to guide the AI without overwhelming or confusing it, ensuring the response stays on target.
Sections
- Untitled Section
- Guidelines for Fact‑Based AI Prompts - Advice on supplying labeled factual inputs and instructing the AI to respond only with those facts—or say “unknown” when unsupported—to minimize hallucination.
- Prompting for Concise, Self‑Checked Answers - The speaker explains how to direct a language model to truncate its internal reasoning, provide a brief answer with a top recommendation and justification, and perform a quick quality check before delivering the response.
- Guiding AI Output Structure - The speaker explains how explicitly defining the desired output format, criteria, and level of detail in prompts enables the model to produce customized decision tables and incident summaries while allowing flexibility to omit or adjust elements based on the user's priorities.
- Prompt Engineering: Prioritize Pipelines - The speaker argues that effective prompting starts with designing the full retrieval, tool, and memory pipeline—as a first‑class object—since prompts behave differently across environments, and stresses treating all input context as a trust‑bounded supply chain to ensure safety and reliability.
- Scaling Prompt Design Principles - The speaker explains that clear contracts and output schemas act as scalable, fractal prompts—using entropy settings like temperature and top‑p plus constraints to shape a model’s probability distribution.
- Treat Prompts Like Production Code - The speaker stresses that prompts require the same rigorous testing, monitoring, versioning, and rollback processes as production code, accounting for wide user distributions and embracing multiple models for robust performance.
- Governance Over Heroic Prompting - Effective AI prompt production relies on simplicity, versioning, testing, and built‑in safety through structured governance rather than ad‑hoc heroic efforts.
- Managing LLM Memory and Enforcement - The speaker explains how product choices shape LLM context windows, why retrieval‑augmented architectures are needed to handle memory, and that automated output checks outperform relying on human vigilance.
Full Transcript
# Four Core Moves for Prompting **Source:** [https://www.youtube.com/watch?v=UhyxDdHuM0A](https://www.youtube.com/watch?v=UhyxDdHuM0A) **Duration:** 00:30:14 ## Summary - The speaker is consolidating a year’s worth of prompt guides into a structured course that offers a beginner‑friendly pathway, an advanced track, and a “jump‑in” option for experienced users. - Prompting is framed as briefing a contractor: you must clearly define the desired deliverable’s shape, format, and constraints to get consistent, useful results. - The first core move is to specify the output shape (e.g., word count, bullet points, tables, checklists) so the AI knows exactly what form the answer should take and avoids unwanted filler or style quirks. - The second move (introduced but not detailed) emphasizes providing just enough context to guide the AI without overwhelming or confusing it, ensuring the response stays on target. ## Sections - [00:00:00](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=0s) **Untitled Section** - - [00:03:33](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=213s) **Guidelines for Fact‑Based AI Prompts** - Advice on supplying labeled factual inputs and instructing the AI to respond only with those facts—or say “unknown” when unsupported—to minimize hallucination. - [00:06:56](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=416s) **Prompting for Concise, Self‑Checked Answers** - The speaker explains how to direct a language model to truncate its internal reasoning, provide a brief answer with a top recommendation and justification, and perform a quick quality check before delivering the response. - [00:10:07](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=607s) **Guiding AI Output Structure** - The speaker explains how explicitly defining the desired output format, criteria, and level of detail in prompts enables the model to produce customized decision tables and incident summaries while allowing flexibility to omit or adjust elements based on the user's priorities. - [00:13:42](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=822s) **Prompt Engineering: Prioritize Pipelines** - The speaker argues that effective prompting starts with designing the full retrieval, tool, and memory pipeline—as a first‑class object—since prompts behave differently across environments, and stresses treating all input context as a trust‑bounded supply chain to ensure safety and reliability. - [00:17:08](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1028s) **Scaling Prompt Design Principles** - The speaker explains that clear contracts and output schemas act as scalable, fractal prompts—using entropy settings like temperature and top‑p plus constraints to shape a model’s probability distribution. - [00:21:01](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1261s) **Treat Prompts Like Production Code** - The speaker stresses that prompts require the same rigorous testing, monitoring, versioning, and rollback processes as production code, accounting for wide user distributions and embracing multiple models for robust performance. - [00:24:09](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1449s) **Governance Over Heroic Prompting** - Effective AI prompt production relies on simplicity, versioning, testing, and built‑in safety through structured governance rather than ad‑hoc heroic efforts. - [00:27:27](https://www.youtube.com/watch?v=UhyxDdHuM0A&t=1647s) **Managing LLM Memory and Enforcement** - The speaker explains how product choices shape LLM context windows, why retrieval‑augmented architectures are needed to handle memory, and that automated output checks outperform relying on human vigilance. ## Full Transcript
So, I have never done this. I went back
through all of the prompt guides I've
written over the last year plus and I
went through them. I collated them. I
pulled them together and I'm putting it
into a comprehensive
prompting course. But that's not all.
I'm also separating it out so that it's
got a really easy beginners pathway into
prompting. I'm going to lay out the
first steps of that here in the video.
then a more advanced guide for when
you're ready. And for folks that want to
jump right to it, there's lots there.
So, this video is going to be very
simple. We're going to go through the
four moves that I want to sort of call
out as the heart of prompting if you're
still sort of trying to wrap your mind
around what prompting is and why it
matters. And then in the second part of
the video, I'm going to give a little
teaser trailer to the folks who are
interested in advanced prompting
techniques. and I'm going to talk about
the patterns that I uncovered digging
through hundreds of pages of prompting
notes and guides that I have written
over the last few months. What is it
that stands out? What is it that's
consistent? I found some really
interesting things. So, let's start with
the beginner friendly accessible moves.
Four big moves. The big idea here is
very simple. Instead of thinking of
having a casual chat with an artificial
intelligence, instead of thinking of
just chatting with chat GPT, I want you
to think of it like you are briefing a
contractor on exactly what you need. And
you have four simple moves to do that.
And this is actually it maps really well
to talking with a contractor. So you
will get more consistent and useful
results if you start with, for example,
move one. Tell your contractor, tell the
AI what shape you want the deliverable,
the thing it's going to write to be in.
Think of it like ordering at a
restaurant. If you are not specific
about what you want on the plate, if you
don't say, "Hey, I don't want pastrarami
on the sandwich." Well, you're going to
get pastrami on the sandwich.
So instead of saying write something
about X, tell the AI exactly what format
you need. As an example of the kind of
output shape that you can request,
give me one paragraph between 110 and
130 words. I don't want any headings.
It'll do it.
Write exactly five bullet points, one
sentence each. It'll do it. Create a
simple five row comparison table. It'll
do it. Make a checklist with only six
items. You get the idea. You can be
extremely specific. And this is actually
a way to contain some of the more
annoying response patterns that some of
these AIs have. So, for example, if
you're tired of Claude telling you all
the time how right you are and
absolutely, you can just say no LLM
bumpers. Go away. I just want your
response. If you are tired of Chad GPT5
writing super long responses, you can
say, "You got to give me the answer in
150 words or less." There are ways you
can have more control here than you
might think. Move two,
give just enough context to your
contractor or AI. Not too much. This is
like giving someone a recipe with all
the ingredients they need and not too
much. It's basically like a HelloFresh
box if you've ever used it. It ships you
the recipe and it ships you all the
ingredients. Not any not anymore. Just
just the ingredients you need.
Tell the chat, tell the AI to use only
the specific facts that you are
providing. Label them so that you can
track where they came from. If you're
really want to be sure that it got it
right, right? So, for example, if you
were trying to get it to generate a
response that's really clean and clear,
you can say, "Here are some facts. Fact
one, our customer turn increased from 3%
to 5% last quarter. Fact two, most of
our cancellations are from small
accounts. Fact three, our competitor
just launched a cheaper monthly plan.
Use only these facts. If something isn't
covered here, please say unknown instead
of guessing. This is helpful because you
are less likely, not guaranteed, not
foolproof, but you are less likely to
have the chat to have the AI make things
up when you explicitly tell it that it
has space to say unknown. There are lots
of other ways to do this that I've
suggested in the past. You can say
inference as a label. You can suggest
that it asks you questions instead of
trying to make stuff up. Really your
choice there depends on the degree to
which you want to give it more
information to fill in those gaps versus
you were giving it all it has and it has
to make the best assessment it can based
on the information because that's all
you got. And so you have to decide like
how much information are you giving it?
How much clarification do you want to
offer it as a chance? Fundamentally
though the idea is give it the context
it needs. Don't give it too much. If you
want to be really clear about what's in
the box, you can label that context. You
don't have to. You can also just list
the facts and say, "These are facts. I
want you to use these facts. Please do
not guess.
Please, please use another alternative
method because just saying do not guess,
the model wants to be complete and
helpful. So, you have to sort of give it
another way to be complete and helpful."
And so saying label it unknown that
would be much more helpful is a way to
sort of let the model know to like track
into that helpfulness vector that it has
that's very strong. Okay. Move number
three. Give your contractor or AI a
simple behind the scenes plan.
It's it's giving someone driving
directions. It's telling them the route,
but you don't need them to narrate every
turn. And so you can actually suggest to
the AI these are some steps you can take
but you don't have to tell me about
them. And that gives the AI some
guidance on the tool calling some
guidance on the process it uses that is
actually very helpful. So as an example
silently follow these steps. List the
options compare them. Recommend only one
then show me the final comparison in
your head. Prioritize by importance. Add
time estimates, assign owners, just show
me the final agenda. You see how these
are slightly different wordings, but you
see the same idea behind them. You don't
actually need to see the thinking
process here. The whole like, please
show me your thinking was an artifact of
a pre-reasoning model era. We have
reasoning models now. They will do
reasoning. All you're doing is giving
them some railroad tracks to guide them
on that reasoning process and keep them
moving in the direction you want. And
that's helpful. So, by the way, if the
AI comes back and it shows you the
thinking process and you find that
unhelpful, you can just say, you know
what, please don't show me the plan.
Just show me the result.
And if it gets complicated and it comes
back with a really long answer, that's
either you didn't specify the result or
you can say, you know what,
please keep your internal thinking to a
shorter timeline, right? I don't have
time forever. Like, and this is
something that we're seeing more and
more in some of these interfaces that
are coming out. Like if you look at
perplexity, you can click get the answer
now. Very similar idea. You basically
are telling it cut short whatever you're
doing and come back with a response
because I actually don't care that much
about how in-depth your thinking is for
this problem. And you as the human get
to decide that you get to say, yeah, I
don't care, right? I don't care that
much. And then this is a really
interesting one. If you go through this
whole thing, do the behind-the-scenes
plan, right? the silent plan and there's
no clear conclusion. Just add end with
your best recommendation.
And why? Because sometimes it's going to
come through and like you have to again
give the model some help to be helpful.
Like it wants to be helpful. Give it a
sense of what to do in that ambiguous
situation. Pick the best one. Tell me
why. Okay. Move four. It is helpful to
add a quick quality check even in a
short prompt. It's like having the model
proofread itself a little bit before it
comes back to you. It's like asking the
contractor to check their work.
Here's some examples of how to phrase
it. Before showing me your response,
hey, can you check are there actually
five bullets? If not, fix it. Verify
every claim has a fact number or verify
each claim is based on the facts I gave
you or that it says unknown. Confirm
that this paragraph is between 110 and
130 words. Check your work. Now, you
might think, well, those are really
silly examples you're giving me, Nate,
but they're memorable and you can use
them however you want. You can specify
the output shape that you want. You can
specify how to check. You don't have to
check the word length if that's not
important to you. You can check for
content and topics. The point is ask for
a check. It can help the model to do a
second pass and address formatting
issues. It can help address uh
hallucinations and mitigate them to some
extent. It can help with missing
elements. It can help with incorrect
length. It can help with uh complexity
issues if it's too complicated.
You just have to ask it to check its
work in line with the original goal that
you gave it. So, let's look at some
complete examples. So, here's three
complete examples. You have an email
outline and all you're doing is you're
saying, "I want you to outline an email,
too." And you can insert the audience.
You have the topic. I want you to
actually format this as an email with
five bullets. And I and you're very
clear about what's in here, right?
There's a hook to this. There's why it
matters. There's two points and an ask.
That's five bullets. It knows how much
it has. And you want to check it that,
you know, if it doesn't have five
bullets, it has to fix it. You can get
an explainer prompt that works this way,
too. Right? The task is to explain
whatever concept. Maybe it's LLM token
architecture, right? Explain it as if
it's to a smart 12-year-old. You have
three parts with headings. what it is an
example and gotchas or
misunderstandings. You have limits. It
has to be less than 140 words and check
that all three headings are present
because you want to make sure that all
that's what you want to check, right?
That all three of those are present. A
decision helper. Compare four options
for this decision I'm facing. Compare on
price, setup, learning curve, ongoing
effort. You could really define it any
way you want. This is the shape of the
output. It's a four row table. And end
with your recommendation one sentence
pick and why. Please check that the
table has rows and columns.
Pretty intuitive, right? And one of the
things that you'll notice is in each of
these I am giving the model space to
process how it wants. That is not always
true. It does underline one of the core
principles here. If you are
communicating your intent really, really
clearly, you can choose to drop some of
these. You can choose to say, you know
what, the shape doesn't matter. You can
choose to say, "I don't care about the
check." And that depends on you deciding
what matters and what doesn't for the
work that you're trying to do. Let me
give you a few more examples. Now, these
examples are just a little bit more
advanced. And I want you to notice that
we've decided to be more specific. We've
decided to have some silent process
here. An incident snapshot. Let's say
something goes wrong at work, right? You
want to describe what happened and what
went wrong. You include some notes here
that you label. You specify what the
shape of the output is for executives.
You say this is the process because you
really do care that it actually looks
through the facts. It quantifies the
scope. These are things you need it to
do to know that it looked at all the
notes. Um and then you specify
that there is an owner, right? It has to
check that there's an owner. It has to
be clear if there's something that's
unknown. Uh so that you're not
overclaiming.
What you're trying to do here is
essentially take a fairly complex
normally human task and actually put it
into a short prompt that you can get a
very reliable initial pass on for a high
stakes action. So it makes sense that
each of these things actually specified
in the same way. Uh let's say you want
an action plan from a highv value
meeting, right? You have a shape. You
have a checklist with boxes that you
want for your action plan. You have
notes that you've included. Your silent
process is really around extracting the
tasks, ranking them, making sure that
you get the right tasks. And you want to
make sure that you have exactly seven
boxes and that every line has an owner
and a due date, and that you're not
duplicating tasks. So, that's what you
want to check. You get the idea here,
right? You can do a more complex
assessment for higher value tasks. And
you can see here the the example from
the title for a for a video post. It's
very similar. I won't go into it in
depth. The point is this format is
flexible enough to get somewhat more
specific with a higher value task or it
can be lighter like I showed you
initially. Now, we are going to do a
little bit of a jump forward and we're
going to talk about some of the
prompting patterns that I learned when I
was digging through my own notes and
exploring how prompting has evolved in
2025. These are things that are
non-obvious that I don't see discussed
online. So, hold on to your hats. All
right. What are the nonobvious,
underdised principles that came out to
me as I sat here and I was putting
together these guides, these this
in-depth prompting guide that I've been
putting together for you. Number one,
the unit of design. We think of it as
the prompt, but the more I stared at it,
the unit of design is the pipeline, not
the prompt. So we think about prompts as
like things we tell the chatbot, but
really even if you're in the chatbot,
the prompt is living inside a structure.
It's living inside an architecture. So
there's things like retrieval or maybe
tool calls or memory or evaluation.
When you treat each of these as isolated
artifacts, the reliability of the prompt
falls apart. If you build the pipeline
first, then you can write prompts to
fit. And so a lot of my work when I'm
sort of building prompts, helping people
move through from beginner to
intermediate to advanced stages of
prompting is really understanding the
pipeline they're operating in. And for
many people, the default pipeline is
just the chat GPT user interface. It's
all defined, right? Whatever you have,
it's in the interface there. If you're
building your own, all of that stuff is
up for dialing and adjustment, right?
And in in those circumstances, you are
designing the prompt after you design
the pipeline. the pipeline is the first
thing you think about. And this is part
of what makes prompting so difficult to
teach because if I give you a prompt
insight, it may work really well in a
particular pipeline environment but not
as well in another one. So think about
your pipeline as a first class object.
Second thing that came out to me,
context is a supply chain with trust
boundaries.
So every token that you feed the model
comes from somewhere. Even in a short
prompt, user input, docs, the web, it is
really important that you think of that
context as having trust boundaries. If
you care about things like safety,
security, prompt injection, if you care,
which you should by the way, if you care
about reliability and avoiding
hallucinated responses or inaccurate
responses,
the way you do that is by making sure
that you understand that you have
trusted resources, somewhat less trusted
resources, and unsafe or untrusted
resources.
Now, if you're building a complex
pipeline, you may actually directly
label that stuff. If you're building a
high-value prompt, you may directly
label that. I do that sometimes. I'm
like, "These notes are very reliable.
These notes are not reliable." Right? If
you are in the middle of a casual
prompt, you may actually just label it
in the middle of the chat. But
regardless,
you need to recognize that when you are
working with context, you're basically
loading in a supply chain of tokens. And
the more you can indicate trust this,
don't trust this, the more likely you
are to get higher quality, higher
reliability responses out of your third
one, I have mentioned this before, it
has been a while. Contracts really
matter. And contracts are basically a
fancy way of saying format matters and
pros come second. And so if you are
going to
be specific about the outputs that you
expect, if you are going to frame an
interaction as we are forming an
agreement together, we are forming a
contract together. That is going to go a
long way because just as I said it in
the first part of this video, you're
working with a contractor. That's the
mental model. You want to have a
contract with your contractor and you
want to make sure the outputs are
specified. Now some people go overboard
here and I have seen the hype around the
internet where they say well JSON is the
only way to prompt and that's it's not
well supported by the documentation.
Certainly models can read JSON but they
can read lots of other things too. If
anything the way to think about it is
that you want clarity. You want clarity
on your expected format and outputs and
that in turn helps improve reliability
because the model knows what you want.
And so contracts are just a way of
encoding your intent. And if you have a
more complex pipeline, your contracts
get suitably more complex. But this idea
scales, right? Like all of these
insights are designed to scale. One of
my hypotheses, if you want a little
sidebar from Nate, is that prompting is
fractal. Like you have a simple version,
but it's fractally related. It's it's
essentially the same thing in miniature
as a more advanced version of prompting.
And a lot of these insights I found are
fractal. to scale.
Number four, entropy
is a design variable. Just I know, sit
down, have some coffee. We we're getting
deep in here. So, if you understand
about prompting,
you understand that you can shape things
like temperature and top P, which if
you're a beginner, you don't worry about
that's preset in chat GPT. If you're
more advanced, you can set that in the
API and it constrains the probability
distribution. But you can also use
constraints, you can use examples, you
can use output schemas and those further
narrow the distribution, right? You're
still shaping the distribution. So the
the larger the larger insight here is
that your entire goal is to shape the
probability mass.
That's what you're doing with with
constraints, with examples, with
context, with output schemas, with
temperature, with top P. You're all
shaping the probability mass. You are
using entropy as a design variable.
You're not just making it more creative.
you're actually shaping the probability
mass of the outcome. Now, that may seem
really abstract, right? It it may seem
theoretical, but I think having correct
mental models helps us to actually
decide where to apply leverage. And this
is definitely an advanced concept, but
if you understand that temperature, top
P constraints, examples, outputs, all of
that is related. It helps you to
understand where to apply leverage in
the system because you can say well the
output schema is probably going to be
more effective on probability mass here
versus temperature because X or Y. So
just take that think about it go for a
walk think about it some more and it may
start to click with troubleshooting
advanced prompts.
Number five scaffolding often matters
more than just horsepower. So if you are
if you're in the API, you you can just
like burn tokens on stuff and that can
be helpful. But if you can give the
model techniques, things that I cover in
the advanced prompting guide that that
is sort of in the substack here, like
least to most tree of thought, stepwise
plans, you're essentially giving the
model a way to reduce cognitive load.
You're giving it something that helps it
with error accumulation, and you're
adding structure, right? You're giving
it a scaffolding for how it thinks. And
that scaffolding makes it more token
efficient.
And so the little meme that I keep in
mind is that scaffolding beats
horsepower.
Have good scaffolding. And yeah,
horsepower is inevitable, right? Like
there's a reason we talk about burning
tokens as a way to solve problems. But
in a world where we need to be token
efficient, which is especially true in
production prompting environments,
have good scaffolding. Scaffolding
matters. And I definitely get into that
in the advanced uh sort of prompting
section. And so if you want to dive in,
there's a lot there. Number six,
shifting the distribution
is enough to break your best prompt. So
if prompts are tuned on a handful of
examples, they will drift in the wild if
they face a wildnormed distribution,
which is a fancy way of saying if you
train your prompts and you say this
prompt works well in the studio in the
environment and then you release it into
the wild and it's dealing with consumers
and they're asking all kinds of things,
it's going to break your best prompt.
This is another way of saying I have
seen the debates on evals. You
absolutely need to take quality
seriously in production. You need to
have the ability to test it. You need to
have the ability to monitor it. You need
to have the ability to evaluate it and
roll it back and version it and treat
the prompt like code. The wild
distribution is good enough to break the
best prompt. And the way to address it
is to treat prompts like production code
that need sustained investment to be
optimized over time.
You can't assume you can just release it
and it tested well. And again, the
mental model matters here. If you think
about a wild distribution of everyone
trying to sort of send queries against
your chatbot, well, that's going to
matter. Like that's going to break it.
You can understand why that breaks it
and you can understand why that's
different from a lab grown distribution
which tends to have narrower tails. And
that in turn can shape your evals. Your
eval should push your your distribution
as wide as you can. Part of your goal is
to get your lab grown distribution to
stretch and to actually be more
effective. Number seven, model pluralism
is a feature, not a bug. Different
models really do have distinct
personalities and strengths. You feel it
more the more you work with them. It
really, really matters that you not
build your architectures assuming only
one model or assuming one model will do
it all or assuming that you only need
one model for now, etc., etc. This is
one of those things that marks the
boundary between a beginner view of AI
and a more advanced view of AI. The
farther along you are in building sort
of with AI,
the more you recognize that you are
building with multiple models and people
have like Claude and Chad GPT and Gemini
for this and this and they and they'll
recite off the top of their head all the
use cases they have and which models
they use and if you're building
production pipelines, you'll know which
versions of those models you're going to
use and why. Model pluralism is not only
a feature, it is the future. You should
expect to have a pluralistic model
environment. And the more advanced you
are, the more models you're going to be
using, not because you love complexity,
but because you love efficiency and you
can pick the right model for the right
task. Insight number eight,
the farther you get with prompting, the
more you recognize that economics are a
first class constraint. You know, Satcha
talked about dollar cost per token per
watt as the equation for the next 10
years. And he was right. Token budgets
matter. Latency matters. Fallback logic
matters. The more you look at
architectures, it's less about picking
cool tools and it's more about making
sure that you have a reliable, scalable,
efficient architecture. thinking about
how loads actually hit models, what
models handle what loads, how models
pass and make tool calls at scale.
Those all start to matter. What is the
cost of the model making those choices?
Why do we need it to make that choice?
Is there a simpler way we can design
this? This is actually one of the great
examples where I see strong engineering
instincts from humans handily beating
LLMs at the moment.
Humans are really good at designing
efficiently engineered systems and
models tend to be good at adding
complexity. And when you are trying to
design a system that keeps economics in
mind, you have to design for simplicity
first.
Number nine, governance beats heroics.
Heroics are a fancy way of saying people
are out in production and they're
desperately trying to do their very best
and they're trying to write the best
prompt by next Thursday, etc., etc. No.
Have good governance. have versioning.
You can AB test changes. You can
evaluate with rubrics. You can log
things. Your prompt library ends up
being intellectual property. Manage it
as if it was code. Govern it like it was
code. And this is a different way of
talking about the same thing I talked
about earlier. Distribution shift will
break your best prompt. That was really
about how you care about testing and
evaluating. This is really about the
governance structure that keeps
production prompts going. Governance
beats heroics. Just like we say in
engineering circles, don't play hero
ball. If it is up to you to play hero
ball to sustain this code in production,
it's not good code. Similarly, if it's
up to you to sustain this prompt in
production, it's not a good prompt. You
need good governance first. Number 10,
safety gets designed in like at core.
It's not added on. So, constitutional
system level rules, which I talked about
in the advanced guide, refusal styles
and how you handle it, how it handles
ambiguity, how you address jailbreak
attempts and prompt injection attacks,
output moderations. That's all part of
the spec, right? That's not an
afterthought. You think about that from
the beginning because you have to assume
that models are unsafe by default. And
the only way you make them safe is by
designing them to be safe from the
get-go. And so again, this is one of
those things that really distinguishes
advanced prompters from beginning
prompters. Beginning prompters operating
in a chatbot, the chatbot is pretty what
we would call nerfed, right? It's it's
like whatever chat GPT gives you,
whatever Claude gives you, you can
bounce off those walls, but you're not
going to get very far. If you're
designing your own system, it's very
different. You are responsible for all
of that safety stuff, and you have to
take it seriously.
Number 11, memory is not a toggle.
Memory is a product choice. Deciding
what persists, where it's stored, how
it's summarized and validated, what the
model remembers. You have to
assume that the context window is
nothing. You have to assume that memory
or statefulness is something you want to
design into the architecture of the
system. And context windows are just
ways to generate a particular response.
People who assume the context window or
the model remembers is going to work,
it's not going to work.
Even if it's a very large context
window, it is only useful for that
response. Keep in mind models are
reinforcement learned. They're trained
on basically single response patterns.
Whatever you give it, the ideal response
from that model is one
one. Now, some people will go longer,
but this this key idea explains why
beginners are often surprised when they
continue a chat and then they get 30 40
turns in and the model seems to forget.
That's just how it works, right? The
product choice that chat GPT has made
and other model makers have made is to
have like a rolling context window in
certain situations and so the model will
keep talking and roll the context window
and it will forget the initial thing.
Now, Claude has made the choice not to
do that. Notably, Claude has decided
they're just going to end the chat. That
also frustrates people. There is no good
answer here.
The only way that people who build
advanced prompt architectures get this
to work is by
deliberately architecting what is stored
and what is summarized and validated and
how it's retrieved. This is where we get
into the discussion of rag, right?
Retrieval, log generation, other ways,
chunking data, etc.
Memory is a product choice. Don't assume
memories in the context window. The 12th
and last insight,
automated enforcement beats human
vigilance. Do not trust that the model
will follow your rules. Install
automated checks. In larger systems, you
actually have separate LLMs that are
checking for specific elements of the
output and then coming back. And they're
cheaper, right? They're often dumber
LLMs and they check for specific things.
Is it in a bulleted format? Right? Is it
reflecting the style guide? Whatever it
is, right? You want to make sure that
you have automated enforcement checks
because that beats the best human
diligence. It beats the best human
evals.
The more you can build enforcement of
the schema, enforcement of the output
into the system. So, it's just part of
the system. You're not playing hero
ball. It's just good engineering. And
that's really where I want to leave you.
So many of these insights as I stared
and I'm I'm not kidding. It was like 500
pages of like prompt stuff I've written.
I've been pretty prolific. And I keep
looking at this and I'm like, you know
what? Really what we're doing is we are
pulling so many of the principles of
good software engineering forward and
we're talking about them in the context
of artificial intelligence system
design.
And the principles are not new.
Governance beats heroics. That's not
new. Google knew that a long time ago,
but we have different ways to apply it
in the AI age. And people forget it
because they think that you can just
talk to the AI and make it do anything.
So it's worth repeating. It's worth
reiterating. So there you go. Those are
the 12 things I noticed. I hope you dig
in. I hope you enjoy the advanced guide.
If you're still here listening to these
12, you probably want the advanced
guide. It's likeund something pages.
It's it's really in-depth. I had fun. I
basically took everything that I learned
over the last six or eight months. And
people kept saying, "Nate, Nate, Nate,
Nate, I'm tired of looking at all these
guides. Can you just give me like one
thing? This is the one thing. This is
the one thing. This is the guide, right?
Like this is the soup to nuts a toz
complete guide for prompt engineering as
it exists today in September 2025.