# AI Black Friday Deal Showdown

**Source:** [https://www.youtube.com/watch?v=DcnTK7E1Ayc](https://www.youtube.com/watch?v=DcnTK7E1Ayc)
**Duration:** 00:11:01

## Summary

- The experiment compared five AI tools—ChatGPT 5.1, Claude Opus 4.5, Gemini 3, and the Atlas and Comet smart browsers—to see which could locate the best Black Friday discount on a specific item (a gray sectional couch).
- Clear, detailed intent in the prompt is crucial; vague instructions caused Comet to miss the color requirement and led to generic or incorrect results from the browsers.
- Model‑based AIs (ChatGPT, Claude, Gemini) can combine web‑search results with their own reasoning, whereas the smart browsers rely more on the raw page content, leading to markedly different performance.
- In the test, Atlas failed to surface any specific product deals (returning only generic articles), illustrating that without precise prompting the browsers struggle to deliver useful Black Friday recommendations.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=DcnTK7E1Ayc&t=0s) **AI Black Friday Deal Test** - The speaker compares five AI tools (ChatGPT 5.1, Claude Opus 4.5, Gemini 3, Atlas, and Comet) in finding the best Black Friday deal on a sectional couch, emphasizing that clear, detailed prompts are essential for accurate results.
- [00:03:06](https://www.youtube.com/watch?v=DcnTK7E1Ayc&t=186s) **Evaluating Comet's Product Recommendations** - The speaker critiques Comet's shopping suggestions, highlighting its ability to surface similar products with detailed info but noting a Walmart bias, lack of color specificity, and less‑than‑optimal price selections.
- [00:06:32](https://www.youtube.com/watch?v=DcnTK7E1Ayc&t=392s) **Critique of LLM Product Recommendations** - The speaker evaluates ChatGPT, Claude, and Gemini's shopping suggestions, noting missing links, confusing category groupings, inconsistent rankings, and ultimately rating their overall usefulness.
- [00:10:03](https://www.youtube.com/watch?v=DcnTK7E1Ayc&t=603s) **Comparing AIs for Deal Hunting** - The speaker assesses several AI models—including GPT‑5.1, Claude, Gemini, and others—on their effectiveness at locating Black Friday and Cyber Monday deals, concluding that GPT‑5.1 outperforms the rest while each model shows distinct strengths and limitations.

## Full Transcript
I put five AIs to the test on Black
Friday and I want to share the results
right here with you. I'm going to pull
up the browsers and all of that in a
minute. You're going to see what we did.
First, what is the test? We are figuring
out which AI is able to help us find the
true best deal on Black Friday commonly
discounted items. I chose a sectional
couch for this, but you can absolutely
use this for TVs or anything else that
you're looking for. What are the
contestants? ChatGPT 5.1, Claude Opus
4.5, Gemini 3, and two smart browsers. I
went with Atlas and I went with Comet.
I am going to show you what each of them
did in a second. But first, I want to
give you the takeaways. What are the
overall learnings I had as I ran this
test? And this will help you regardless
of whether you're prompting for Black
Friday or anything else. So, number one,
the intent has to be very clear for the
model to come back and give you what you
want. I know that's not new, but it came
through. You'll see in Comet that Comet
did not figure out even though I was on
a gray sectional couch page that I
wanted a gray sectional couch. So, the
color didn't come through because I
didn't specify it. Anywhere you don't
have clear intent, the model is just
going to give you its best guess. That's
why I built a prompt that is actually
designed to let you specify as much as
you want, in as much detail as you want,
the thing you're looking for for Black
Friday, so the model can go and get it.
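The actual prompt is being shared on the Substack and isn't reproduced here, but a fill-in-the-details template along these lines captures the idea (the field names and wording below are illustrative, not the speaker's prompt):

```python
# Hypothetical sketch of a "specify as much detail as you want" deal-hunting
# prompt. Every field name here is an assumption for illustration only.
def build_deal_prompt(item, color="any", budget="average price sensitivity",
                      retailers="any"):
    return (
        f"Find the best current Black Friday deal on a {item}. "
        f"Color: {color}. Budget: {budget}. "
        f"Preferred retailers: {retailers}. "
        "For each result, give the product name, retailer, sale price, "
        "original price, percent off, and a direct product link."
    )

# Specifying the color up front avoids the missed-intent problem the
# video describes with Comet.
prompt = build_deal_prompt("sectional couch", color="gray")
```

The point of the template is simply that any slot you leave vague (like color) is a slot where the model will substitute its best guess.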
And you'll see what happens when you
start to get clearer and how much more
powerful the search gets. Overall, the
browsers have a very different approach
architecturally than the models do. The
models can use web search tools and then
their own inference or reasoning
abilities to figure out what is the
right answer to the prompt around a
Black Friday deal. The browsers are actually equipped with more information. They are able to look at
the web page itself along with whatever
prompt you give. Now, typically when we
are browsing the web, we are using more
casual language. And so, I opted for a
split test where we have a commonly used
phrase for Black Friday deals in the
browsers along with all of that rich
detailed information on the page to feed
the model and a more detailed prompt in
ChatGPT, Claude, and Gemini. What you
see is what you get here, guys. Let's
jump right in. Our first contestant is
the Atlas browser. I just pulled up a
love seat convertible couch. It is a sectional couch in the meaning of the word, but it's a very sort of casual couch. I
wanted to see if the model would pick up
on other options. What I got was really
really disappointing. I got Black Friday
sectional sofa deals in general. I got Crate and Barrel. I got articles. I
did not get any specific product. Right?
If you look through, you will see that
this was a couch, and the model just could not come up with specific deals for me at all. So, I grade that one as a failure.
I don't think I got any useful results.
And because of the way it wrote just
general articles, I'm confident that
whatever product I picked, I'm going to
get general articles about that deal.
It's not going to help me with the
economic activity of shopping. Let's see
what Comet had to say. Here's Comet. And
already we can see it's a much more
useful answer. Yes, I started with a
different one. It doesn't matter in this
case. It found specific products. That's
the key differentiator, right? It's
found products. And I will tell you, the
first one is a gray match. It's actually
a bed, though. So, it's finding products
that are within the same cognitive
family, but they're not exactly the
same. I love the generative interface
here where it shows you specific
options. It gives you a short text
description, it gives you a verified
link, and it tells you the retailer. One
thing you will notice is that this is a
very Walmart-heavy response. And I wonder
if that's the prompting. There's another
Walmart product. There's a Sam's Club
product. Uh Wayfair. Wayfair does make
an appearance. You can see it's not
being color specific. If I'd asked for
gray, it probably would have actually
leaned into the gray side. I didn't
specify my intent. And so that's what I
get. Back to Walmart. One of the other
things you will notice is that this is
not super price optimized. So this is an
$839 deal. This is a $1,200 couch. Uh, and it's claiming this is around $700 to $900 and bringing me a $1,200 couch.
That's a relevant price difference. And
so, as much as the presentation is good,
I found the actual substantive results
not super great. Like, if I gave Atlas a
failing grade, I would give Comet about
a C for this. Let's see what we get when
we move to other models. Okay, here's
ChatGPT. I have a much more complex
prompt here. If you're wondering, by the
way, is it a fair test? Is a more
complex prompt in a browser going to
work better? You will see the results of
that at the end of this video. Yes, I
will be sharing this prompt on the
Substack. I'm actually going to be
sharing multiple variants because
running this has made me realize that
you can skew this prompt to be more of a
deal hound or a deal hunter. You can skew
it to be more around the preferences.
You want the details right, the gray
couch, etc. So, I'll do a few variants,
but overall, the product results are
really clean. It gives you a visual. It
shows you already deals that are better.
I was not seeing deals this good. I
wasn't seeing consistent deals this
good. It has a really odd choice for a
Black Friday deal. This is a very fancy
$3,000 couch. I'm not quite sure why it
chose to include that in the summary. Uh
it is able to calculate value and so I'm
able to get I guess five couches under
600 bucks, four of which are under 500.
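The value math being described is easy to reproduce yourself when comparing lists like these (the listing names and prices below are made up for illustration, not the actual deals from the video):

```python
# Illustrative percent-off and under-budget math for a deal shortlist.
# All prices here are invented examples.
listings = [
    {"name": "Couch A", "sale": 270, "original": 900},
    {"name": "Couch B", "sale": 499, "original": 750},
    {"name": "Couch C", "sale": 839, "original": 1200},
]

for item in listings:
    # Percent off = how far the sale price falls below the original price.
    item["pct_off"] = round(100 * (1 - item["sale"] / item["original"]))

# A simple budget filter, like "couches under 600 bucks".
under_600 = [it["name"] for it in listings if it["sale"] < 600]
```

Here Couch A works out to 70% off and both A and B clear the $600 bar, which is the kind of "value density" comparison being made across the models.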
So overall, the value density is higher on ChatGPT. Now I still have to
hunt around for the text um and sort of
click on that to get what I want, but
it's pretty good. It gives me a nice
table. It gives me the percentage off.
I find that very handy. It does, uh,
claim to give me the link over here, but
it doesn't actually, which I think is a
mess. Let's see what Claude had to say.
Okay, here's Claude running the same
prompt. By the way, this prompt is quite
detailed. I opted to just write
sectional couch right here and tell it
to basically assume average for
everything else because I'm lazy. I love
that you can be lazy with prompts. Uh,
it goes through a complete search. It
comes through with a budget tier, which
is really widely ranging right from 270
bucks to 989. I'm not sure why those are
in the same category. I think that's
probably incorrect. If you look at the
links, they appear to be just links to
the overall website, not to the specific
product, which I think is a mess. Uh, it
gives me mid-range tier, uh, which it
defines as anything from $1,000 to $2,000, but
then this is over $2,000. It doesn't
give me features to help me understand.
So, I think this is a mess. Uh, it does
give me a ranking of overall best value,
but that doesn't really make sense. Like, the Albany Park one shows up again. Uh, it shows up again at a different price. I think this was over $3,000 when I was looking at ChatGPT.
And I'm glad it found the Wayfair George
Oliver modular. That's fine. I think overall this is less useful than the ChatGPT answer. I feel like the lack of specific links really hurts
and the categories and the way it
grouped it doesn't really make sense.
So, so far if you're looking at the
LLMs, I would say ChatGPT maybe a B, B minus. And I think we're looking at like a C plus maybe for Claude. There's some usefulness. There's some good value, but it feels like it's on par with Comet, but with less
graphics, except maybe a more in-depth
search. So, that kind of makes up for
it. What did Gemini do? Gemini dove
right in and gave you all of its
thoughts, came back with an overall
price point, very detailed, but there
were no links. And it gives me top
picks, but it doesn't give me a way to
get to it. It appears to lean into that
George Oliver modular again. It comes
back with the Honbay. You remember the Honbay was over on Comet. Um, it gives
me Deal Hunter advice that's pretty
generic. I would say this is in some
ways even worse than Claude because it
has fewer choices. It's less clear what
the categories mean to it. And there are
no links whatsoever, just like Claude,
but there's just even less choice. So,
if Claude was like trying to get to a
C plus and if we had like a C grade, uh,
because of the sort of weird issues with
pricing that Comet had, this might
actually be a C minus. And so, right
now, when I'm looking overall at where I
would trust for getting a reliable read on the best deals possible, I'm looking at ChatGPT 5.1 as a really good deal
hunter. And it really was. It's a
head-to-head test. But we have one more to test: what happens with Comet and a really detailed prompt. Okay, so I
pasted in my super detailed prompt. I
did exactly the same thing where I left
a bunch of it unfilled out and used
sectional couch and uh average price
sensitivity, which is what I did with
ChatGPT that got me good responses. Uh,
and it comes back with a much more
detailed answer here. It tells me what
it checked. It comes back with deals. It
has a whole table of them. It does not
have a visual picker, which I kind of
miss. Uh what's weird is if you look
through these deals, I actually don't
see that modular coming back. Uh and so
it feels like it went and found a bunch
of deals, but it wasn't necessarily able
to categorize them as well or
efficiently as ChatGPT was. Um, it
thinks that the five seat sleeper is the
best deal now. Uh, it's in the market, but it's not as much of a runaway steal as that $270 couch. Overall, what I am learning from
this is that, as ironic as it may be, the shorter prompt is probably more effective when
you are working with an agentic browser.
And the longer prompt, the more detailed
prompt I showed you, is going to be more
effective when you're working with, say,
a ChatGPT 5.1 and you want to get
absolutely the best deal. And so, I'm
always in the market for delivering
value. My goal here is going to be to
craft a series of really good Black
Friday prompts that you can use to
optimize for a deal, optimize for a
specific item you're looking for. Um,
and I'll pop those into the
Substack. But I felt like this was
really useful because one of the things
OpenAI has done is they talk about GDP,
right? This idea that AI is supposed to
do economic activity that's useful.
Well, everybody knows that Americans go
shopping on Black Friday and Cyber
Monday and everything else and when they
do, they look for deals. That is an
economic activity. This is really the
first year when I could plausibly put
five AIs up, including two browsers and the three LLMs, and I could see what
they were good at or not. And I would
say really there is a difference. ChatGPT 5.1 was the most useful. And the others were okay. I would say probably the second best overall was
Comet in terms of pleasantness of use.
It just wasn't as accurate or complete
as ChatGPT. I think Claude did okay, but probably not better than the others.
And Gemini didn't do super well. It
highlights to me how different these
models are because you'll recall that
I've done videos saying Opus 4.5 is
really good at long-running agentic
tasks. Gemini 3 is really good at
synthesizing complex documents. Well,
these are different activities. This is
an economic activity. This is finding
deals. It turns out, properly prompted, GPT 5.1 is the king at finding
deals. All right, you can get the
prompts over on the Substack.