Prompt Engineering and Retrieval-Augmented Generation
Key Points
- Prompt engineering has become a hot job market, with many openings for specialists who craft effective queries for large language models (LLMs).
- It involves designing precise prompts to guide LLMs and minimize “hallucinations,” where models generate inaccurate or false information due to conflicting training data.
- One key strategy is Retrieval‑Augmented Generation (RAG), which couples a retriever that fetches domain‑specific knowledge with the LLM generator to produce context‑aware answers.
- The retriever can be as simple as a database or vector search, allowing the model to incorporate proprietary or industry‑specific information it otherwise wouldn’t know.
- An illustrative use case is in finance: using RAG, a model can accurately answer questions about a company’s earnings for a given year by pulling the relevant data from a corporate knowledge base instead of relying on its generic training.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=1c9iyoVIwDs](https://www.youtube.com/watch?v=1c9iyoVIwDs)
**Duration:** 00:12:43

Sections
- [00:00:00](https://www.youtube.com/watch?v=1c9iyoVIwDs&t=0s) **Prompt Engineering Overview & RAG** - The speakers introduce the surge in prompt-engineer roles, define prompt engineering as crafting effective queries to avoid LLM hallucinations, outline common LLM applications, and preview retrieval-augmented generation as a key strategy.
So Suj, have you looked at your LinkedIn profile lately and noticed there are a ton of job openings for prompt engineers?

Absolutely, and that's why today we're going to do a deep dive on what that is. But first, to give a little context, let's talk about what large language models are used to do. As a review: of course everyone is familiar with chatbots, and we see those all the time. They're also used for summarization, for example. Another common use case is information retrieval.

Those are three different cases, but for our viewers, could you explain how that applies to prompt engineering?

Sure. Prompt engineering is vital to communicating effectively with large language models. What does it mean? It is designing, coming up with, the proper questions to get the responses you're looking for from the large language model, because you want to avoid hallucination. Hallucinations are where you get essentially false results out of a large language model, and that's because large language models are predominantly trained on internet data, and there can be conflicting data, conflicting information, and so on.

Great, okay, I got that. So we're going to look at this from four different approaches, so let's get straight to it.

Yep. We're going to look at the first approach, which is RAG, or retrieval-augmented generation.

We've had videos about this already on the channel, so I have kind of a basic understanding of it: you take domain-specific knowledge and add it to your model. But how does that actually work behind the scenes? Could you explain that to me?
Absolutely. Large language models, as you know, are trained on internet data; they are not aware of your domain-specific knowledge base content at all. So when you are querying the large language model, you want to bring awareness of your knowledge base to it.

So when you say knowledge base here, you're referring to something that might be specific to my industry or specific to my company, which is then going to be applied to the model?

Absolutely.

And how does that work again?

To bring this awareness to the large language model, we have to have two components. One is the retriever component, which brings the context of your domain knowledge base to the generator part of the large language model. When they work together and you ask questions of the large language model, it is now responding to your questions based on the domain specificity of your content.

Okay, I think I got it. Now, this retriever, that could really be as simple as a database search, right?

Exactly, it can be a vector database.

Okay, I got that. But could you first give me a quick example of how you've seen that applied in an industry?

Absolutely. Let's take the example of financial information for a company. If you were to directly ask the large language model a question about the total earnings of a company for a specific year, it's going to go through its learning and the internet data and come up with a number that may not be accurate. For example, for the annual earnings it could come back with $19.5 billion, which may be totally incorrect. Whereas if you want accurate responses, you bring the attention to the domain knowledge base and ask the same question; then the large language model is going to refer to your knowledge base to bring that answer, and this time it will be accurate, say, for example, $5.4 billion.

I see, because this is a trusted source that it can then integrate with the larger model.
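The retriever-plus-generator flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' implementation: the knowledge base, the word-overlap scoring (a stand-in for a real vector database), and the earnings figures are all illustrative assumptions, and the final LLM call is left as a printed prompt.

```python
def retrieve(query: str, knowledge_base: list[str]) -> str:
    """Toy retriever: rank passages by word overlap with the query.
    A production system would use a vector database instead."""
    query_words = set(query.lower().split())

    def score(passage: str) -> int:
        return len(query_words & set(passage.lower().split()))

    return max(knowledge_base, key=score)


def build_grounded_prompt(query: str, knowledge_base: list[str]) -> str:
    """RAG step: prepend the retrieved passage as grounding context."""
    context = retrieve(query, knowledge_base)
    return (
        "Answer using ONLY the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )


# Illustrative private knowledge base (made-up facts).
knowledge_base = [
    "Total company earnings in 2022 were $5.4 billion.",
    "The company was founded in 1911 and employs 280,000 people.",
]

prompt = build_grounded_prompt(
    "What were the total earnings of the company in 2022?", knowledge_base
)
print(prompt)
```

The grounded prompt now carries the domain-specific figure, so the generator answers from your data rather than from its generic training.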
Correct.

Okay, so now we're on to the second approach to prompt engineering: CoT, or chain of thought. I sometimes think of this as the old saying "explain it to me like I'm an eight-year-old," but could you give me a more practical explanation of what that really means?

Absolutely. Large language models, like an eight-year-old, also need guidance on how to arrive at those responses. And before I jump to the chain-of-thought approach, I want to recommend something: anytime you are working with large language models, consider two things. Number one is the RAG approach, content grounding: content-ground your large language model. And then take the approach of prompting, guiding the model through the prompts to get the responses that you need. CoT belongs in that category, as do the other approaches.

So let's talk about chain of thought. Chain of thought is all about taking a bigger task of arriving at a response, breaking it down into multiple sections, and then combining the results of all those sections to come up with the final answer. So instead of asking a large language model, "What are the total earnings of the company in 2022?", which would give you just a blur of a number like $5.4 billion, you can ask the large language model to give you the total earnings of the company in 2022 for software, for hardware, and for consulting, for example.

I see, so you're asking it to be more precise, with the idea that you'll get individual results that it will ultimately combine. So, for example, we'll just make up some numbers: if software had five, hardware two, and consulting three, the final answer will be 5 + 2 + 3. That will be the output, but the large language model is now arriving at this number through reasoning and through explainability. Were these three separate queries, essentially three separate problems?

The way I tell the large language model is: I give it the problem, and I explain how I will break down the problem. For example, I say, "What are the total earnings of the company? If the total earnings of the company for software are five, for hardware two, and for consulting three, then the total earnings are 5 plus 2 plus 3."
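The decomposition above can be expressed as a few-shot chain-of-thought prompt. This is a sketch under stated assumptions: the worked example, the company name, and the segment numbers are made up, and the actual model call is omitted; the point is only the prompt shape that teaches the model to break the total into segments before summing.

```python
# A worked example showing the reasoning pattern we want the model to imitate.
FEW_SHOT_EXAMPLE = """\
Q: What are the total earnings of ExampleCorp in 2022?
A: Let's break it down by segment.
   Software earnings: 5
   Hardware earnings: 2
   Consulting earnings: 3
   Total = 5 + 2 + 3 = 10
"""


def chain_of_thought_prompt(question: str) -> str:
    """Prepend the worked decomposition, then start the answer the same way,
    so the model continues with step-by-step reasoning."""
    return FEW_SHOT_EXAMPLE + f"\nQ: {question}\nA: Let's break it down by segment."


prompt = chain_of_thought_prompt(
    "What are the total earnings of ExampleCorp in 2023?"
)
print(prompt)
```

Because the example ends with an explicit sum, the model tends to produce the per-segment figures and the arithmetic rather than a single unexplained number.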
Let me see if I can net that out to make sure I got it. So with RAG we were talking about essentially improving results based on domain knowledge, but then, to improve on the results that generates, we apply this technique, the "explain it to an eight-year-old" technique, which makes the result even better.

Mhm.

Okay, that was chain of thought, which, as I understand it, is a few-shot prompting technique where you basically provide some examples to improve the end result. I think ReAct is kind of the same genre but a little bit different. Could you explain the difference to me?

Absolutely. ReAct is also a few-shot prompting technique, but it is different from chain of thought. In chain of thought you are breaking down the steps of arriving at the response, reasoning through the steps. ReAct goes one step further: it is not only reasoning through the steps, but also acting based on what else is necessary to arrive at the response.

So this data, though, is coming from different sources; we weren't talking about that in the earlier cases.

Right. For example, you have a situation where you have your content, the domain content, in your private database or knowledge base, but you are asking a prompt where your question demands responses that are not already available in your knowledge base. The ReAct approach then has the ability to actually go into a public knowledge base, gather both pieces of information, and arrive at the response. So the action part of ReAct is its ability to go to external resources to gain additional information and arrive at responses.
I got it, but there's one thing that confuses me just a teeny bit: RAG looks awfully similar, but they're not the same. Where's the difference here?

The difference is this. They both use private databases, knowledge bases, right? But with large language models I want you to think about two steps. One is content grounding; that's what RAG is doing: it is making your large language model aware of your domain content. Where ReAct is different is that it has the ability to go to public resources, public content and knowledge bases, to bring in additional information to complete the task.

Okay, before we wrap, can you give me an example of ReAct?

Absolutely.
Let's go back to the financial example. In the previous patterns we were looking at the total earnings of a company for a specific year. Now suppose you come back with a prompt asking for the total earnings for 2010 and 2022. Your 2022 information is here in your private database or knowledge base, but the 2010 information is not there.

For example, it's over here in the public one.

Exactly. So with the ReAct approach, the large language model now goes to the external resources to get that information for 2010, and then brings both of them together and makes the observation.

I see, so that's going to produce a result that takes this into consideration, whereas before it might have produced essentially a hallucination.

A hallucination, and a couple more things. ReAct gives you the results in a three-step process. When you are writing prompts in ReAct mode, you first split the prompt into three steps. One is the thought: what are you looking for? The second is the action: what are you getting, and from where? And the third, finally, is the observation: the summary of the action that has taken place. So, for example, thought one will be "retrieve the total earnings for 2022"; action one will actually go to the knowledge base to retrieve the 2022 figure; and observation one will be the 2022 value. Thought two is "retrieve the value for 2010 from an external knowledge base"; action two fetches that value; and observation two will hold it. And part three will be comparing the two values to arrive at the total earnings answer.

I think I've got it.
That's great. We only have one more to go, and if you really want to impress your colleagues, you want to learn about this next one, which is directional stimulus prompting, or DSP. How is it different from the other ones?

DSP is a fun one, and a brand-new one that I want to introduce to the audience, for making the large language model give specific information: giving it a direction to pull specific information out of the task. So, for example, you ask a question like "What are the annual earnings of the company?", but you don't want the final number; you want specific details about the earnings for, say, software or consulting. So you give a hint, saying "software and consulting," and the large language model will first get the earnings and then, from that, extract the specific values for software and consulting.

This kind of reminds me of the game where you're trying to get someone to draw a picture; what do you do? You provide a hint, and in effect this gives you a better result in the same fashion.

Absolutely. It is a very simple technique, but it works very well when you are looking for specific values from the task, so try it out.
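The hinting described above amounts to appending a short directional cue to the question. This is a minimal sketch: the "Hint:" wording and the hint list are assumptions for illustration, and the model call itself is omitted.

```python
def dsp_prompt(question: str, hints: list[str]) -> str:
    """Directional stimulus prompting: append a hint naming the
    specific values the model should extract from its answer."""
    return f"{question}\nHint: focus on {', '.join(hints)}."


prompt = dsp_prompt(
    "What are the annual earnings of the company?",
    ["software", "consulting"],
)
print(prompt)
```

Without the hint, the model tends to return only the aggregate figure; the hint steers it toward breaking out the named segments.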
Well, thanks Suj, I now understand what DSP is, but could you net out how to combine these different techniques?

You should always start with RAG to bring focus to your domain content, but you can also combine CoT and ReAct, and you can also combine RAG and DSP, to get that cumulative effect.

Excellent. Okay, well, thank you very much. I hope you come back for another episode, impromptu.

Absolutely, thank you, Dan. And thank you for watching; before you leave, please click subscribe and like.