Prompt Engineering and Retrieval-Augmented Generation
Key Points
- Prompt engineering has become a hot job market, with many openings for specialists who craft effective queries for large language models (LLMs).
- It involves designing precise prompts to guide LLMs and minimize “hallucinations,” where models generate inaccurate or false information due to conflicting training data.
- One key strategy is Retrieval‑Augmented Generation (RAG), which couples a retriever that fetches domain‑specific knowledge with the LLM generator to produce context‑aware answers.
- The retriever can be as simple as a database or vector search, allowing the model to incorporate proprietary or industry‑specific information it otherwise wouldn’t know.
- An illustrative use case is in finance: using RAG, a model can accurately answer questions about a company’s earnings for a given year by pulling the relevant data from a corporate knowledge base instead of relying on its generic training.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=1c9iyoVIwDs](https://www.youtube.com/watch?v=1c9iyoVIwDs)
**Duration:** 00:12:43

Sections
- [00:00:00](https://www.youtube.com/watch?v=1c9iyoVIwDs&t=0s) **Prompt Engineering Overview & RAG** - The speakers introduce the surge in prompt-engineer roles, define prompt engineering as crafting effective queries to avoid LLM hallucinations, outline common LLM applications, and preview retrieval-augmented generation as a key strategy.
So Suj, have you looked at your LinkedIn profile lately and noticed there are a ton of job openings for prompt engineers?

Absolutely, and that's why today we're going to do a deep dive on what that is. But first, to give a little context, let's talk about what large language models are used to do. As a review: of course everyone is familiar with chatbots, and we see those all the time. They're also used for summarization, for example. Another common use case is information retrieval.

Those are three different cases, but for our viewers, could you explain how that applies to prompt engineering?

Sure. Prompt engineering is vital to communicating effectively with large language models. What does it mean? It is designing, coming up with, the proper questions to get the responses you're looking for from the large language model, because you want to avoid hallucination. Hallucinations are where you get essentially false results out of a large language model, and that's because large language models are predominantly trained on internet data, and there can be conflicting data, conflicting information, and so on.

Great, okay, I got that. So we're going to look at this from four different approaches, so let's get straight to it.

Yep. We're going to look at the first approach, which is RAG, or retrieval-augmented generation.

We've had videos about this already on the channel, so I have kind of a basic understanding of it: you take domain-specific knowledge and add it to your model. But how does that actually work behind the scenes? Could you explain that to me?
Absolutely. Large language models, as you know, are trained on internet data; they are not aware of your domain-specific knowledge base content at all. So when you are querying the large language model, you want to bring awareness of your knowledge base to it.

So when you say knowledge base here, you're referring to something that might be specific to my industry or specific to my company, which is then going to be applied to the model?

Absolutely.

And how does that work again?

To bring this awareness to the large language model, we have to have two components. One is the retriever component, which brings the context of your domain knowledge base to the generator part of the large language model. When they work together and you ask questions of the large language model, it is now responding to your questions based on the domain specificity of your content.

Okay, I think I got it. Now, this retriever, that could really be as simple as a database search, right?

Exactly, it can be a vector database.

Okay, I got that. But could you first give me a quick example of how you've seen that applied in an industry?

Absolutely. Let's take the example of financial information for a company. If you were to directly ask the large language model a question about the total earnings of a company for a specific year, it's going to go through its learning and the internet data and come up with a number that may not be accurate. For example, for the annual earnings it could come back with $19.5 billion, which may be totally incorrect. Whereas if you want accurate responses, you bring the attention to the domain knowledge base and ask the same question; then the large language model is going to refer to your knowledge base to bring that answer, and this time it will be accurate, say, for example, $5.4 billion.

I see, because this is a trusted source that it can then integrate with the larger model.
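The retriever-plus-generator flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' implementation: the knowledge base, the word-overlap scoring (a stand-in for a real vector database), and the earnings figures are all illustrative assumptions, and the final LLM call is left as a printed prompt.

```python
def retrieve(query: str, knowledge_base: list[str]) -> str:
    """Toy retriever: rank passages by word overlap with the query.
    A production system would use a vector database instead."""
    query_words = set(query.lower().split())

    def score(passage: str) -> int:
        return len(query_words & set(passage.lower().split()))

    return max(knowledge_base, key=score)


def build_grounded_prompt(query: str, knowledge_base: list[str]) -> str:
    """RAG step: prepend the retrieved passage as grounding context."""
    context = retrieve(query, knowledge_base)
    return (
        "Answer using ONLY the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )


# Illustrative private knowledge base (made-up facts).
knowledge_base = [
    "Total company earnings in 2022 were $5.4 billion.",
    "The company was founded in 1911 and employs 280,000 people.",
]

prompt = build_grounded_prompt(
    "What were the total earnings of the company in 2022?", knowledge_base
)
print(prompt)
```

The grounded prompt now carries the domain-specific figure, so the generator answers from your data rather than from its generic training.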
Correct.

Okay, so now we're on to the second approach to prompt engineering: CoT, or chain of thought. I sometimes think of this as the old saying "explain it to me like I'm an eight-year-old," but could you give me a more practical explanation of what that really means?

Absolutely. Large language models, like an eight-year-old, also need guidance on how to arrive at those responses. And before I jump to the chain-of-thought approach, I want to recommend something: anytime you are working with large language models, consider two things. Number one is the RAG approach, content grounding: content-ground your large language model. And then take the approach of prompting, guiding the model through the prompts to get the responses that you need. CoT belongs in that category, as do the other approaches.

So let's talk about chain of thought. Chain of thought is all about taking a bigger task of arriving at a response, breaking it down into multiple sections, and then combining the results of all those sections to come up with the final answer. So instead of asking a large language model, "What are the total earnings of the company in 2022?", which would give you just a blur of a number like $5.4 billion, you can ask the large language model to give you the total earnings of the company in 2022 for software, for hardware, and for consulting, for example.

I see, so you're asking it to be more precise, with the idea that you'll get individual results that it will ultimately combine. So, for example, we'll just make up some numbers: if software had five, hardware two, and consulting three, the final answer will be 5 + 2 + 3. That will be the output, but the large language model is now arriving at this number through reasoning and through explainability. Were these three separate queries, essentially three separate problems?

The way I tell the large language model is: I give it the problem, and I explain how I will break down the problem. For example, I say, "What are the total earnings of the company? If the total earnings of the company for software are five, for hardware two, and for consulting three, then the total earnings are 5 plus 2 plus 3."
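The decomposition above can be expressed as a few-shot chain-of-thought prompt. This is a sketch under stated assumptions: the worked example, the company name, and the segment numbers are made up, and the actual model call is omitted; the point is only the prompt shape that teaches the model to break the total into segments before summing.

```python
# A worked example showing the reasoning pattern we want the model to imitate.
FEW_SHOT_EXAMPLE = """\
Q: What are the total earnings of ExampleCorp in 2022?
A: Let's break it down by segment.
   Software earnings: 5
   Hardware earnings: 2
   Consulting earnings: 3
   Total = 5 + 2 + 3 = 10
"""


def chain_of_thought_prompt(question: str) -> str:
    """Prepend the worked decomposition, then start the answer the same way,
    so the model continues with step-by-step reasoning."""
    return FEW_SHOT_EXAMPLE + f"\nQ: {question}\nA: Let's break it down by segment."


prompt = chain_of_thought_prompt(
    "What are the total earnings of ExampleCorp in 2023?"
)
print(prompt)
```

Because the example ends with an explicit sum, the model tends to produce the per-segment figures and the arithmetic rather than a single unexplained number.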
Let me see if I can net that out to make sure I got it. So with RAG we were talking about essentially improving results based on domain knowledge, but then, to improve on the results that generates, we apply this technique, the "explain it to an eight-year-old" technique, which makes the result even better.

Mhm.

Okay, that was chain of thought, which, as I understand it, is a few-shot prompting technique where you basically provide some examples to improve the end result. I think ReAct is kind of the same genre but a little bit different. Could you explain the difference to me?

Absolutely. ReAct is also a few-shot prompting technique, but it is different from chain of thought. In chain of thought you are breaking down the steps of arriving at the response, reasoning through the steps. ReAct goes one step further: it is not only reasoning through the steps, but also acting based on what else is necessary to arrive at the response.

So this data, though, is coming from different sources; we weren't talking about that in the earlier cases.

Right. For example, you have a situation where you have your content, the domain content, in your private database or knowledge base, but you are asking a prompt where your question demands responses that are not already available in your knowledge base. The ReAct approach then has the ability to actually go into a public knowledge base, gather both pieces of information, and arrive at the response. So the action part of ReAct is its ability to go to external resources to gain additional information and arrive at responses.
I got it, but there's one thing that confuses me just a teeny bit: RAG looks awfully similar, but they're not the same. Where's the difference here?

The difference is this. They both use private databases, knowledge bases, right? But with large language models I want you to think about two steps. One is content grounding; that's what RAG is doing: it is making your large language model aware of your domain content. Where ReAct is different is that it has the ability to go to public resources, public content and knowledge bases, to bring in additional information to complete the task.

Okay, before we wrap, can you give me an example of ReAct?

Absolutely.
Let's go back to the financial example. In the previous patterns we were looking at the total earnings of a company for a specific year. Now suppose you come back with a prompt asking for the total earnings for 2010 and 2022. Your 2022 information is here in your private database or knowledge base, but the 2010 information is not there.

For example, it's over here in the public one.

Exactly. So with the ReAct approach, the large language model now goes to the external resources to get that information for 2010, and then brings both of them together and makes the observation.

I see, so that's going to produce a result that takes this into consideration, whereas before it might have produced essentially a hallucination.

A hallucination, and a couple more things. ReAct gives you the results in a three-step process. When you are writing prompts in ReAct mode, you first split the prompt into three steps. One is the thought: what are you looking for? The second is the action: what are you getting, and from where? And the third, finally, is the observation: the summary of the action that has taken place. So, for example, thought one will be "retrieve the total earnings for 2022"; action one will actually go to the knowledge base to retrieve the 2022 figure; and observation one will be the 2022 value. Thought two is "retrieve the value for 2010 from an external knowledge base"; action two fetches that value; and observation two will hold it. And part three will be comparing the two values to arrive at the total earnings answer.

I think I've got it.
That's great. We only have one more to go, and if you really want to impress your colleagues, you want to learn about this next one, which is directional stimulus prompting, or DSP. How is it different from the other ones?

DSP is a fun one, and a brand-new one that I want to introduce to the audience, for making the large language model give specific information: giving it a direction to pull specific information out of the task. So, for example, you ask a question like "What are the annual earnings of the company?", but you don't want the final number; you want specific details about the earnings for, say, software or consulting. So you give a hint, saying "software and consulting," and the large language model will first get the earnings and then, from that, extract the specific values for software and consulting.

This kind of reminds me of the game where you're trying to get someone to draw a picture; what do you do? You provide a hint, and in effect this gives you a better result in the same fashion.

Absolutely. It is a very simple technique, but it works very well when you are looking for specific values from the task, so try it out.
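The hinting described above amounts to appending a short directional cue to the question. This is a minimal sketch: the "Hint:" wording and the hint list are assumptions for illustration, and the model call itself is omitted.

```python
def dsp_prompt(question: str, hints: list[str]) -> str:
    """Directional stimulus prompting: append a hint naming the
    specific values the model should extract from its answer."""
    return f"{question}\nHint: focus on {', '.join(hints)}."


prompt = dsp_prompt(
    "What are the annual earnings of the company?",
    ["software", "consulting"],
)
print(prompt)
```

Without the hint, the model tends to return only the aggregate figure; the hint steers it toward breaking out the named segments.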
Well, thanks Suj, I now understand what DSP is, but could you net out how to combine these different techniques?

You should always start with RAG to bring focus to your domain content, but you can also combine CoT and ReAct, and you can also combine RAG and DSP, to get that cumulative effect.

Excellent. Okay, well, thank you very much. I hope you come back for another episode, impromptu.

Absolutely, thank you, Dan. And thank you for watching; before you leave, please click subscribe and like.