
Beyond Prompting: Probabilistic Context Engineering

Key Points

  • Context engineering expands prompt engineering by emphasizing that LLMs consider system instructions, chat rules, uploaded documents, and other surrounding information, all of which must be curated for the desired outcome.
  • Current discourse largely concentrates on the “deterministic” side of context—static prompts, knowledge bases, and token‑saving techniques like chain‑of‑draft shorthand that make the model’s reasoning more efficient.
  • The speaker highlights a neglected “probabilistic” dimension: any external, non‑deterministic data sources (e.g., web access, dynamic tools) that the model may draw on and that substantially influence its answers.
  • As LLMs increasingly integrate broader data feeds, engineers must shift focus from solely optimizing deterministic context to also managing and understanding these probabilistic, often uncontrolled, influences.

Full Transcript

# Beyond Prompting: Probabilistic Context Engineering

**Source:** [https://www.youtube.com/watch?v=mldfMWbnZTg](https://www.youtube.com/watch?v=mldfMWbnZTg)
**Duration:** 00:12:32

## Sections

- [00:00:00](https://www.youtube.com/watch?v=mldfMWbnZTg&t=0s) **Rethinking Context Engineering Practices** - The speaker argues that current discussions of context engineering focus too narrowly on prompt-level token efficiency, overlooking the broader responsibility of managing system instructions, uploaded documents, and overall context to ensure correct model behavior.
- [00:03:10](https://www.youtube.com/watch?v=mldfMWbnZTg&t=190s) **Probabilistic Context vs Deterministic Prompt** - The speaker explains that connecting LLMs to extensive web data makes the probabilistic context overwhelm the user's deterministic prompt, turning the prompt itself into a probabilistic guide and shifting the responsibility for maintaining focus onto prompt design.
- [00:06:30](https://www.youtube.com/watch?v=mldfMWbnZTg&t=390s) **Auditing LLM Source Reliability** - The speaker discusses challenges in verifying the reliability of information sources used by LLM agents and warns that insufficient source control and security can lead to injection attacks.
- [00:09:34](https://www.youtube.com/watch?v=mldfMWbnZTg&t=574s) **Versioning Prompts and Context Quality** - The speaker emphasizes the need to version and test prompts, prioritize source quality within large probabilistic context windows, and design evaluations that account for both security concerns and deterministic prompt shaping.

## Full Transcript
I'd like to suggest that we aren't talking clearly enough about context engineering, and that we're getting it wrong in some important ways. If you don't know what context engineering is, it's kind of the successor to the idea of prompt engineering, or prompting. Context engineering basically says prompts are great, but large language models look at a lot more than prompts. They look at the system instructions they get. They look at any rules you have in your chat instance. They look at documents you may have uploaded. And the responsibility of the person running this job is to make sure all of that context is correct and leads to the right outcome. So far, so good. Same page.

The issue is this: most of the dialogue, most of the discussion I've been able to find around context engineering, is really focused on what I would call part one, the smaller part of context engineering: the things we can deterministically control. So we have papers written, we have advice shared, all coming down to how you can more effectively shrink down and make efficient use of the context window you directly send to the large language model. There is this assumption that we are communicating with a cloud-based model, so we need to be really aware of our token burn. And so you have things like the famous paper on chain of draft, where the idea is that you can get the LLM to save a bunch of tokens if you remind it that it can approximate logical thinking by writing its own symbols and shorthand, instead of full token-based English, to write out a chain of thought. This turns out to save lots of tokens and be almost as good, because it's really the act of writing things down that helps the LLM think clearly.
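The chain-of-draft idea the speaker describes can be sketched as a simple prompt builder. A minimal sketch in Python; the instruction wording and the `####` answer separator are illustrative paraphrases of the technique, not the paper's exact prompt:

```python
def chain_of_draft_prompt(task: str, max_words_per_step: int = 5) -> str:
    """Build a chain-of-draft style instruction for an LLM call.

    Instead of verbose chain-of-thought sentences, the model is asked to
    reason in terse drafts, which cuts output tokens. The wording here is
    an illustrative paraphrase, not the paper's verbatim prompt.
    """
    return (
        "Think step by step, but keep only a minimum draft for each "
        f"thinking step, with at most {max_words_per_step} words per step. "
        "Return the final answer after the separator ####.\n\n"
        f"Task: {task}"
    )

prompt = chain_of_draft_prompt("If 3 pens cost $1.50, how much do 7 pens cost?")
print(prompt)
```

The point is that the savings come entirely from the instruction: the deterministic context shrinks the model's own output, which is exactly the "part one" optimization the speaker says dominates the discussion.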
I realize I'm using some anthropomorphizing metaphors, but you get it. The act of writing down the symbols seems to prompt logical trains of thought for the LLM in a similar way to us humans writing things down and being able to remember as we go.

All of that is part one, deterministic context: static prompts, knowledge bases, documentation, data feeds, all things we can control. That's the smaller part, and we don't talk about the larger part. The larger part is probabilistic context. What I'm saying is that you have only a small piece of the overall context the LLM uses to get you an answer if you have any kind of web access in your call at all. Now, to be fair, sometimes you have no web access and you want it that way. Sometimes you are just sending very structured calls, no web access, no external tools, and you just want the LLM to generate a response. In that case, it makes sense to micro-control part one so that you get extremely efficient responses.

But I find that, especially as LLMs push you to connect them to broader data sources so they get smarter, people are more and more assuming they want an LLM that has access to the web. They want an LLM that has access to what I would call non-deterministic, or probabilistic, context. And when that happens, the number of tokens of context is so much greater that it's hard to count. Your deterministic context becomes a drop in the bucket compared to how much probabilistic context the model can acquire. For example, if I tell a multi-agent system like Claude Opus, "Hey, go and research this topic," and I give it a Word document that has my perspective and say "go research this," then, I kid you not, 400, 500, 600 websites later, it comes back.
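The "drop in the bucket" claim is easy to make concrete with back-of-envelope arithmetic. All figures below are assumed round numbers for illustration, not measurements from the talk:

```python
# Back-of-envelope illustration of the "drop in the bucket" point.
# Every figure here is an assumed round number, not a measurement.
prompt_tokens = 500          # the user's prompt
document_tokens = 5_000      # an uploaded Word document
pages_fetched = 500          # "400, 500, 600 websites later"
tokens_per_page = 2_000      # assumed average per fetched page

deterministic = prompt_tokens + document_tokens
probabilistic = pages_fetched * tokens_per_page
share = deterministic / (deterministic + probabilistic)
print(f"deterministic share of context: {share:.2%}")  # well under one percent
```

Even with conservative per-page token counts, the deterministic context rounds to a fraction of a percent of everything the agent processed.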
There is no way that my document and my prompt are any remotely measurable percentage of the total number of tokens it just processed. The only way it still maintains a kind of focus is that it has been clearly reinforcement-learned and trained to focus on the user's ask, which is fine. But all that does is transfer the responsibility for shaping the model's choice of probabilistic context to the prompt itself. And the prompt is therefore not deterministic; the prompt itself is probabilistic. Now we are shaping the context that the agent will go and grab by prompting. We can't control it, but we can shape it. And so the question becomes: how do we start to shape that well, and how do we start to craft an environment that enables the AI to understand what we mean? I think that is actually where context engineering needs to go. I think token-optimization methods are legitimate. They clearly work well, but they kind of focus on cost cutting, when I would like to see how we can get more correct, more useful, and more congruent answers.

And so to me, especially as we look at a world with web access, with MCP everywhere, with increasing autonomy for agents, net-net (I know it's not perfect; I know we are in some ways a long way from a fully autonomous agent), we still want to be in a place where we understand the impact of our prompt on the overall probabilistic context. So here's my set of principles for you as you think about this. Number one: expect discovery, so design for semantic highways. Think about it as: what is the rate at which a desired response comes back when you include probabilistic context?
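Principle one's "rate at which a desired response comes back" suggests a simple pass-rate metric over repeated runs. A minimal sketch; the acceptance check and sample responses here are stand-ins for a real model call and a real grader:

```python
from typing import Callable, Iterable

def consistency_rate(responses: Iterable[str],
                     accept: Callable[[str], bool]) -> float:
    """Fraction of runs whose response passes an acceptance check.

    In a real harness, `responses` would be N repeated runs of the same
    agent prompt with web/MCP access enabled; here it is any iterable.
    """
    runs = list(responses)
    if not runs:
        return 0.0
    return sum(accept(r) for r in runs) / len(runs)

# Toy acceptance check: the response must cite at least one source.
samples = [
    "Rates held steady. [source: reuters.com]",
    "Rates held steady.",
    "Rates held steady. [source: apnews.com]",
    "Rates rose slightly. [source: ft.com]",
]
rate = consistency_rate(samples, lambda r: "[source:" in r)
print(f"desired-response rate: {rate:.0%}")  # 3 of 4 runs pass
```

Because the context window is open, the interesting number is not whether one run succeeds but how stable that rate stays as the web content underneath it shifts.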
Can you consistently prompt so that you get a response you are happy with, even though the context window is not tightly closed and the agent can go and search for things across MCP servers, on the web, and so on?

Number two: can you reliably monitor the quality of the information sources it's using? And can you track how those information sources are changing over time? For example, if you tell it to use reliable and verified news sites to find out the news on a particular topic, and you audit the sources, would you agree that those are reliable and verified news sites? Or do you find that it's not actually doing that well: even if you're happy with the result, the sources are not really great? Which, by the way, happens an alarming amount of the time. I look at ChatGPT's deep research; I am often happy with the output, but I am not often happy with the way it reached it. The sources seem quite sketchy at times. Maybe that's a coincidence, or maybe that's an artifact of the reality that it's testing so many different sources and it's difficult for me to audit all 600 or however many it's using. Or maybe it actually needs to be somewhat more constrained, and we need to do more work on prompting to constrain source reliance with these agents, even if only partially.

Okay, other principles that I think are helpful. Number three: you really need to take security seriously with probabilistic context. There will absolutely be people who figure out LLM injection attacks from agents doing searches across the web and MCP servers. It's going to happen. It will happen this year, and I'm kind of surprised it hasn't happened already. In fact, it may have and I may have missed it.
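The source audit in principle two can start as a simple allowlist check over the URLs an agent cites. A sketch under stated assumptions: `VERIFIED_DOMAINS` is a hypothetical vetted list that a team would maintain and re-review over time, not a recommendation of those outlets:

```python
from urllib.parse import urlparse

# Hypothetical allowlist of outlets the team has vetted. Tracking how
# this set (and the agent's hit rate against it) drifts over time is
# the "monitor sources as they change" part of the principle.
VERIFIED_DOMAINS = {"reuters.com", "apnews.com", "bbc.co.uk"}

def audit_sources(cited_urls: list[str]) -> dict[str, list[str]]:
    """Split an agent's cited URLs into verified vs. unverified domains."""
    report: dict[str, list[str]] = {"verified": [], "unverified": []}
    for url in cited_urls:
        host = urlparse(url).netloc.lower().removeprefix("www.")
        bucket = "verified" if host in VERIFIED_DOMAINS else "unverified"
        report[bucket].append(url)
    return report

report = audit_sources([
    "https://www.reuters.com/markets/some-story",
    "https://randomblog.example/hot-take",
])
print(report)
```

An audit like this only checks domains, not content; it is a floor, not a substitute for reading what the agent actually pulled in, and it does nothing about injected instructions hiding inside an allowlisted page.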
If you know of a case where someone used an MCP server and there was some sort of prompt injection attack on an LLM, I would be curious to see it. Regardless, we should anticipate that.

Principle number four: I want to suggest that it's important to measure overall decision accuracy, and that relevance-scoring the inputs is probably more informative of the decision accuracy you reach from the reports you generate with these methods. This gets back to source control, but now you're adding a relevance-scoring piece, to the extent you can. Maybe you have to do this with an actual eval harness, but to the extent you can, I feel like deploying relevance scoring on the sources is going to be more predictive of the overall quality of the response for probabilistic-context calls than just measuring traditional precision and recall, because precision and recall implicitly assume a deterministic context window, and you don't necessarily have that anymore.

Number five is not that surprising: you're going to have to version everything. You're going to have to test these prompts and version them carefully. And I think that's really, really important.

So when you think about those together, to me they point the way toward a future where we are aware that there are security threats on the open web and across MCP servers in general. We understand that these larger context windows are probably beneficial to higher-quality decisioning by LLMs, but we need to design our evals fundamentally around the idea that source quality across this larger context window matters a great deal for quality of decision. The probabilistic context window: the one you can't fully control.
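Principle four's relevance scoring of sources can be prototyped before wiring in a real judge. The word-overlap scorer below is a deliberately crude, dependency-free stand-in for the LLM grader or embedding similarity you would use in an actual eval harness:

```python
def relevance_score(task: str, source_text: str) -> float:
    """Crude lexical stand-in for a real relevance judge.

    In practice you would use an LLM grader or embedding similarity
    inside an eval harness; word overlap keeps this sketch
    dependency-free and easy to reason about.
    """
    task_words = set(task.lower().split())
    source_words = set(source_text.lower().split())
    if not task_words:
        return 0.0
    return len(task_words & source_words) / len(task_words)

def mean_source_relevance(task: str, sources: list[str]) -> float:
    """Aggregate relevance across everything the agent pulled in."""
    if not sources:
        return 0.0
    return sum(relevance_score(task, s) for s in sources) / len(sources)

task = "central bank interest rate decision"
sources = [
    "the central bank held its interest rate steady after the decision",
    "celebrity gossip roundup for the weekend",
]
score = mean_source_relevance(task, sources)
```

The aggregate is the number to track and version alongside the prompt: the speaker's claim is that this input-side score predicts final decision quality better than precision/recall computed as if the context window were fixed.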
And the thing that matters about what you can control, the deterministic context window, is not really the tokens you burned. It's not really the efficiency, although it doesn't hurt to make it more efficient; chain of draft is great. It's the ability to shape that probabilistic window with the way you prompt. I gave a very simple example of this that I've seen a lot of people use, where they say, "go search verified news sites," right? People try to constrain the search space. "Go search academic articles" is another example. We're not really evaling those in most circumstances. Most of the evals I see are around the precision, recall, and answer quality for specific utterances. Often they're in customer-success spaces, where it's a very deterministic space. I think eval harnesses need to evolve and grow to handle a world where deterministic context is just a small part of context engineering. And a lot of context engineering involves thinking about how to shape agentic search of the open web, or, in large companies, potentially a very large internal data structure. How do you use the same principles to shape how you search a very large internal data structure as an agent?

So I hope that was sufficiently nerdy for you. I think we don't talk enough about context engineering. It's critical that we understand it better, because remember, the fundamental shift for us from chatbots is that they are no longer just large language models. They're really agents in a trenchcoat for most of the frontline chatbots, most of the frontline API experiences. They are using guidance tools, scope on the back end, and agentified behavior on the back end to successfully deliver results to you.
We should probably have context engineering catch up with that agentic future and actually think about how we can deliberately engineer context when we can't control all the pieces. And I think that's a really interesting question. Cheers.