Eval-Driven Development Powers Legal AI Acquisition

7m • ai-ml • deep-dive • intermediate

Key Points

The past state of AI matters, as shown by Thomson Reuters’ 2023 acquisition of Casetext for $650 million. Casetext was a roughly ten-year-old startup that had successfully pivoted to LLM-driven legal analysis.
Casetext’s value lay in avoiding hallucinations for lawyers: citations and arguments accurate enough to meet the profession’s zero-tolerance-for-error standard, delivered in a way that eases heavy workloads.
The deal illustrates the high‑valuation, large‑scale potential of AI products that can guarantee precision in regulated fields such as law.
Casetext’s success stemmed from “evaluation-driven development,” a rigorous, test-like process of repeatedly measuring and refining specific LLM prompts and workflow steps; a task was not considered done until every one of its evals (reportedly more than 1,000 for a given task) passed.
This eval‑centric approach, analogous to test‑driven development in software, is presented as a repeatable pattern for building trustworthy, scalable AI applications across industries.
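As a rough illustration of the eval-driven loop described in these points, the sketch below treats one workflow step as a plain function and gates it on a list of eval cases, much as a unit-test suite gates a function in test-driven development. Everything in it is hypothetical: `extract_citations` is a regex stand-in for an LLM-backed step, and `EvalCase` and `run_evals` are names invented for this example, not anything taken from Casetext's system.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One eval: an input text plus a check on the step's output."""
    name: str
    text: str
    check: Callable[[List[str]], bool]


def extract_citations(text: str) -> List[str]:
    """Hypothetical stand-in for an LLM-backed step: find 'X v. Y' case names."""
    return re.findall(r"[A-Z][a-z]+ v\. [A-Z][a-z]+", text)


def run_evals(step: Callable[[str], List[str]], cases: List[EvalCase]) -> List[str]:
    """Run every eval against the step and return the names of the failures."""
    return [case.name for case in cases if not case.check(step(case.text))]


cases = [
    EvalCase("finds_single_case", "See Roe v. Wade for context.",
             lambda out: out == ["Roe v. Wade"]),
    EvalCase("finds_nothing_in_plain_text", "No cases are cited here.",
             lambda out: out == []),
    EvalCase("finds_two_cases", "Compare Smith v. Jones with Doe v. Roe.",
             lambda out: out == ["Smith v. Jones", "Doe v. Roe"]),
]

failures = run_evals(extract_citations, cases)
# The step counts as done only when the failure list is empty;
# a single failing eval sends you back to refine the step.
step_passes = not failures
print("PASS" if step_passes else f"FAIL: {failures}")
```

In a real project the stand-in function would wrap a model call and the list of cases would grow as new failure modes are discovered; the all-or-nothing gate is the part that mirrors the "every single eval passed" rule.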

Sections

00:00:00 Thomson Reuters’ $650M AI Acquisition - The speaker details Thomson Reuters’ 2023 purchase of legal-tech startup Casetext for $650 million, emphasizing how its pivot to LLM-driven legal analysis that avoids hallucinations illustrates the groundwork needed for sky-high AI valuations and large-scale rollout.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=Rauz-3jycQ0](https://www.youtube.com/watch?v=Rauz-3jycQ0)
**Duration:** 00:07:15

0:00 We can be tempted to believe that AI only matters going forward, that the past state of AI is not relevant, that the past state of tech isn't relevant. But that's not true. I want to tell you the story of an acquisition in AI that happened over a year ago now. It was back in the summer of 2023, and only now are some of the details starting to leak, as people start to feel like they're able to talk about it. It's relevant because it illustrates what's required to get a really sky-high valuation and a truly rolled-out application at scale, and we haven't talked about it much.

0:35 This is the company. Casetext is a Thomson Reuters company. Now, Thomson Reuters are the news people, but apparently they acquire startups as well, and they acquired Casetext back in August of 2023 for $650 million. At that time the company was about 10 years old; they'd gone through a few rounds. Obviously, if you're a 10-year-old company in 2023, you were not starting out in the LLM space. This was before large language models.

1:07 They had to pivot. It turns out that they pivoted very successfully into an LLM-driven legal analysis use case. They had about 10,000 clients at the time of acquisition, and what they had done was figure out, in the summer of 2023, how to avoid hallucinations for lawyers. You can see how that would be highly valuable. A lawyer absolutely needs to know that they are not going to get in trouble with the judge or with the bar association for citing cases that don't exist. They need to make sure that their legal arguments are sound. They have zero tolerance for error, but they also have a tremendous workload and would love help getting through that workload more efficiently, if they can guarantee the accuracy. So that's what Casetext set out to do.

2:00 And what's interesting is not that they accomplished it and they got their M&A and everybody walked away with money or whatever. What's interesting is how they did it, because that is a pattern that's applicable to other startups and other applied AI use cases, and that's why I want to call it out.

2:16 So over the weekend, a very, very long read from Eugene Yan broke that talks about this idea of eval-driven development. We've seen this for a long time in software; it was called test-driven development for a while, or TDD. In LLMs, eval-driven development is this idea that you relentlessly, rigorously evaluate the performance of a particular prompt, or of a particular series of steps performed by an LLM, and then you feed that back in until you can get the LLM to behave exactly the way you want it to.

2:53 And so in this case, what they did at Casetext was they took the act of doing a deposition, or the act of doing any other task in the legal profession that they wanted to cover, and they said: you know what, we are going to break this down, and we're going to evaluate it very, very, very thoroughly until we can get the LLM to perform each component of the task exactly right before we get it to do the overall task.

3:19 That is a tremendously helpful insight. I'm going to link the whole read here under the YouTube video, but it's absolutely critical to understanding how to build LLM applications at scale. Essentially what they're saying is: if attention is all you need (the title of the famous 2017 paper that described how some of the foundational technologies of LLMs work, how Transformers work, how self-attention works), then this is really saying that attention needs to be applied in detail, at a micro-step level, to avoid hallucinations.

3:53 In fact, they talk about more than 1,000 different evals for a given task, and that they did not pass a particular task at Casetext until every single eval passed. Which means if you had even one failure, you would go back and break down that step and understand it better. And that was the key. They said that fundamentally, getting Casetext to a zero-hallucination state required grinding through micro-level detail with lawyers to understand exactly what they were doing, so they could instruct the LLM extremely specifically. That was how they actually got it to zero-level hallucination, and that is what justified the $650 million valuation from Thomson Reuters.

4:40 And so the generalizable pattern I take away is that if you are building an applied AI system, it is possible that the hallucinations you are experiencing are a function of you not understanding how to instruct the LLM more precisely. How can you instruct it so precisely that it cannot mess up, that it must clearly come back with exactly the correct response?

5:11 It's a worthy question, right? How much have you broken down your tasks? If you are building AI systems, how much have you structured them into extremely specific, micro-detail-level steps? I think one of the things that's going to distinguish strong AI applications is that they make that step breakdown something that's intuitive and easy for the end user, because the end user is going to be too lazy. You know, I am going to be too lazy, when I have a meeting in 10 minutes and I need an agenda, to go through and break down every single thing into micro-level steps.

5:45 And one of the developments that I've noticed in 2024 is that some of those common tasks are easier; they're less prone to hallucination and more precise, thanks to back-end work from companies like OpenAI or Anthropic. Similarly, if you are building a system where you are trying to make it easier for the end user, you have to take on more of the specific fine-tuning and prompting inside your system, so that they can be fairly generic and still invoke a very specific series of individual steps on the back end that lead to the kind of precise and accurate result they would want.

6:19 In a sense, they want to treat the LLM as if it already knows what they're talking about, what their context is, and what they need, and will then just generate a relevant result. But that doesn't come by happenstance; that doesn't come by accident. That comes because you, the system designer, took the time to understand their intent, break it down into specific, granular components, and instruct the LLM how to handle that entire logical sequence.

6:48 And so I think what we got from the case study on Casetext, pun intended, is a great example of both the venture value created by eval-driven development and the importance of designing based on evals to get exactly what you need out of an AI system. Hallucinations might just be an artifact. Thoughts?
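The design idea near the end of the talk, where the end user stays generic and the system expands the request into a specific series of back-end steps, could be sketched roughly as follows, using the speaker's meeting-agenda example. This is a toy under stated assumptions: `STEP_TEMPLATES` and `build_step_prompts` are hypothetical names, the step wording is invented for illustration, and no model is actually called.

```python
from typing import Dict, List

# Hypothetical step breakdowns that the system designer writes once per task type.
STEP_TEMPLATES: Dict[str, List[str]] = {
    "agenda": [
        "List the decisions that must be made in a meeting about {topic}.",
        "Order those decisions by urgency for a {minutes}-minute meeting.",
        "Assign a time box to each item so the total is {minutes} minutes.",
    ],
}


def build_step_prompts(task: str, **context: object) -> List[str]:
    """Expand a generic request into the specific prompt for each micro-step."""
    if task not in STEP_TEMPLATES:
        raise ValueError(f"No step breakdown defined for task: {task!r}")
    return [template.format(**context) for template in STEP_TEMPLATES[task]]


# The end user supplies only a task name and a little context.
prompts = build_step_prompts("agenda", topic="Q3 hiring", minutes=30)
for number, prompt in enumerate(prompts, start=1):
    print(f"Step {number}: {prompt}")
```

In a fuller system each generated prompt would be sent to a model and its output checked by that step's own evals before the next step runs; the point of the sketch is only that the breakdown lives in the system, not in the user's head.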