Judge Rules AI Training Fair Use
Key Points
- Judge William Alsup’s ruling in *Bartz v. Anthropic* affirms that using copyrighted books for AI training can qualify as fair use, but explicitly condemns training on material obtained from pirated sources.
- The decision frames AI training as a “transformative” activity—machines read texts and generate new, original outputs—providing a legal foothold for future AI developers.
- Alsup’s nuanced language creates a “Solomon’s choice” scenario: while the act of training on millions of books may be permissible, the method of acquiring those books determines liability.
- In response, Anthropic overhauled its data‑gathering strategy in 2024, hiring former Google book‑scanning chief Tom Turvey to legally purchase and scan physical books, even destroying the originals after digitization.
- Because Anthropic’s new digital copies stem from legitimately purchased books, the court ruled those scans as fair‑use training data, setting a precedent that lawful acquisition can shield AI companies from copyright infringement claims.
Sections
- AI Training Fair Use Ruling - Judge William Alsup ruled that using copyrighted books to train AI can be considered fair use but condemned acquiring those works from pirated sources, establishing a key precedent for how AI companies must obtain training data.
- Court Ruling Signals Pay for AI Scraping - The judge’s finding that Anthropic could have purchased the books earlier underscores that AI companies must financially compensate authors for using their works, offering a tentative victory for creators amid broader, unsettled litigation.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=8beAhtbnM4Y](https://www.youtube.com/watch?v=8beAhtbnM4Y)
**Duration:** 00:06:04
- [00:00:00](https://www.youtube.com/watch?v=8beAhtbnM4Y&t=0s) **AI Training Fair Use Ruling**
- [00:03:18](https://www.youtube.com/watch?v=8beAhtbnM4Y&t=198s) **Court Ruling Signals Pay for AI Scraping**
We got a bit of a road map today for the
future of copyright cases in AI, which
is something that I've been following
really closely. I want to give you an
outline of the ruling and then a look at
where we stand on the legal challenges
to AI right now. So, first, this ruling: it was by Judge William Alsup, and it was handed down in the case of Bartz versus Anthropic. It validates AI
training as fair use, but it condemns
the piracy that enables it. And I want
to spend a little bit of time here
because Judge Alsup was very precise and careful in the ruling. It's not as simple as saying this is a win for Anthropic
and AI companies because it enables fair
use in AI. I would say if you want to
think about how to frame this, it's sort
of a Solomon's choice. It splits the
baby. Yes, training Claude on millions
of books does constitute fair use. But
critically, Anthropic's choice to
download those same books from pirate
sites, which it did for earlier versions
of Claude does not get a free pass. That
distinction matters because it
fundamentally shapes how AI companies
must think about data acquisition going
forward. So, the judge's reasoning, and
this is really key to me, he does
describe AI training as quintessentially
transformative. Everyone reads texts and then writes new texts. He also writes, and I'm reading from his judgment here, that “to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways, would be unthinkable.” I think Judge Alsup gets that part right.
I think that is exactly what I've been worried that judges will not see, and I find it incredibly encouraging that Judge Alsup understands that AI is a transformative technology that transforms the text it's trained on. And this forms
a conceptual foundation that I think
other AI companies will be able to use
when talking about how they do their
work. Now, this is where the Anthropic
story gets interesting because after
building their initial models on pirated
content from Library Genesis and other
places, dubious sources, the company
made a deliberate shift in 2024. They
hired Tom Turvey, the former head of Google's book scanning project, and Tom's mandate was to legally
obtain all the books in the world. Can
you imagine? I got to say as someone
with a library that is my dream job.
Tom, if you ever get tired of your job
at Anthropic, please let me know. I
would love to have the job of getting
all the books in the world. Anyway,
Anthropic spent millions of dollars, a significant share of its total training costs for the new Sonnet and Opus models, purchasing
physical books, many of them secondhand,
which they then proceeded to slice from
their bindings and scan into digital
format. Yes, the physical books were
destroyed, but the digital copies were
ruled as legitimate fair use because
Anthropic acquired the books
legitimately. So the pivot for Anthropic
from using piracy to purchasing reveals
a critical principle that I think other
judges are likely to follow. AI
companies can afford to do this right.
Not all of them choose to do so. And the
court does note the financial capability
because if you can afford to purchase
later then you could have purchased
earlier. And Judge Alsup also writes that using purchased books later, quote, “will not absolve it of liability for the theft but may affect the extent of statutory damages.” The judge saw that
Anthropic had the money all along. So
what does this mean for authors? I know
authors in my life. In fact, arguably I
am an author on Substack, right? Uh and
I know that AI reads my stuff. Anyway, I
think this ruling offers a glimmer of
hope. Fundamentally, part of what
authors have needed is some sense that
companies cannot just scrape and steal
work. There needs to be a sense of being
willing to pay the going rate for the
work in order to use it. So even if AI training constitutes fair use, authors may or may not agree that that's legitimate. That's fine. Everyone can have different opinions, and it's certainly not settled yet with just one ruling. It's still a step forward for
authors that the court expects AI
companies to pay for the work. It
establishes something that is closer to
a sustainable equilibrium. Companies
must pay for access, support the
creative economy and authors can benefit
from AI tools if they choose to do so.
Now, there are other open lawsuits out there, right? Multiple lawsuits against OpenAI. There's Kadrey versus Meta, the lawsuit over training Llama on Books3, a pirated data set that Anthropic also used, which now has an interesting precedent. And then there are visual AI companies that may use Alsup's transformative-use reasoning to argue that image generation models are really the same thing as the text side and that it's fair use there. So the question for me
is where do we go from here? Will we see
courts adopt Alsup's framework going forward? Will we see other sorts of precedents and standards of judgment emerge? I think one of the things that I'm aware of is, you know, Alsup writes in the Northern District of California.
It's not the whole country. We're seeing
circuit splits on related AI issues. For
example, the Ninth Circuit has used an actual-knowledge requirement for contributory infringement, while the Second Circuit has used a reason-to-know standard.
That's a big difference when we're
talking about platforms that host AI
tools and it affects the sort of extent
of liability that AI platforms will have
in situations like this. The long and the short of it is, this is a step forward in terms of providing legal clarity. I really appreciate Judge Alsup's
willingness to talk about the
transformative value of AI and not just
call it copying. I think that's a
correct interpretation of what AI does.
I think expecting AI companies to pay
for what they train on is completely
reasonable and we'll have to see where
the story goes from here. Still, a
little bit of clarity from the judiciary
is a step forward. We'll take the win
for today, won't we? All right. Cheers,
guys.