Judge Rules AI Training Fair Use
Key Points
- Judge William Alsup’s ruling in *Bartz v. Anthropic* affirms that using copyrighted books for AI training can qualify as fair use, but explicitly condemns training on material obtained from pirated sources.
- The decision frames AI training as a “transformative” activity—machines read texts and generate new, original outputs—providing a legal foothold for future AI developers.
- Alsup’s nuanced language creates a “Solomon’s choice” scenario: while the act of training on millions of books may be permissible, the method of acquiring those books determines liability.
- In response, Anthropic overhauled its data‑gathering strategy in 2024, hiring former Google book‑scanning chief Tom Turvey to legally purchase and scan physical books, even destroying the originals after digitization.
- Because Anthropic’s new digital copies stem from legitimately purchased books, the court ruled those scans as fair‑use training data, setting a precedent that lawful acquisition can shield AI companies from copyright infringement claims.
Sections
- AI Training Fair Use Ruling - Judge William Alsup ruled that using copyrighted books to train AI can be considered fair use but condemned acquiring those works from pirated sources, establishing a key precedent for how AI companies must obtain training data.
- Court Ruling Signals Pay for AI Scraping - The judge’s finding that Anthropic could have purchased the books earlier underscores that AI companies must financially compensate authors for using their works, offering a tentative victory for creators amid broader, unsettled litigation.
Full Transcript
**Source:** [https://www.youtube.com/watch?v=8beAhtbnM4Y](https://www.youtube.com/watch?v=8beAhtbnM4Y)
**Duration:** 00:06:04
- [00:00:00](https://www.youtube.com/watch?v=8beAhtbnM4Y&t=0s) **AI Training Fair Use Ruling**
- [00:03:18](https://www.youtube.com/watch?v=8beAhtbnM4Y&t=198s) **Court Ruling Signals Pay for AI Scraping**
We got a bit of a road map today for the
future of copyright cases in AI, which
is something that I've been following
really closely. I want to give you an
outline of the ruling and then a look at
where we stand on the legal challenges
to AI right now. So, first, this ruling: it was by Judge William Alsup, and it was handed down in the case of Bartz versus Anthropic. It validates AI
training as fair use, but it condemns
the piracy that enables it. And I want
to spend a little bit of time here
because Judge Alsup was very precise and careful in the ruling. It's not as simple as saying this is a win for Anthropic
and AI companies because it enables fair
use in AI. I would say if you want to
think about how to frame this, it's sort
of a Solomon's choice. It splits the
baby. Yes, training Claude on millions
of books does constitute fair use. But
critically, Anthropic's choice to
download those same books from pirate
sites, which it did for earlier versions
of Claude does not get a free pass. That
distinction matters because it
fundamentally shapes how AI companies
must think about data acquisition going
forward. So, the judge's reasoning, and
this is really key to me, he does
describe AI training as quintessentially
transformative. Everyone reads texts and then writes new texts. He also writes, and I'm reading from his judgment here, that “to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways, would be unthinkable.” I think Judge Alsup gets that part right.
I think that is exactly what I've been worried that judges will not see, and I find it incredibly encouraging that Judge Alsup understands that AI is a transformative technology that transforms the text it's trained on. And this forms
a conceptual foundation that I think
other AI companies will be able to use
when talking about how they do their
work. Now, this is where the Anthropic
story gets interesting because after
building their initial models on pirated
content from Library Genesis and other
places, dubious sources, the company
made a deliberate shift in 2024. They
hired Tom Turvey, the former head of Google's book scanning project, and Tom's mandate was to legally
obtain all the books in the world. Can
you imagine? I got to say as someone
with a library that is my dream job.
Tom, if you ever get tired of your job
at Anthropic, please let me know. I
would love to have the job of getting
all the books in the world. Anyway,
Anthropic spent millions of dollars, a significant share of its total training costs for the new Sonnet and Opus models, purchasing
physical books, many of them secondhand,
which they then proceeded to slice from
their bindings and scan into digital
format. Yes, the physical books were
destroyed, but the digital copies were
ruled as legitimate fair use because
Anthropic acquired the books
legitimately. So the pivot for Anthropic
from using piracy to purchasing reveals
a critical principle that I think other
judges are likely to follow. AI
companies can afford to do this right.
Not all of them choose to do so. And the
court does note the financial capability
because if you can afford to purchase
later then you could have purchased
earlier. And Judge Alsup also writes that using purchased books later, quote, “will not absolve it of liability for the theft but may affect the extent of statutory damages.” The judge saw that
Anthropic had the money all along. So
what does this mean for authors? I know
authors in my life. In fact, arguably I
am an author on Substack, right? Uh and
I know that AI reads my stuff. Anyway, I
think this ruling offers a glimmer of
hope. Fundamentally, part of what
authors have needed is some sense that
companies cannot just scrape and steal
work. There needs to be a sense of being
willing to pay the going rate for the
work in order to use it. So even if AI training constitutes fair use, authors may or may not agree that that's legitimate. That's fine. Everyone can have different opinions, and it's certainly not settled yet with just one ruling. It's still a step forward for
authors that the court expects AI
companies to pay for the work. It
establishes something that is closer to
a sustainable equilibrium. Companies
must pay for access, support the
creative economy and authors can benefit
from AI tools if they choose to do so.
Now, there are other open lawsuits out there, right? Multiple lawsuits against OpenAI. There's Kadrey versus Meta, the lawsuit over training Llama on Books3, a pirated data set that Anthropic also used, which now has an interesting precedent. And then there are visual AI companies that may use Alsup's transformative-use reasoning to argue that image generation models are really the same thing as the text side and that it's fair use there. So the question for me
is where do we go from here? Will we see
courts adopt Alsup's framework going forward? Will we see other sorts of precedents and standards of judgment emerge? I think one of the things that I'm aware of is, you know, Alsup writes in the Northern District of California.
It's not the whole country. We're seeing
circuit splits on related AI issues. For
example, the Ninth Circuit has used an actual-knowledge requirement for contributory infringement, while the Second Circuit has used a reason-to-know standard.
That's a big difference when we're
talking about platforms that host AI
tools and it affects the sort of extent
of liability that AI platforms will have
in situations like this. The long and the short of it is, this is a step forward in terms of providing legal clarity. I really appreciate Judge Alsup's
willingness to talk about the
transformative value of AI and not just
call it copying. I think that's a
correct interpretation of what AI does.
I think expecting AI companies to pay
for what they train on is completely
reasonable and we'll have to see where
the story goes from here. Still, a
little bit of clarity from the judiciary
is a step forward. We'll take the win
for today, won't we? All right. Cheers,
guys.