
That’s Not Fair! Fair Use Defense Denied for AI Training Materials



“A smart man knows when he is right; a wise man knows when he is wrong. Wisdom does not always find me, so I try to embrace it when it does—even if it comes late, as it did here.”

It’s not common for a federal judge to make such an admission, but that is exactly what Judge Bibas of the United States District Court for the District of Delaware did in his recent opinion in Thomson Reuters Enterprise Centre GMBH v. Ross Intelligence Inc., a first-of-its-kind decision addressing whether fair use is an available defense to a copyright infringement claim stemming from the training of artificial intelligence (AI) platforms. In finding for Thomson Reuters on its motion for summary judgment of copyright infringement, after having previously denied that motion, Judge Bibas answered this question with a resounding “no,” at least with respect to the use of copyrighted materials to train AI.

AI Training and the Thomson Reuters Copyright Dispute

Previously, we've covered how the machine learning models that power AI must undergo “training.” Training is the process by which the model derives its behavior from operating on (or “studying”) a set of material and/or data referred to as training data. Once trained, the AI can generate responses to user queries.

It is this training process that led to the Thomson Reuters suit. The plaintiff, Thomson Reuters, owns one of the largest legal-research platforms, Westlaw, where users pay to access case law, state and federal statutes, regulations, law journals, treatises, and headnotes that summarize key points of law and case holdings. Thomson Reuters owns copyrights in Westlaw’s copyrightable materials, including the headnotes.

The defendant, Ross Intelligence, is a Westlaw competitor that built a legal-research search engine using AI. To train its AI tool, Ross used a database of legal questions and answers. It first approached Thomson Reuters for a license to Westlaw’s content. When Thomson Reuters declined, Ross instead contracted with LegalEase to receive training data in the form of “Bulk Memos,” compilations of lawyers’ legal questions with good and bad answers. The problem was that those “Bulk Memos” were created using Thomson Reuters’ copyrighted Westlaw headnotes. When Thomson Reuters found out what Ross had done, it sued for copyright infringement, alleging that Ross copied content from Westlaw, including its headnotes. Ross responded by raising a fair use defense to its use of Thomson Reuters’ copyrighted materials to train its AI.

Fair Use: Four Key Factors

Courts examine four factors to assess whether an unauthorized use of copyrighted material was nevertheless permitted as fair use:

  1. The use’s purpose and character, including whether it is commercial or nonprofit;
  2. The copyrighted work’s nature;
  3. How much of the work was used and how substantial a part it was relative to the copyrighted work’s whole; and
  4. How the defendant’s use affected the copyrighted work’s value or potential market.

Why the Court Rejected Ross’s Fair Use Defense

While Judge Bibas originally denied Thomson Reuters’ motion for summary judgment on the issues of copyright infringement and fair use, here he reversed course, finding that Ross’s use of the Westlaw headnotes in its training materials was not a fair use of those materials. Judge Bibas distinguished Ross’s argument that intermediate copying has been permitted as fair use in cases analyzing computer programs, noting that those cases were about copying computer code, not training machine learning systems. In the computer-programming cases, copying the code “was necessary for competitors to innovate,” whereas here “there is no computer code whose underlying ideas can be reached only by copying their expression.” In other words, Ross’s copying of the Westlaw headnotes was not reasonably necessary to achieve its new purpose. Noting that “[c]opyright encourages people to develop things that help society, like good legal-research tools,” Judge Bibas weighed all four fair use factors and ultimately rejected Ross’s fair use defense, along with its other defenses, including innocent infringement, copyright misuse, and merger.

What the Thomson Reuters Decision Means for the Future of AI and Copyright

The decision in Thomson Reuters has been eagerly awaited by litigants in a slew of pending copyright infringement lawsuits targeting AI developers, including OpenAI, Inc. and Meta Platforms. But questions remain as to whether, and how, the decision will be persuasive on a fair use defense to a claim of copyright infringement based on content created by generative AI, rather than on the use of content to train AI. Whether the use is “transformative” is a key inquiry in assessing the purpose and character of the use. In Thomson Reuters, Judge Bibas was careful to note that Ross’s AI was not generative AI; it did not create new works but “[r]ather, when a user enters a legal question, Ross spits back relevant judicial opinions that have already been written.” Generative AI platforms, like ChatGPT or Microsoft Copilot, don’t just spit back out what was put into them; they learn from those materials and create new materials based on their training. Is that process “transformative” and thus deserving of protection as fair use? That’s a question currently without an answer.

However, as copyright infringement cases in the AI space continue to percolate in the federal courts, we can expect to have that answer one day. While currently pending cases are largely focused on infringement in the training process, AI is also being used to generate images and text that bear a striking resemblance to the works on which the AI was trained. For example, buried deep in a pleading in a currently pending case brought by three visual artists against the owner/operator of Stable Diffusion, an AI image platform, is an example of an image allegedly copied in the training process alongside an image the platform generated, where the two images look “visually similar.” The example is offered to demonstrate that copying of the original image must have occurred in the training process, but it could just as easily be offered to demonstrate copyright infringement in generative AI output.


Those lawsuits, when they come, will be rife with their own complications. As we've previously covered regarding Dr. Stephen Thaler’s challenge to the Copyright Office’s refusal to register his AI-generated work, "A Recent Entrance to Paradise" (see our initial post on his challenge and our update on the Copyright Office's cross motion), it remains unclear who the author of a fully AI-created image would be. And if the author is unclear, who is the proper defendant in a copyright case over AI output? We’ll continue to bring you updates in the AI and IP space as they develop.

This article is provided for informational purposes only—it does not constitute legal advice and does not create an attorney-client relationship between the firm and the reader. Readers should consult legal counsel before taking action relating to the subject matter of this article.
