This week is Fair Use Week, an annual celebration of the legal doctrine of fair use, which plays an essential role in teaching, education, and scholarship. This year, we are looking at the development of fair use in Generative Artificial Intelligence.
What is Fair Use?
The fair use doctrine allows the use of copyrighted works without permission in certain circumstances. Whether a use qualifies is determined using a four-factor test that considers the purpose of the use, the nature of the copyrighted work, the amount and substantiality used, and the effect of the use on the market for the copyrighted work. Fair use is purposely vague to avoid unnecessarily limiting the use of copyrighted materials, but this vagueness can also leave it uncertain whether a use is fair or infringing until it is challenged in court. Nowhere is that vagueness and uncertainty more prevalent than in the current climate around fair use and artificial intelligence.
The Role of Fair Use in Generative Artificial Intelligence
As the growing number of lawsuits brought against AI companies indicates (see ChatGPT Is Eating the World), many copyright owners believe that including copyrighted works, without permission, in the datasets used to train AI tools constitutes infringement, as do outputs produced by AI tools that are copies of or substantially similar to the copyrighted works. AI companies rely, in part, on fair use to defend their use of copyrighted works. As in any fair use case, courts will balance the fair use factors to determine whether, taken together, they favor a finding of fair use. Let’s explore how each factor might apply to AI.
Factor One: The Purpose and Character of the Use
When considering the purpose of the use, which is the first fair use factor, the potential commerciality of the AI companies’ use is weighed against their claim of transformative use. Any possibility of commercial benefit that AI companies stand to gain from using the copyrighted works will weigh against a finding of fair use. This has a significant impact on any AI tools that require a paid subscription to use. However, if companies can successfully argue that their use is transformative and adds value that is new and different from the original purpose of the copyrighted work, that will weigh in favor of fair use. The transformative use, according to AI companies, is that copyrighted works are being used as data to help AI models recognize patterns that will in turn help them generate new and unique content. Transformative use is also considered with respect to the output generated by the AI tool. If the output is substantially similar to the original copyrighted work and both works share the same or a highly similar purpose, the use may not be considered transformative.[1]
Factor Two: The Nature of the Copyrighted Work
The second fair use factor is the nature of the copyrighted work, which examines characteristics such as whether the work is factual or fictional and whether it is published or unpublished. The use of highly creative works like novels and song lyrics, which are often used to train AI tools, typically weighs against fair use.
Factor Three: The Amount and Substantiality of the Portion Used
The third factor evaluates the amount and substantiality of the copyrighted work used in relation to the copyrighted work as a whole. Typically, a larger portion of a copyrighted work used, or the use of the heart of a work, weighs against fair use. However, if the use of an entire work is appropriate to accomplish a favored use, such as a use that is transformative, it may not weigh against fair use. AI companies could argue that ingesting anything less than the entirety of copyrighted works would lessen the accuracy of their AI tools and hamper their ability to achieve their transformative use in training the tool.
Factor Four: Market Effect
Under the fourth fair use factor, courts consider whether the use has an effect on the market for the copyrighted work. If the value of a copyrighted work is affected by its being used to train AI tools, that would weigh against fair use, as would any situation where the use served as a market substitute for the original copyrighted work. For example, some copyright owners take advantage of the potential to license their works for monetary gain. If an AI company chooses to avoid a readily available license and use the copyrighted work without permission, that use would have a direct negative effect on the value of the work. Additionally, if a generated output is a copy of or substantially similar to the copyrighted work, it could act as a substitute for the copyrighted work, again directly affecting the market.
None of the fair use factors is determinative on its own; a use that is found to be transformative does not guarantee that a court will rule in favor of fair use. Other factors may weigh heavily enough in favor of the copyright owner that, cumulatively, they force a ruling against fair use. All of that is to say, fair use cases depend greatly on the specific facts of each case, making it difficult to support any generalizations that you may hear about fair use and AI.
Current AI Lawsuits
As noted above, issues of copyright infringement and fair use are currently being litigated in court. Most recently, the district court in Delaware issued a summary judgment ruling in Thomson Reuters v. Ross Intelligence, rejecting a fair use defense for the use of copyrighted works to train an AI legal search tool. In the case, Ross Intelligence trained its legal-research search engine using Bulk Memos, which consisted of compilations of legal questions and answers incorporating Westlaw headnotes (summaries of key points of law and case holdings).[2] In considering the fair use factors, the court held that Ross’s use was not transformative; Ross was using the headnotes as AI data to create a competing legal research tool. Additionally, the court found that Ross’s legal research tool served as a market substitute for Westlaw and also considered the effect of Ross’s use on a potential market for AI training data.
Two other major cases currently making their way through the courts that address fair use in the training of AI tools are The New York Times Company v. Microsoft Corporation, involving the use of New York Times articles to train OpenAI’s large language models, and Authors Guild v. OpenAI, involving the use of works from a class of professional fiction writers to train OpenAI’s large language models.
We have written before about The New York Times v. Microsoft case; in its complaint, The New York Times claims that OpenAI has unlawfully used The Times’s works, including articles, in-depth investigations, opinion pieces, reviews, and how-to guides, to train the large language models that power Copilot (previously Bing Chat) and ChatGPT. The New York Times states these AI tools “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style.”[3] According to Microsoft and OpenAI, large language models can be trained to recognize patterns in data, but reproduction of entire copyrighted works is not what the models and tools are designed to produce.[4]
OpenAI and Microsoft are also facing a lawsuit by the Authors Guild. In its amended complaint filed on December 4, 2023, the Authors Guild states that ChatGPT produces summaries of copyrighted text used in the training of the tool and the large language model underlying the tool, and that these summaries are themselves derivative works. The Authors Guild also asserts that the plaintiff authors have suffered harm from the use of their copyrighted works, including lost opportunities to license their works and displacement of human-authored books.
Guidance from the United States Copyright Office
In 2023, the United States Copyright Office began examining the copyright law and policy issues raised by generative artificial intelligence, including the creation of works with AI and the use of copyrighted works to train AI. Its comprehensive initiative has included public listening sessions, registration guidance for AI-generated works, and a Notice of Inquiry seeking public input on copyright issues raised by artificial intelligence. The resulting report, Copyright and Artificial Intelligence, analyzes these issues and is being issued in three parts.
Part 1 of the Copyright and Artificial Intelligence report was published on July 31, 2024, and addressed the topic of digital replicas. Part 2 of the report, published in January 2025, focuses on the copyrightability of outputs created using generative AI. The report states that existing principles of copyright law are flexible enough to apply to this new technology, as they have applied to technological innovations in the past. The report also concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. This can include situations where a human-authored work is perceptible in an AI output, or where a human makes creative arrangements or modifications of the output, but not the mere provision of prompts. The report confirms that the use of AI to assist in the process of creation, or the inclusion of AI-generated material in a larger human-generated work, does not bar copyrightability. It also finds that the case has not been made for changes to existing law to provide additional protection for AI-generated outputs.
Emerging Industry Solutions
As courts continue to work through these copyright issues and the U.S. Copyright Office completes its research and guidance, some have turned to licensing deals to meet AI training needs. Approaches have included opt-in models, such as the one offered by Cambridge University Press, that allow authors to opt in to future licensing agreements with generative AI providers. Some opt-in models also offer payment to the author. The recent deal between Microsoft and HarperCollins, for example, allows authors to opt in to the AI training program for a payment of $5,000 per title, with half of that amount going to the author. AI training datasets may also avoid copyright issues by limiting data to public domain works. In December of 2024, for example, Harvard announced the Institutional Data Initiative, backed by Microsoft and OpenAI, which intends to share a dataset that includes 1 million public domain books.
What’s Next?
We await the U.S. Copyright Office’s much-anticipated third report on AI, which is set to explore “the legal implications of training AI models on copyrighted works” and, we hope, provide practical guidance on the subject. Between that report and the many case rulings that may be forthcoming, the vagueness and uncertainty described above may gradually give way to functional clarity on how to approach the intersection of fair use and artificial intelligence.
See the resources listed below for more information on fair use and artificial intelligence:
- Congressional Research Service, Generative Artificial Intelligence and Copyright Law (September 29, 2023). Available at: https://crsreports.congress.gov/product/pdf/LSB/LSB10922
- United States Copyright Office, Copyright and Artificial Intelligence, https://copyright.gov/ai/
- Knibbs, Kate, Every AI Copyright Lawsuit in the US, Visualized, Wired, https://www.wired.com/story/ai-copyright-case-tracker/ (last updated December 19, 2024).
[1] In Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, the U.S. Supreme Court found that the Andy Warhol Foundation’s use of Goldsmith’s photograph of Prince shared “substantially the same purpose” as the original, and that its “use is of a commercial nature,” affirming the Second Circuit Court of Appeals decision that the Foundation’s use did not qualify as fair use.
[2] The court holds that while the judicial opinions from which the headnotes are derived are not copyrightable, the headnotes “can introduce creativity by distilling, synthesizing, or explaining part of an opinion, and thus be copyrightable.” Thomson Reuters Enterprise Centre GmbH and West Publishing Corp. v. Ross Intelligence Inc., Case No. 1:20-cv-613-SB (D. Del. 2025), 7, https://www.ded.uscourts.gov/sites/ded/files/opinions/20-613_5.pdf
[3] The New York Times Company v. Microsoft Corporation, et al., Case No. 1:23-cv-11195, United States District Court, Southern District of New York, https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf (Filed on Dec. 27, 2023).
[4] Allyn, Bobby. “‘The New York Times’ Takes OpenAI to Court. ChatGPT’s Future Could Be on the Line.” NPR, 14 Jan. 2025, www.npr.org/2025/01/14/nx-s1-5258952/new-york-times-openai-microsoft.
By Allison Schultz (Instructional Designer & Library Liaison, Ohio State Online), Landen Stafford (Copyright Services Specialist, Copyright Services), and Maria Scheid (Head, Copyright Services)