Note: On January 1, 2026, a new batch of works entered the public domain in the United States—those that were published or registered in 1930. These works, if they met all required copyright formalities, received the maximum term of protection of 95 years. They now join countless other works already in the public domain in the United States and are free of copyright. This means they may be freely copied, adapted, distributed, performed, and displayed without permission from a rightsholder.
Artificial Intelligence and the Public Domain
Another Public Domain Day has come and gone, and the public domain in the United States continues to expand, becoming an ever more valuable and essential part of the copyright lifecycle that makes creative works available to be freely used and to inspire new works. Given its importance in the world of intellectual property and the creative cycle, one might wonder how the public domain and the works that comprise it interact with the most influential technology of today, artificial intelligence. Generally speaking, the relationship between AI and the public domain is reciprocal: the public domain provides works that serve as training data for AI models, which in turn generate new works that are often in the public domain themselves. Examining each side of this exchange in more detail makes the relationship clearer.
Generative AI Outputs and the Public Domain
To receive copyright protection in the United States, a work must contain human authorship, among other requirements. The outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. If a human adds creative elements to the AI-created output, such as arrangements or modifications, then they may be able to claim copyright in their contributions. The provision of prompts, however, does not provide the requisite human authorship for copyrightability because the subsequent output is considered the expression of the artificial intelligence, not the user. Therefore, strictly AI-generated works (those that are created solely by a machine without sufficient human intervention) are not eligible for copyright protection and are in the public domain in the United States. In January 2025, the U.S. Copyright Office released Part 2 of its report on Copyright and Artificial Intelligence, which further explores the copyrightability of outputs created by generative AI.
Training AI Using Public Domain Works
The public domain is composed of a large amount of high-quality content, including some of history’s most prominent works of literature, musical compositions, works of the U.S. Federal Government, obscure artwork, and even recent works that have been dedicated to the public domain by the author. Taking advantage of this vast source of content as training data for AI models is desirable not only because of the sheer amount of content available but also because the materials are equally diverse, and both volume and diversity are crucial for high-performing, accurate AI tools. Also advantageous is that public domain works are free of copyright, meaning they can be used to train AI models without negotiating a license, paying royalties, relying on fair use, or risking claims of copyright infringement (see ChatGPT Is Eating the World for a list of current litigation involving AI companies).
An obstacle that may inhibit the use of public domain materials is that many of the works exist only in physical form in libraries and archives, and works must be digitized before they can be used for AI training. Institutions such as the Harvard Law School Library have recognized this obstacle, and they have responded by compiling the Institutional Book Corpus, a collection of almost one million public domain books that have been digitized and made available for anyone to use as data to train AI models. The corpus, released through the library's Institutional Data Initiative, contains books in 379 unique languages covering a variety of topics including language and literature, law, philosophy, psychology, religion, science, social science, political science, agriculture, and medicine. By stewarding these public domain books in such a proactive way, institutions like the Harvard Law School Library are able to increase access to high-quality content that can be used for the ethical development of AI technology, which is crucial now that demands for transparency around the data on which AI models are trained are louder than ever.
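For readers assembling their own training sets, the date arithmetic from the note at the top of this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it applies the 95-year maximum term to a publication year and nothing else, so it ignores formalities, renewals, unpublished works, and works published from 1978 onward, which follow different rules. The record fields, titles, and function names are hypothetical and do not come from any real corpus or library.

```python
from datetime import date

# Assumption: the 95-year maximum term described in the note above.
TERM_YEARS = 95


def public_domain_year(publication_year: int) -> int:
    """Return the year on whose January 1 the work enters the US public domain.

    Protection runs through the end of the 95th calendar year after
    publication, so the work becomes free to use on the following January 1.
    """
    return publication_year + TERM_YEARS + 1


def is_public_domain(publication_year: int, on: date) -> bool:
    """Check the simplified 95-year rule for a given date."""
    return on.year >= public_domain_year(publication_year)


def select_training_candidates(records, on: date):
    """Keep only the records that pass the simplified check."""
    return [r for r in records if is_public_domain(r["year"], on)]


# Hypothetical metadata records, for illustration only.
records = [
    {"title": "Example Novel A", "year": 1929},
    {"title": "Example Novel B", "year": 1930},
    {"title": "Example Novel C", "year": 1931},
]

candidates = select_training_candidates(records, date(2026, 1, 1))
for record in candidates:
    print(record["title"], record["year"])
```

A filter like this is at most a first pass. Whether a particular work is actually in the public domain depends on facts that a publication year alone cannot capture, so any real selection should be checked against the work's full copyright history.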
Interested in learning more about the public domain? Explore the Public Domain Day website to learn more about the Public Domain Project at The Ohio State University and to view additional copyright and public domain resources.
By Landen Stafford (Copyright Services Specialist at Copyright Services, The Ohio State University Libraries)


