Scarinci Hollenbeck, LLC

201-896-4100 info@sh-law.com

What Issues Arise When AI Uses Copyrighted Works?

Author: Albert J. Soler|September 11, 2023

Questions surrounding artificial intelligence (AI) and copyright are evolving quickly. Key issues include whether content produced by “generative AI” computer programs (discussed below) is entitled to copyright protection, and how training and using these programs may infringe existing copyrights.

Stand-up comedian Sarah Silverman is one of many content creators who have filed lawsuits alleging that AI platforms were trained on their copyrighted works without authorization or license from the rights holders. Silverman, along with authors Christopher Golden and Richard Kadrey, contends that defendants OpenAI and Meta Platforms copied the authors’ published books to train their AI products ChatGPT and LLaMA “without consent, without credit, and without compensation.”

How Generative AI Works

OpenAI and Meta Platforms both offer AI software products known as large language models (LLMs). Rather than being programmed by software engineers, large language models are “trained” by copying massive amounts of text and extracting expressive information from that text. As the U.S. Patent and Trademark Office (USPTO) has described, this process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” OpenAI, for example, acknowledges that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “necessarily involves first making copies of the data to be analyzed.”

Once properly “trained,” platforms like ChatGPT and LLaMA allow users to enter text prompts. The platforms then attempt to produce a coherent, fluent response that closely mimics human language. To produce text outputs, LLMs rely on information extracted from their training datasets, along with patterns and connections drawn from the data. For example, if an LLM is prompted to write in the style of a certain author, it generates content based on patterns and connections it learned from that author’s work within its training data. Importantly, a user can also ask ChatGPT or LLaMA to summarize a copyrighted book, and the program does so based on its training data.

Copyright Infringement Lawsuits Against AI Platforms

In the lawsuits, Plaintiffs Silverman, Golden, and Kadrey maintain that they did not consent to the use of their copyrighted books as training material for ChatGPT or LLaMA. They further allege that the LLMs are themselves infringing derivative works, made without the plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.

According to their complaint, ChatGPT provided accurate summaries of the plaintiffs’ books when prompted, which the plaintiffs say shows that the program was trained on their copyrighted works. “Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works,” their complaint against OpenAI states. The suit further alleges that “at no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published works.”

Both suits were filed in federal district court in California and seek class-action status. They allege copyright infringement and violations of section 1202(b) of the Digital Millennium Copyright Act (DMCA), as well as common law claims of unjust enrichment, unfair competition, and negligence. For example, the lawsuit against Meta argues that the company “breached its duties by negligently, carelessly, and recklessly collecting, maintaining and controlling [theirs] and [others’] infringed works and engineering, designing, maintaining and controlling systems – including LLaMA – which are trained on [theirs] and [others’] infringed Works without their authorization.”

While OpenAI and Meta Platforms have not yet officially responded to the lawsuits, the AI platforms will likely raise a fair use defense. As discussed in prior articles, fair use is determined on a case-by-case basis and requires evaluation of the following four factors:

  • The purpose and character of the use (including whether it is transformative, commercial, non-profit, or educational);
  • The nature of the copyrighted work;
  • The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  • The effect of the use upon the potential market for or value of the copyrighted work.

In a recent report, the Congressional Research Service noted that AI companies have previously argued that their training processes constitute fair use and are therefore non-infringing, writing:

Some stakeholders argue that the use of copyrighted works to train AI programs should be considered a fair use under these factors. Regarding the first factor, OpenAI argues its purpose is “transformative” as opposed to “expressive” because the training process creates “a useful generative AI system.” OpenAI also contends that the third factor supports fair use because the copies are not made available to the public but are used only to train the program. For support, OpenAI cites The Authors Guild, Inc. v. Google, Inc., in which the U.S. Court of Appeals for the Second Circuit held that Google’s copying of entire books to create a searchable database that displayed excerpts of those books constituted fair use.

Of course, fair use analysis requires courts to weigh all four factors, and the plaintiffs will likely contend that several factors tip the scales in their favor. For example, they may argue that ChatGPT and LLaMA are commercial products, which weighs against fair use under the first statutory factor. They may also argue that by providing summaries of the books, the programs undermine the market for the original works, weighing against fair use under the fourth factor.

Key Takeaway

Artificial intelligence, particularly generative AI, raises novel and complex copyright issues. In addition to the question of whether generative AI programs infringe copyrights in existing works, the availability of copyright protection for AI-generated works also remains unsettled. Because cases involving generative AI are in their infancy, we are unlikely to find answers to many of these copyright issues in the short term. In the meantime, this area of copyright law warrants close monitoring by content owners as well as AI platform creators and users. Scarinci Hollenbeck remains at the forefront of this issue.

If you have questions, please contact us

If you have any questions or if you would like to discuss the matter further, please contact me, Albert J. Soler, or the Scarinci Hollenbeck attorney with whom you work, at 201-896-4100.
