Several authors including comedian Sarah Silverman are suing OpenAI for using pirated copies of their books to train language models. This unauthorized use gives rise to several copyright infringement claims and also violates the DMCA, they argue. OpenAI disagrees and this week asked the California federal court to dismiss all claims but one.
Artificial intelligence has the potential to make our lives more efficient, entertaining, and productive. There are potential downsides as well.
From a copyright perspective, AI brings up some interesting questions. For example, can content created by an AI be copyrighted? And can an AI be trained on copyrighted works without limitation?
Authors Sue OpenAI
According to several authors, large language model training sets shouldn’t be permitted to use every piece of text they come across online. In their lawsuit filed in June, book authors Paul Tremblay and Mona Awad accused OpenAI of direct and vicarious copyright infringement, among other things.
Soon after, writer/comedian Sarah Silverman was joined by authors Christopher Golden and Richard Kadrey in an identical suit which also accused OpenAI of using books as training data. This happened without permission, using datasets that were sourced from pirate sites, the complaint alleged.
The complaints mention the controversial Books2 and Books3 datasets that are believed to be sourced from shadow libraries such as LibGen, Z-Library, Sci-Hub, and Bibliotik.
“The books aggregated by these websites have also been available in bulk via torrent systems. These flagrantly illegal shadow libraries have long been of interest to the AI-training community...,” the authors wrote.
OpenAI Asks Court to Dismiss Claims
This week, OpenAI responded to these accusations with a request for the bulk of the claims to be dismissed. The claims it wants dismissed include vicarious copyright infringement, DMCA violation, unfair competition, “negligence,” and unjust enrichment allegations.
“None of these causes of action states a viable claim for relief because none of the legal theories challenged here actually condemns the conduct alleged with respect to ChatGPT, the language models that power it, or the process used to create them,” OpenAI informed the court.
“It is important for these claims to be trimmed from the suit at the outset, so that these cases do not proceed to discovery and beyond with legally infirm theories of liability.”
The only claim that should be able to survive, for now, is direct copyright infringement, but OpenAI expects to defeat the claim at a later stage.
Fair Use
The authors’ copyright infringement claims are grounded in copyright law. OpenAI doesn’t dispute that copyright plays a role but notes that the complaints take a hard line, glossing over exemptions such as fair use.
“Those claims, however, misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
OpenAI notes that when the U.S. Constitution was drafted, its creators saw copyright law as a tool to promote the progress of science and the useful arts. In this case, AI is seen as useful progress, and its use of large amounts of copyrighted texts could be seen as ‘fair’.
“Numerous courts have applied the fair use doctrine to strike that balance, recognizing that the use of copyrighted materials by innovators in transformative ways does not violate copyright,” OpenAI writes.
Derivative?
The authors clearly have a different take. They argue that every output of OpenAI’s language models is a copyright-infringing derivative work. These derivatives are generated without obtaining permission from rightsholders.
OpenAI argues that this conclusion goes too far. The organization points out that, under the authors’ theory, all output from large language models would essentially be copyright-infringing. While that may be what the authors want, it would severely hamper AI innovation.
Courts have previously rejected interpretations of the term derivative that are too broad, and should do so here as well, the AI company notes.
“According to the Complaints, every single ChatGPT output —from a simple response to a question, to the name of the President of the United States, to a paragraph describing the plot, themes, and significance of Homer’s The Iliad— is necessarily an infringing ‘derivative work’ of Plaintiffs’ books.
“Worse still, each of those outputs would simultaneously be an infringing derivative of each of the millions of other individual works contained in the training corpus— regardless of whether there are any similarities between the output and the training works. That is not how copyright law works,” OpenAI adds.
Based on these and a variety of other arguments, OpenAI asks the court to dismiss all claims except direct copyright infringement.
The authors have yet to respond, but they will likely counter OpenAI’s motion. These cases will help to define the boundaries of copyright when it comes to AI developments, and will likely be fought tooth and nail.
—
The motion to dismiss in the Tremblay and Awad case can be found here (pdf), and the identical version filed in the Silverman, Golden, and Kadrey lawsuit is available here (pdf).