If you’re an author, you may have recently discovered in The Atlantic that your published book was included in a dataset of books used to train artificial intelligence systems without your permission. (Search the dataset here.) This can be an unsettling revelation, raising concerns about copyright, compensation, and the future implications of AI. Here’s what you need to know if your work has been used to “train” AI without permission:
Books3 Is One of Several Books Datasets Used to Train AI Systems
The Books3 dataset contains 183,000 books, downloaded from pirate sources. We know that companies like Meta (creator of LLaMA), EleutherAI, and Bloomberg have used it to train their language models. OpenAI has not disclosed training information about GPT-3.5 or GPT-4, the models underlying ChatGPT, so we don’t know whether it also used Books3. Regardless of whether those models were trained on Books3, the class action lawsuits against OpenAI should uncover more information on the datasets OpenAI used, which we believe also include books obtained from pirate sources.
You Don’t Have to Be a Named Plaintiff in the Lawsuits to Benefit From the Outcome
In addition to the recent lawsuit in which the Authors Guild is a named plaintiff, there are other author class action suits pending against OpenAI, Meta, and Google. You don’t need to be a named plaintiff in any of these lawsuits to participate, because the named plaintiffs in each suit represent their entire class. Even if you don’t fall within any of the classes, an outcome in favor of authors should benefit you by clarifying that books need to be licensed when used to “train” generative AI.
The Authors Guild Is Pursuing Protections for All Writers
Our lawyers at Lieff Cabraser and Cowan, DeBaets, Abrahams & Sheppard are not adding named plaintiffs to serve as class representatives in the lawsuit at this time. But since this is a class action case, and assuming the court certifies the class, you are covered if you meet the class definition laid out in the complaint (PDF). For specific questions about the lawsuit, contact the lawyers here.
If you are not covered by the class in the ongoing suits, know that the Authors Guild is still pursuing protections and compensation for all writers, from poets to memoirists to biographers to translators. This lawsuit is only the first step.
Actions You Can Take Now
Litigation can take a long time, but there are other important actions you can take now to speak out in defense of your rights:
- If your books are in the Books3 dataset, or if any AI system has intimate knowledge of them, you can send a letter to AI companies telling them that they do not have the right to use your books. We have created a form to make it easy for you to send this letter.
- Sign our open letter to the CEOs of AI companies demanding they compensate writers and get proper permission. More than 15,000 writers have signed to date. You can add your signature here.
- Take action to prevent future unauthorized use of your work in AI systems. Read more about how to do this here.
- Support the Authors Guild in our efforts to protect writers by becoming a member or making a donation. Your support helps us fight to protect writers’ copyrights against AI misuse and ensure that authors are entitled to control the use of their work and be compensated for it in the age of AI.
- Stay informed on the lawsuits and legislation that could impact you by signing up for our newsletter. The landscape is changing rapidly, and we share information about regulations that pertain to AI use of creative works.
Having your book used by AI can be discouraging, but don’t feel powerless. Take action to protect your rights, join forces with other authors, and push the industry toward a fairer system of transparency and compensation. With collective action, we can shape an AI future that respects authorship and protects the profession at large.