Meta’s Alleged Unlawful Use of Copyrighted Books for AI Training Sparks Controversy

Meta Platforms is facing legal challenges over accusations that it used thousands of copyrighted books to train its AI models despite warnings from its own lawyers. The allegations, set out in a recent court filing in a copyright infringement lawsuit, are part of an ongoing dispute between prominent authors and the technology giant.

Comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon, among others, have joined forces against Meta, alleging that the company illicitly used their creative works to train its artificial intelligence language model, Llama. The recent legal filing consolidates these assertions, underscoring Meta’s alleged disregard for obtaining proper copyright permissions in its pursuit of advancing AI technology.

The filing includes chat logs in which a Meta-affiliated researcher discusses acquiring the book dataset in a Discord server. The plaintiffs present these logs as evidence that Meta was aware its use of the book files might not be legally permitted.

According to a Reuters report, the chat logs quoted in the complaint show researcher Tim Dettmers describing his exchanges with Meta’s legal department about whether the book files could lawfully be used for training. His messages point to internal debate within Meta over the permissibility of using the dataset and suggest the company recognized the legal uncertainty surrounding it.

While the details of the lawyers’ concerns remain undisclosed, the chat logs cite “books with active copyrights” as the main source of apprehension. Participants in the conversation suggest that training on such data might not be protected by fair use, a legal doctrine that permits certain unlicensed uses of copyrighted works.

The release of Meta’s Llama large language model, purportedly trained on the contentious dataset, sparked controversy within the content creator community. With tech companies grappling with a barrage of lawsuits alleging unauthorized use of copyrighted material to advance AI technologies, the outcomes of these legal battles could significantly shape the future landscape of generative AI.

In February 2023, Meta released the first version of its Llama large language model along with a list of the datasets used to train it. The list included “the Books3 section of ThePile,” a dataset that, according to the legal filing, comprises 196,640 books. Meta did not, however, disclose the training data used for its latest iteration, Llama 2, which became commercially available during the summer. Llama 2 is free to use for companies with fewer than 700 million monthly active users.


Team Eela

