Meta’s Llama Has Memorized Much of Harry Potter

Meta’s Llama model has memorized Harry Potter and the Sorcerer’s Stone so well that it can reproduce verbatim excerpts from 42% of the book, a new study shows.
Researchers from Stanford, Cornell, and West Virginia University analyzed dozens of books from the Books3 dataset, a collection of pirated books used to train Meta’s Llama models. Books3 is also at the center of a copyright infringement lawsuit against Meta, Kadrey v. Meta Platforms, Inc. The authors of the study said their findings could have significant implications for AI companies facing similar lawsuits.
According to the research paper, the Llama 3.1 model has memorized some books, such as Harry Potter and 1984. Specifically, the study found that Llama 3.1 has memorized 42% of the first Harry Potter book so well that it can reproduce verbatim excerpts from it at least 50% of the time. Overall, Llama 3.1 can reproduce excerpts from 91% of the book, though not as consistently.
The paper says, “The extent of verbatim memorization of books from the Books3 dataset is more significant than previously described.” But the researchers also found that “memorization varies widely from model to model and from book to book within each model, and differs across different parts of a single book.” For example, the study estimates that Llama 3.1 has memorized only 0.13% of Sandman Slim, a novel by Richard Kadrey, one of the lead plaintiffs in the class action copyright lawsuit against Meta.
So while some of the paper’s findings may seem damning, don’t call it a smoking gun for plaintiffs in AI copyright infringement cases.
“These results give everyone in the AI copyright debate something to point to,” journalist Timothy B. Lee wrote in his Understanding AI newsletter. “Such divergent results may cast doubt on whether it makes sense to lump J.K. Rowling, Richard Kadrey, and thousands of other authors together in a single mass lawsuit. And that could work in Meta’s favor, since most authors lack the resources to file individual lawsuits.”
Why is Llama able to reproduce some books more than others? James Grimmelmann said: “I suspect the difference is because Harry Potter is a much more famous book. It’s widely quoted, and I’m sure that substantial excerpts from it on third-party sites found their way into the training data on the web.”
This also shows that “AI companies can make choices that increase or decrease memorization. It’s not an inevitable feature of AI; they have control over it.”
Meta and other AI companies have argued that training their models on copyrighted works is protected by fair use, a complex legal doctrine. However, the extent of memorization could complicate those arguments.
“Yes, I do think that the likelihood of LLM memorization changes the copyright analysis more than I previously thought,” Robert Brauneis, a professor at George Washington University Law School, said in an email. He concluded that the study’s findings could ultimately undermine Meta’s fair use argument.
We have asked Meta to comment on the study’s findings, and we will update this article if we receive a response.
Disclosure: Mashable’s parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging that it infringed Ziff Davis’ copyrights in training and operating its AI systems.