Adobe Inc. faces a proposed class-action lawsuit accusing the company of unlawfully using copyrighted works to train its artificial intelligence model, SlimLM. The lawsuit alleges that Adobe trained SlimLM, which is designed for on-device document assistance, on pirated versions of numerous books, including works by author Elizabeth Lyon.
At the heart of the dispute is SlimPajama-627B, a dataset released by Cerebras in June 2023 and presented as a deduplicated, multi-corpora, open-source resource. This dataset serves as the basis for SlimLM. SlimPajama is a slimmed-down version of the RedPajama dataset, which includes Books3, a collection of pirated books that has been the subject of lengthy legal battles over copyright.
Elizabeth Lyon, an Oregon-based author of numerous nonfiction guidebooks on writing, is the plaintiff in the lawsuit. The complaint claims that her books ended up in a processed subset of that manipulated dataset, which Adobe used as the basis for SlimLM, and that Adobe used them without her permission and without credit or compensation.
In recent months, the tech industry has seen an uptick in lawsuits over how AI models are trained, and the resulting legal scrutiny now extends to other large companies, such as Amazon. Salesforce has already been sued on similar grounds, accused of using the RedPajama dataset without authorization to train its proprietary models. Apple has likewise been accused of using copyrighted works to train its Apple Intelligence models.
These cases underscore how contested the legal landscape around generative AI systems remains. In perhaps the most consequential case yet, Anthropic agreed to a $1.5 billion settlement with authors who claimed the company trained its chatbot, Claude, on pirated copies of their books. Such disputes highlight the rising tension over intellectual property rights in AI.
Adobe describes SlimLM as “optimized for document assistance tasks on mobile devices.” The model is part of a broader suite of AI services the company has launched since 2023. These include Firefly, Adobe’s AI-powered media-generation suite, which most recently added prompt-based video editing and integration with third-party models.
The legal fight is far from over, but the lawsuits may push Adobe to act more responsibly on its own platform and shape its future AI work. The implications also extend well beyond Adobe: these cases could help establish industry best practices for how businesses source the datasets they use to train AI systems.
“The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3),” – Reuters
“without consent and without credit or compensation.” – macobserver

