- Anna's Archive scraped 86 million Spotify tracks and 256 million metadata rows, totalling about 300TB
- The archive claims to preserve 99.6% of Spotify listens via open, bulk-distributed torrents
- Spotify confirmed the leak but said it did not include its full catalogue of more than 100 million tracks
Anna's Archive, the world's largest shadow library, has claimed to have scraped millions of tracks from Spotify and plans to release them online. The activist group said it had backed up 86 million music files and 256 million rows of metadata, totalling around 300 terabytes of data from the music streaming platform.
"We backed up Spotify (metadata and music files). It's distributed in bulk torrents (approximately 300TB), grouped by popularity," Anna's Archive said in a blog post titled 'Backing up Spotify'.
"It's the world's first 'preservation archive' for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6 per cent of listens," it added.
Anna's Archive is known for providing links to pirated books, but it said an opportunity had presented itself and the group decided to scrape Spotify.
"Anna's Archive normally focuses on text (e.g. books and papers)... but our mission (preserving humanity's knowledge and culture) doesn't distinguish among media types. Sometimes an opportunity comes along outside of text. This is such a case."
Actively Investigating: Spotify
The Stockholm-based company, which has more than 700 million users worldwide, confirmed the leak but added that it did not contain its entire inventory of more than 100 million tracks.
"An investigation into unauthorised access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM to access some of the platform's audio files. We are actively investigating the incident," Spotify was quoted as saying by Android Authority.
According to a report in The Guardian, the apparent leak could benefit AI companies looking for material to develop their technology. Mark Zuckerberg's Meta was previously alleged to have used LibGen, a vast online archive of pirated books, to train its AI models.
According to US court filings, Zuckerberg approved the use of the dataset despite warnings from the AI team that it was known to be pirated.