Why Reddit Filed Lawsuit Against Perplexity And Data Scrapers

In 2023, Reddit asked companies such as OpenAI to pay for access to its data. Some companies, instead of paying, scraped Reddit content from Google search results.

This is an age driven by data, and a fight has broken out over who gets to use it and how. Social media forum Reddit has filed a lawsuit against four companies (Perplexity, SerpApi, Oxylabs, and AWMProxy), accusing them of illegally scraping its content without permission.

Reddit, which has 416 million weekly users, hosts discussions on everything from makeup and dog breeds to video games and travel. At the centre of its lawsuit, filed in the US District Court for the Southern District of New York, is Perplexity. The case comes two years after the forum asked companies such as OpenAI to pay for access to its data. Some companies, instead of paying, scraped Reddit content from Google search results.

All four companies have pushed back and dismissed the allegations. Perplexity said its approach was principled and responsible, while Denas Grybauskas of Oxylabs argued that no company should claim ownership of public data, because information that is publicly available should remain free to use.

Reddit copyright lawsuit vs Perplexity

To strengthen its case, Reddit created a "test post" that could be found only through Google search. Within hours, the post's content surfaced in results from Perplexity, the San Francisco-based AI search engine, suggesting that the company was still collecting Reddit data even after receiving a cease-and-desist letter.

Reddit's "test post" wins praise

Reddit's "test post" drew praise from prominent figures in tech. Ed Newton-Rex, a composer and the CEO of Fairly Trained, wrote on the social media platform X, "Absolutely brilliant detail from the new Reddit AI copyright lawsuit vs. Perplexity."

AI engineer Rohan Paul also wrote, "The core point is whether harvesting through Google results and reseller feeds still counts as circumvention of Reddit's protections and terms rather than fair public indexing, which is the line this case will decide."

Scraping isn't new

In the early days of the internet, Google built its search engine by scraping web pages. It collected information from sites across the web, organised it, and then showed it to users in search results.

Back then, scraping wasn't a big problem because everyone involved had a way to make money: websites got traffic, scrapers sold data, and Google organised content. All parties benefited.

Jose Castaneda, a Google spokesman, said in a statement, "Google has always actively respected the choices websites make through robots.txt, but sadly, there's a bunch of stealthy scrapers that do not."

Over time, companies began scraping Google search results, sorting the information into categories and selling their findings to businesses that wanted to rank higher in those results. The arrangement still worked for Google and website owners, because search indexing continued to drive traffic to websites.

Doug Leeds, a co-founder of Really Simple Licensing, a nonprofit that helps publishers and creators get paid when AI uses their work, said, "It was all the original ecosystem of the web. It wasn't necessarily a problem back then, because there was a monetization method for all the companies involved."

What Reddit Is Fighting For

Today, however, AI companies are scraping data quietly and at massive scale to train their chatbots, without compensating the people who created it. Reddit has restricted access to its site to stop AI companies from using its content for free.

Reddit wants to license its content to AI companies, just as The New York Times and Simon & Schuster have sold licences to theirs for millions of dollars.

Earlier, in June, Reddit sued AI company Anthropic for unlawfully using its data.

Track Latest News Live on NDTV.com and get news updates from India and around the world
