Allegations Mount Against Perplexity for Unauthorized Web Scraping

By Kevin Lee

Perplexity, an AI-driven search engine, has been accused of stealing work from major print media companies, including Wired. The company is also accused of breaking federal law by scraping the web without permission and circumventing explicit blocks that websites had put in place. Questions about Perplexity’s ethics surfaced at the Disrupt 2024 conference, when CEO Aravind Srinivas struggled to define plagiarism during an interview with TechCrunch’s Devin Coldewey.

The issue grew more heated when Cloudflare published research suggesting that Perplexity was using undisclosed agents to circumvent websites’ no-crawl instructions. According to Cloudflare, this let the firm pull data from tens of thousands of domains at a rate of millions of requests a day, and the activity was deliberately designed to bypass blocks placed by website owners.

Cloudflare found that Perplexity used not only its declared user-agent but also a generic browser user-agent that mimicked Google Chrome when its designated crawler was blocked. “We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” the company stated in its report.
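To illustrate why impersonating a browser sidesteps this kind of block, here is a minimal sketch using Python's standard robots.txt parser. It rests on several assumptions that are not in Cloudflare's report as quoted here: the robots.txt rules are hypothetical, "PerplexityBot" is assumed as the declared crawler name, and the Chrome user-agent string is only illustrative. Note also that robots.txt is advisory, so a crawler that ignores it is not technically stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Assumed declared crawler name (not stated in the article).
DECLARED = "PerplexityBot"
# Illustrative generic Chrome-on-macOS user-agent string.
GENERIC_CHROME = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
URL = "https://example.com/article"


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit this user-agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("declared crawler allowed:", is_allowed(DECLARED, URL))
    print("generic browser allowed:", is_allowed(GENERIC_CHROME, URL))
```

Because the rule is keyed on the user-agent name, the same request is refused under the declared name and permitted under the browser-like string. That gap is presumably why Cloudflare reports relying on signals other than the user-agent to identify the crawler.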

Perplexity’s spokesperson, Jesse Dwyer, dismissed Cloudflare’s blog post as merely a “sales pitch.” He maintained that the evidence shared in the post showed that “no content was loaded.” Cloudflare’s own research indicates otherwise, pointing to scraping on a very large scale.

This is not the first time Cloudflare has moved against AI scraping. In the past year, the company launched a free tool aimed at mitigating the impact of AI bots on websites. Just last month, it opened a marketplace that allows online content owners and publishers to identify AI scrapers that crawl their sites and charge them for access. The move is a significant win for creators and a step toward curbing unauthorized content extraction.

Meanwhile, at the Disrupt 2024 conference in San Francisco, Devin Coldewey addressed Perplexity’s alleged AI plagiarism and unethical web scraping. Perplexity has since come under heavy fire online, a sign of the tech community’s growing awareness of the ethical limits of AI use and its impact on creators of original content.

In their recent blog post, Cloudflare’s researchers stressed the scale of Perplexity’s bot activity. “This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” they explained.
