Cloudflare accuses Aravind Srinivas-led Perplexity of covertly scraping data from sites; AI firm reacts — details here

AI startup Perplexity is allegedly crawling and scraping content from websites that have explicitly said that they don’t want to be scraped.

On Monday, Cloudflare, an internet infrastructure provider, published a research blog post stating that it had observed the AI startup, co-founded and led by CEO Aravind Srinivas, using deceptive methods to hide its crawling and scraping activity on those websites.

What are the accusations against Perplexity?

The network infrastructure giant said in the report that Perplexity initially crawls using its declared user agent, but when it is presented with a network block, it obscures its crawling identity "in an attempt to circumvent the website's preferences".

AI products like those offered by Perplexity often rely on scraping large amounts of data from the internet. According to a Reuters report, multiple AI firms scrape text, images and videos while bypassing the web standards that publishers use to restrict such access.

Cloudflare said that the situation came to light after its customers complained that Perplexity was still able to access their content, even after they added rules to their robots.txt file and specifically blocked Perplexity’s known bots.

After confirming that Perplexity's crawlers had in fact been blocked from those sites, Cloudflare ran tests to confirm the AI startup's 'unauthorised' behaviour.

"This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals," Cloudflare's post said.

Perplexity responds to accusations

The AI startup took to X (formerly Twitter) on Tuesday to refute the allegations, writing: "The bluster around this issue reveals that Cloudflare's leadership is either dangerously misinformed on the basics of AI, or simply more flair than cloud."

In another X post, Perplexity also explained its reasoning and its process for scraping data.

It claimed that its method of scraping data is "fundamentally different from traditional web crawling, in which crawlers systematically visit millions of pages to build massive databases, whether anyone asked for that specific information or not."

It further justified its actions by saying, “User-driven agents, by contrast, only fetch content when a real person requests something specific, and they use that content immediately to answer the user’s question. Perplexity’s user-driven agents do not store the information or train with it.”

Perplexity's core message is that user-driven AI agents act on behalf of users rather than as bots, and that infrastructure providers such as Cloudflare must understand and accommodate this distinction to preserve an open and accessible web.
