Amazon Web Services Investigating Alleged Violation of Web Crawler Protocol by AI Company Perplexity

Seattle, Washington – Amazon Web Services has launched an investigation into the activities of Perplexity AI to determine whether the company is violating its terms of service. The cloud division of Amazon is looking into allegations that Perplexity AI is running a crawler on its servers that ignores the Robots Exclusion Protocol, the web standard (implemented through a site's robots.txt file) that tells automated bots which parts of a website they may access.
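The mechanics of the standard are straightforward to illustrate. The sketch below, which uses Python's standard urllib.robotparser module and a hypothetical robots.txt file (the bot names, domain, and paths are illustrative, not taken from any publisher's actual configuration), shows how a compliant crawler is expected to consult a site's rules before fetching a page:

```python
from urllib import robotparser

# A hypothetical robots.txt such as a publisher might serve at
# https://example.com/robots.txt (names and paths are illustrative).
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks can_fetch() before requesting each URL
# and skips anything the site has disallowed for its user agent.
for agent in ("PerplexityBot", "ExampleBot"):
    url = "https://example.com/articles/some-story"
    allowed = parser.can_fetch(agent, url)
    print(f"{agent} may fetch {url}: {allowed}")
```

The key point is that compliance is voluntary: nothing in the protocol technically prevents a crawler from ignoring these rules, which is why the allegations turn on the operator's behavior rather than on any enforcement mechanism.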

Reports from Wired indicate that a virtual machine hosted on an Amazon Web Services server was observed bypassing robots.txt instructions on various websites. The machine, believed to be operated by Perplexity, allegedly scraped content from multiple publications, including The Guardian, Forbes, and The New York Times. Wired's own tests found that Perplexity's chatbot returned results that closely paraphrased its articles with minimal attribution, heightening concerns about unauthorized content scraping.

Perplexity has denied violating the Robots Exclusion Protocol, stating that its PerplexityBot respects robots.txt instructions. Amazon Web Services emphasized that its terms of service prohibit abusive and illegal activities and that customers are responsible for compliance. The company is actively investigating the reports of potential violations, including the use of bots to gather content for training large language models.

Perplexity spokesperson Sara Platnick insisted that the company's crawlers comply with Amazon's terms of service and denied any wrongdoing. However, Wired's investigation indicated that PerplexityBot may ignore robots.txt in certain scenarios, raising further questions about the company's practices. Perplexity CEO Aravind Srinivas acknowledged the company's use of third-party web crawlers, including the bot identified by Wired, but denied intentionally disregarding the Robots Exclusion Protocol.

As the investigation unfolds, both Amazon Web Services and Perplexity continue to issue statements addressing the allegations and clarifying their positions. The dispute highlights the difficulty of monitoring and enforcing voluntary web standards in the evolving landscape of AI and data scraping. Stay tuned for updates on this developing story as more information becomes available.