Web Scraping vs. Terms of Use

Data scraping extracts and copies data from websites, often to train AI large language models. Because scraped data commonly includes user-generated content and personal information, many companies explicitly prohibit data scraping through their terms of use. One such company is X (formerly Twitter), which tried to enforce its anti-scraping terms of use in court. X’s argument was rejected in X Corp. v. Bright Data Ltd., No. C 23-03698 WHA, 2024 WL 2113859, at *14 (N.D. Cal. May 9, 2024).

X sought to establish liability for Bright Data accessing X Corp.’s systems to scrape and sell data from the X website. X Corp. claimed that Bright Data was bound by “browsewrap” and “clickwrap” agreements that are entered into when accounts are created. Both agreements explicitly prohibited scraping of X Corp.’s data and, therefore, X Corp. sought damages for breach of contract, tortious interference with a contract, and unjust enrichment.

X lost its case. The court explained that X users, not X Corp., own and retain rights in content posted on the X website. This means the users own copyrights in their content and, therefore, the Copyright Act preempts X Corp.’s state law claims. Because federal copyright law forecloses those state law claims, they were dismissed for failure to state a claim.

The court cited X Corp.’s Terms of Service, which expressly state that X users, not X Corp., retain copyright ownership in all information, text, links, photos, and other materials and content they submit, post, or display on X. In return, X users grant X Corp. only a non-exclusive, royalty-free license to use, copy, display, and distribute the content. In essence, X Corp. does very little other than make users’ copyrighted materials available to the public. To then exclude others from using and distributing X users’ content would be chutzpah (although this specific term was not used).

The court found that X Corp. pushed its luck even further by attempting to enact its own private copyright system. Federal copyright law, not private companies, dictates the extent to which public data may be freely copied from social media platforms. Doctrines like “fair use” govern these issues, not whatever X Corp. permits or prohibits in its Terms of Use.

That said, the court clarified that not every state law interest would be automatically preempted. For example, the court advised that analogous state law claims regarding social media users’ privacy should not be preempted by the Copyright Act, since copyright law is unrelated to privacy issues.

In light of the court’s decision, data-scraping companies should conduct copyright analyses of the data they wish to scrape to determine whether they can be held liable. Social media companies (and internet service providers), on the other hand, are reminded that the Copyright Act’s limitations on non-exclusive licensees must always be considered.

Terms of use will likely not be sufficient to protect against data scraping as long as the underlying data copied is either subject to fair use or not protectable under the Copyright Act. AI platforms must understand how to play within the rules of copyright law, and must also account for state law claims such as breach of privacy and defamation.

David Seidman is the principal and founder of Seidman Law Group, LLC. He serves as outside general counsel for companies, which requires him to consider a diverse range of corporate, dispute resolution and avoidance, contract drafting and negotiation, and other issues.

He can be reached at david@seidmanlawgroup.com or 312-399-7390.

This blog post is not legal advice. Please consult an experienced attorney to assist with your legal issues.