For a couple of months now, the debate has revolved around the legality of web scraping: is it legal or illegal? For starters, web scraping, also known as crawling or spidering, is the process of gathering information or data from someone else’s website using some form of software (web scraping software, to be precise), although some people still scrape websites manually.
The legality or illegality of such a process largely depends on the intention of the person gathering the information. In simple terms, the answer is not a straightforward yes or no. The real question should be how you plan to use the data you’ve gathered from the website because, at the end of the day, the data on public websites is there for general use anyway.
So, it is legal to copy such information and store it in a file on your computer. But you should be very careful about how you plan to use it. It is entirely ethical to use data you scraped from the web for analysis purposes, but very unethical to pass off the same information as your own, say on your website, without acknowledging the owner or getting the owner’s approval.
“The US circuit court of appeals upheld an injunction stating that it’s legal to scrape publicly available data from LinkedIn”
In fact, on September 9th, 2019, the US Court of Appeals for the Ninth Circuit upheld the injunction won by hiQ against the Microsoft-owned social-media company LinkedIn, stating that it’s legal to scrape publicly available data from LinkedIn. The ruling was made despite the social-media powerhouse insisting that scraping LinkedIn violates user privacy. According to the court of appeals, scraping public sites does not violate the Computer Fraud and Abuse Act (CFAA).
LinkedIn had stepped in to try to block hiQ from harvesting user profiles from its site. The San Francisco-based start-up is an analytics company that scrapes personal details, especially from LinkedIn profiles, for analysis purposes. It uses the data to analyze workforce information, such as identifying skills shortages or predicting when employees are likely to leave their jobs.
The decision by the court of appeals was historic in many ways, especially given that it touched on data privacy and the legal compliance rules around web scraping. It seemed to suggest that web crawlers can freely obtain any data that is on public websites and is not copyrighted. The decision, however, stopped short of granting hiQ or any other web crawler an explicit right to use the same data for unlimited commercial purposes.
“The ruling stated categorically that, in terms of legal compliance, the entry of a bot or web scraping software is no different from the entry of a browser”
In a broader sense, the decision not only legalized web scraping but also barred site owners from blocking competitors that automatically collect information from a site, as long as the site is public. The ruling stated categorically that, in terms of legal compliance, the entry of a bot or web scraping software is no different from the entry of a browser. In both instances, you request publicly available data and do something with it on your side.
Now, as I mentioned earlier, the freedom to scrape does not extend to copyrighted data. For instance, a web scraper bot would be allowed to search YouTube for video titles, but you would not be allowed to re-post the videos themselves on your site, simply because the videos are copyrighted. In essence, the ruling seems to preserve copyright protection for data, including media files, regardless of how the data was obtained.
In the wake of the ruling, many site owners are working around the clock to raise technical hurdles against competitors who copy data that is not copyrighted, such as ticket prices, product lots, open user profiles, and more. They consider this publicly available information ‘their own’ and therefore regard scraping it as ‘theft’. But according to the LinkedIn court ruling, it’s perfectly okay to scrape this information.
The court ruling also protects sites that require authentication from web scrapers and web crawlers. For instance, it prohibits a web crawler from logging in to Facebook to download user data. According to the ruling, such action is illegal. The reasoning behind the decision is straightforward: users must agree to the site’s terms and conditions before logging in, and those terms of service typically prohibit actions like automated data collection.
Even though site owners may find it difficult to take legal action against web scrapers, especially after the LinkedIn court ruling, they can still limit web crawling by technical means. For instance, sites can implement techniques like ‘rate-throttling’ to limit the number of web pages a single client can download within a given period.
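To make the idea of rate-throttling concrete, here is a minimal sketch of one common approach, a sliding-window limiter that a site might run on the server side. The class name `SlidingWindowLimiter`, its parameters, and the use of an IP address as the client identifier are all assumptions made for illustration; real sites typically rely on their web server or a dedicated library for this.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span.

    Hypothetical helper for illustration; not taken from any particular site.
    """

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id):
        """Return True if this request may proceed, False if it should be rejected."""
        now = self.clock()
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Over the limit: a server would usually answer with HTTP 429 here.
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10)
    # Five rapid requests from the same (made-up) address: only the first three pass.
    print([limiter.allow("203.0.113.7") for _ in range(5)])
```

A limiter like this does not distinguish humans from bots; it simply caps how fast any one client can pull pages, which is why it makes large-scale scraping slower and more expensive rather than impossible.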
Another method that has come in handy for testing whether a human or a web crawler is requesting a web page is CAPTCHA technology. These techniques are used to stop malicious bots from overloading a website and causing it to crash. But they have also proven effective in controlling automated scraping, making it less cost-effective for web crawling companies.
Existing web scraping legal compliance framework
Many countries permit web scraping; however, they have imposed restrictions on copyrighted data.
- Computer Fraud and Abuse Act (CFAA):
This law was enacted to prevent computer hackers and other malicious actors from fetching data by gaining unauthorized access to a page.
- Chattel trespass:
Trespass to chattels is a legal claim covering interference with someone else’s property rather than a law about data as such. Applied to web scraping, the chattel is the website’s server, and it is violated if the server is affected by scraping activity. In other words, trespass to chattels is committed if the server slows down or goes down because of the scraping.
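Since the chattel trespass concern is about scraping that slows or crashes a server, one thing a scraper can do on its own side is space out its requests. The sketch below shows that idea only; the class name `PoliteThrottle` and the two-second interval are assumptions chosen for illustration, and nothing here is a threshold taken from the ruling or from any statute.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests made by a scraper.

    Hypothetical helper for illustration only.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self.clock = clock                # injectable for testing
        self.sleep = sleep                # injectable for testing
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Pause until `min_interval` has elapsed since the last call.

        Returns the number of seconds slept.
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self.clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self.sleep(delay)
        self._last = self.clock()
        return delay


if __name__ == "__main__":
    throttle = PoliteThrottle(min_interval=2.0)
    for page in ["page-1", "page-2", "page-3"]:  # placeholder page names
        throttle.wait()  # call this right before each download
        print("would fetch", page)
```

Calling `wait()` before every download keeps the request rate bounded no matter how quickly the rest of the scraper runs.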
While the court of appeals decision seems to have settled a long-standing question on the legality of web scraping, it remains to be seen whether LinkedIn will take the case to the Supreme Court or be content with the decision. Not every decision appealed to the highest court in the land is actually reviewed, but LinkedIn nonetheless has a chance.