Dutch DPA issues guidelines on data scraping – Technologist

Data scraping is the automatic collection and storage of information from the Internet. According to the Dutch DPA, scraping almost always collects personal data. This collection creates privacy risks because scraping can collect personal data from a large number of people in a short period of time and about numerous aspects of a person’s life. The information may also include a variety of special personal data and criminal records, which usually should not be collected and used. In its guidance, the Dutch DPA states that data scraping by private companies and individuals will almost always be in violation of the General Data Protection Regulation (GDPR) for lacking a legal basis for processing personal data.

The Dutch DPA presents its views on GDPR principles, including ‘lawfulness’. This principle provides that a legal basis is required for data scraping. It is practically impossible to obtain consent from data subjects unknown to a controller. The Dutch DPA therefore focuses on the legal basis of legitimate interest (art. 6(1)(f) GDPR). It notes, however, that data scrapers only have a legitimate interest insofar as their interest is not commercial and scraping occurs in a targeted manner. 

The DPA provides three examples of data scraping which could potentially be lawful, including data scraping of:

  • public news websites, to map relevant news about your own company or working field;

  • webshops’ own websites, for example scraping customer reviews, regarding correspondence with their own (potential) customers; and

  • online public fora about information security, to map security risks for your own company.

Critical Assessment of the DPA’s Views

The Dutch DPA states that commercial businesses (such as GenAI-driven businesses) cannot base processing on a purely commercial interest. This view has been criticized by the EC and rejected by the Dutch Council of State, the highest Dutch general administrative court. The question is now pending before the ECJ as a request for a preliminary ruling. With this pending proceeding and the EC’s opinion in mind, it is questionable whether the view is sound and will survive a challenge in court.

The Dutch DPA also presents unpersuasive views on the responsibilities of data scrapers when processing special categories of personal data. Whilst the Dutch DPA notes that it remains unclear whether scraping can be considered to serve freedom of information on the same footing as search engines, the Dutch DPA notes that it will disregard this matter and issues these guidelines under the assumption that the ECJ will not equate search engines with data scrapers. Therefore, data scrapers have to determine whether an exception to processing special categories of personal data applies prior to collection. It is however questionable whether this view will hold up when challenged in court.

The guidelines furthermore set unattainable standards, in particular regarding the proposed adequate measures. The Dutch DPA recommends, inter alia, measures that contribute to transparency. Scrapers should provide information on the processing on their own website, or on the websites scraped. This approach seems unlikely to work in practice, as it will be difficult for scrapers to design their scraping processes in a way that allows them to know whom to address in the privacy statement, which personal data is scraped, and from which website the personal data is scraped. It is therefore likely infeasible for data scrapers to publish detailed information on the processing activity prior to the data scraping activity in a way that meets the transparency requirements set out in the GDPR.

Lastly, the Dutch DPA presumes that data scrapers can be instructed in a “targeted” manner. It is unclear whether such a limited and very targeted search could collect sufficient data to enable businesses to train their models.

What’s Next?

The view of the Dutch DPA is not surprising given what it has stated before. However, given that the EC and the Dutch Council of State have overruled the Dutch DPA in its legitimate interest interpretation, and given the pending matter before the ECJ, we consider it premature to issue these guidelines. In this respect, we also note that the matter at hand involves, inter alia, copyright doctrines, AI governance and contractual obligations. Some principles laid down in these regulatory frameworks do not rely on obligations prior to the data scraping activity. Instead, they impose direct obligations on data scrapers, such as transparency about their scraper bot results (post scraping activity). Since these principles are easier to adhere to, we expect to see guidance on the implications of these post-scraping obligations that will influence data scraping governance.

Authored by Joke Bodewits.
