
The company burst into public awareness in early 2020, after a New York Times exposé described it as "the secretive company that might end privacy as we know it". Clearview's product was simple: facial recognition technology, marketed to law enforcement in particular, that could take a picture of a suspect and return a solid guess as to their name.

As technology, it wasn't actually hugely novel. Similar tech was, of course, already built in to Facebook, and Russian company FindFace had offered a similar service domestically since 2016. The impressive part of Clearview's work was, instead, in building up the database required to make the system functional. Any facial recognition system like that needs at its heart a massive collection of people's faces, linked with their names. And Clearview gathered that in the most brazen manner possible: it just took it all from social media. As the New York Times' Kashmir Hill wrote: "The system – whose backbone is a database of more than three billion images that Clearview claims to have scraped from Facebook, YouTube, Venmo and millions of other websites – goes far beyond anything ever constructed by the United States government or Silicon Valley giants."

It would have been impossible to get that information with Facebook's permission, even pre-Cambridge Analytica, and so Clearview just didn't bother. Instead, it took the gamble that even if it did get caught, it was unlikely to be forced to delete data it had already collected.

In the furore following the NYT's publication, Facebook, YouTube, Twitter and LinkedIn all demanded the company stop collecting images from their sites. But their ability to do so is limited: scraping data from public sites is legal under US law, and American data protection regulations are slim, and mostly bound up in contract law – by which Clearview is unencumbered, because it didn't make any agreements with the people whose data it processed. A few state regulators have taken action against Clearview, and Illinois, which has the strongest biometric privacy law in the country, secured a ban this month on the company working with the private sector in the state (it had already voluntarily ceased such deals in 2020, Clearview says). But, other than that, the company continues to operate more or less unchallenged in the US.

That's not the case, thankfully, in the UK – as the ICO's fine demonstrates. You have rights over the use of your data that remain relevant even if it gets scraped, passed around, reformed and deanonymised. Unfortunately, that might not help that much. Clearview's response has been vituperative, and a spokesperson for its lawyers suggested that the company simply will not comply: "While we appreciate the ICO's desire to reduce their monetary penalty on Clearview AI, we nevertheless stand by our position that the decision to impose any fine is incorrect as a matter of law. Clearview AI is not subject to the ICO's jurisdiction, and Clearview AI does no business in the UK at this time."

The ICO's position is that any company handling the data of UK citizens is bound by UK law; Clearview disagrees. The company isn't the only one to raise such questions. Take the Chinese dataset released as WebFace260M, built for training facial recognition AI using faces and names scraped from IMDb and Google Images. The dataset is governed – or, not – by Chinese law, but the faces in it certainly aren't exclusively Chinese citizens. This precedent matters far beyond the narrow question of facial recognition. Large public datasets scraped from the internet are the fuel powering the latest burst of progress in the AI sector.
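To make the mechanism concrete: a system like Clearview's reduces, at its core, to a nearest-neighbour search over a database of face embeddings, each linked to a name. The following is a toy sketch of that lookup only, not Clearview's actual pipeline; the random vectors here are stand-ins for the output of a real face-embedding model, and all names and values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "database": one embedding per known face, linked to a name.
# A real system would produce these vectors with a trained face-embedding
# network; random unit vectors stand in for them here.
names = ["alice", "bob", "carol"]
db = rng.normal(size=(3, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit-normalise each row

def identify(query: np.ndarray) -> str:
    """Return the name whose stored embedding is most similar to the query.

    Similarity is cosine similarity; since the database rows are unit
    vectors, a single matrix-vector product gives all the scores.
    """
    q = query / np.linalg.norm(query)
    scores = db @ q
    return names[int(np.argmax(scores))]

# A "photo of a suspect" becomes a query embedding; here, a noisy copy of a
# stored face's vector simulates a new photo of the same person.
probe = db[1] + 0.05 * rng.normal(size=128)
print(identify(probe))  # matches "bob"
```

The hard part, as the article notes, was never this search step: it was assembling billions of (face, name) pairs to populate the database in the first place.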
