Allyn Robins, Senior Consultant
AI is a powerful tool, and like any powerful tool it can be used for both good and ill. This blog post is the second in a loose series that seeks to deliver an accessible-yet-measured take on the ways AI is being used, with the first part available on Newsroom here. This one was adapted from a seminar I presented for Privacy Week 2023 (recording linked here), which means it’s a little different in tone and structure. In the seminar, I outlined the four big impacts AI is having on privacy.
AI development is driving the capture of more data.
Most modern forms of AI rely on enormous datasets that allow them to be ‘trained’ to do various tasks. In general, the more data you have to train your AI (provided it’s the right sort of data), the better it’ll be at whatever it is you’re training it to do. This means that data is even more valuable today than when market analysts were dubbing it “the new oil”.
My co-presenter at Privacy Week, the distinguished Andrew Chen, focused almost exclusively on this issue, and it’s a rich one. In the future, I hope to write a post - or even a series of posts - delving into its many details. For now, I want to focus on an effect of this hunger for data that I haven’t seen discussed much: because AI is being trained to do a huge variety of tasks, data is being collected about things - and in ways - that most people wouldn’t expect.
To see an example of this, look no further than Roombas - those adorable automatic vacuum cleaners. Most people who own one have no idea, but unless they’ve opted out, their Roomba has assembled a floor plan of their house and sent it back to iRobot, the company behind Roomba, for analysis. The company’s intent may not be nefarious - it plans to use this information to make its products more effective and efficient - but that data is sensitive, and it’s not hard to imagine an authoritarian government forcing the company to hand it over. And even if that doesn’t happen, iRobot has a less-than-stellar record of protecting sensitive information.
And while most of the information collected these days is ‘anonymised’, that means less and less because…
AI is reducing the effectiveness of some privacy protections.
In most cases, ‘anonymisation’ simply means stripping out the obvious identifiers - names, email addresses, account numbers - while leaving the rest of the data intact. This form of anonymisation can very often be reversed: by carefully analysing the data that’s left, and cross-referencing it with other available information, it’s very often possible to reverse-engineer your way back to a unique identity. As an incredibly powerful tool for collecting, analysing, and comparing data, AI makes this process even easier - and it’ll continue to make data anonymisation techniques even less effective as it grows in power.
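To make the mechanics concrete, here’s a minimal sketch of this kind of ‘linkage attack’ in Python. Every record and field name below is invented for illustration; real attacks work the same way, but at scale and against far messier data, often with machine learning doing the matching.

```python
# An 'anonymised' dataset: names removed, but quasi-identifiers
# (postcode, birth year, gender) left intact. All records invented.
anonymised_records = [
    {"postcode": "6011", "birth_year": 1987, "gender": "F", "diagnosis": "asthma"},
    {"postcode": "6021", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
]

# A separate, public dataset that pairs those same quasi-identifiers
# with names - e.g. an electoral roll or a leaked customer list.
public_register = [
    {"name": "A. Example", "postcode": "6011", "birth_year": 1987, "gender": "F"},
    {"name": "B. Example", "postcode": "6023", "birth_year": 1975, "gender": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")

def reidentify(anon_rows, register):
    """Join the two datasets wherever all quasi-identifiers agree."""
    matches = []
    for anon in anon_rows:
        for person in register:
            if all(anon[k] == person[k] for k in QUASI_IDENTIFIERS):
                matches.append({"name": person["name"], **anon})
    return matches

# One 'anonymous' medical record is now linked to a named individual.
print(reidentify(anonymised_records, public_register))
```

No AI is needed for a toy case like this - the point is that the combination of a few innocuous attributes is often unique to one person, and AI makes finding those combinations across huge, messy datasets dramatically easier.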
But it’s not only big datasets we need to worry about - personal privacy provisions are being undermined too. Face blurring, for example: people have blurred or pixelated faces in photographs for decades to protect the privacy of their subjects, but new AI systems are getting better and better at reversing this process. The tech has already come a long way since it “deblurred” Barack Obama into a white guy, and it’s only going to get better.
This is a privacy time bomb that’s incredibly difficult to mitigate - while people can avoid the undermined techniques going forward, there’s a lot of ‘anonymous’ information already out there that will eventually be rendered personally identifiable.
Covering your face is hardly a perfect solution, because…
AI is augmenting our ability to gather information.
AI is creating new ways of collecting information, and enhancing old ones. Facial recognition is a great example of the latter: applying it to a network of cameras turns a system that takes a lot of time and effort to track a single person through into one that can track thousands of individuals in real time. The latter is clearly more of a threat to privacy. Covering your face isn’t fully effective either, because AI enables other ways of identifying and tracking individuals, such as gait recognition.
AI also allows inferences about individuals to be made at a scale and with an efficacy that was previously unthinkable. You can find any number of people who’ll tell you that TikTok seemed to know they were gay before they did, for example. Now, TikTok wasn’t trying to identify queer users - it was just serving people ‘gay content’ because its algorithm had identified it as content they’d probably engage with - but if TikTok wanted to, it probably could.
This also enables methods of data collection that could - in the right light - be spun as ‘enhancing privacy’. An AI model could live on your phone, analysing your chat logs and camera roll, and report back only the inferences it makes. None of your personal information need ever leave the phone - it’ll just report back to central that you might be in the market for a new washing machine, for example. Whether that’s better or worse than current data collection practices is something we’ll all have to decide collectively.
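To illustrate the architecture, here’s a deliberately toy sketch in Python. The keyword matcher below is a stand-in for a real on-device machine-learning model, and every name and message in it is invented for illustration - the point is only the shape of what leaves the device.

```python
# Toy stand-in for an on-device model: flag shopping interests
# mentioned in local data. A real system would use an ML model,
# but the privacy-relevant structure is the same.
APPLIANCE_KEYWORDS = {"washing machine", "dryer", "fridge"}

def infer_interests(chat_messages):
    """Run 'inference' locally over the user's own chat logs."""
    interests = set()
    for message in chat_messages:
        for keyword in APPLIANCE_KEYWORDS:
            if keyword in message.lower():
                interests.add(keyword)
    return interests

def report_to_server(chat_messages):
    # Only the inference leaves the device - never the messages,
    # photos, or any other raw personal data.
    return {"possible_purchases": sorted(infer_interests(chat_messages))}

payload = report_to_server([
    "Our washing machine is making a horrible noise again",
    "Dinner at 7?",
])
print(payload)
```

The server learns you might want a washing machine, but never sees the conversation that revealed it - which is exactly the trade-off described above: less raw data collected, but potentially more intimate conclusions drawn.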
And speaking of things we’re going to have to collectively decide how to deal with…
Synthetic media brings its own set of challenges.
Synthetic media, or generative AI, is very much at the peak of a hype cycle. Daily headlines trumpet its power and ‘disruptive potential’, and self-proclaimed ‘AI gurus’ are eager to expound on how (they think) it’ll change the world and everything in it. And that hype is giving rise to serious privacy concerns right now.
Countless companies are eager to get into generative AI, but to train a model of your own - or to fine-tune an existing one - takes a lot of data. As a result, many are simply imitating the pioneers in the area and ‘scraping’ whatever data they can, often not worrying about trifling concerns like ‘privacy’ and ‘copyright’. The market is moving so fast - and regulators so comparatively slowly - that the prevailing incentives are to simply try to get your model trained and released as quickly as possible. And once a model has been trained and put on the internet, it’s very hard to take down. All it takes is one torrent seed to keep something on the internet indefinitely.
The model-makers would likely tell you that you shouldn’t worry, because while their models may be trained on some sensitive data, training data isn’t stored in the model itself. This is technically true, but misleading - because as researchers are showing, it’s not too difficult to get generative AI models to spit out near-exact reproductions of some of their training data. This is concerning enough for images, but for text especially, private information can be unintentionally shared with anyone who uses the model.
And all the hype about what text-generation models in particular are capable of (you can find no end of people who will tell you that ChatGPT can be your lawyer, your dating coach, or your therapist - please do not treat it as any of these) is leading thousands of people to confide extensively in AI chatbots. But while treating ChatGPT as your therapist might sometimes be better than nothing, it’s not going to be nearly as effective as a real human - and it provides OpenAI, the company behind ChatGPT, with incredibly intimate information about you.
Now these are all issues caused by the novelty of generative AI in the market, and in time they are likely to lessen. But synthetic media also poses privacy challenges that will apply no matter how mature the market becomes: How do we protect privacy when AI tools allow convincing pictures, audio, and even (eventually) video of real people to be produced with minimal effort?
These can be used to enable ‘traditional’ invasions of privacy, things like scams and identity theft. But they can also be used for harassment, nonconsensual AI-generated pornography (one of the original use-cases for generative AI and still an appallingly common one, as well as an issue the New Zealand Parliament has declined to address multiple times), and the simple exploitation of everyday people’s voices or images. No longer do these tools require thousands of images or hundreds of hours of audio - a handful of pictures and five minutes’ audio is now enough. And in the social media age, that means that almost anyone is vulnerable.
In 2019, Brainbox published a report on synthetic media entitled ‘Perception Inception’, which argues that information about a person - whether it’s true or not, whether it’s been created by humans or generated by AI - should be legally considered personal information. That question has not been legally confirmed yet, in New Zealand or anywhere else (that we’re aware of), but we’re going to need to grapple with it before we can even really start dealing with the privacy implications of this technology.
Get in touch
At Brainbox we’re aiming to make sure that citizens, governments, and companies can engage effectively with the challenges posed by this and other emerging technologies. If you’d like to be kept up to date or have some work we can help with, you can get in touch or sign up to our contact list using the form in our website footer.