Faced with evidence that the Dutch police have been using anonymised trip data from TomTom users to assist in enforcing speeding laws, TomTom CEO Harold Goddijn last week published an official comment on YouTube.
In the video, Goddijn said:
We learned today...that the police in the Netherlands are using [our] information to identify road stretches where people in general, and on average, are driving too fast. They use [our data] to put up speed cameras and speed traps. And we don't like that, because our customers don't like it. We will prevent that type of usage of our data in the future.
TomTom seems to be recognising potential privacy-eroding issues with which other companies haven’t concerned themselves in the past. (Not all viewers of the YouTube video agree with me – there are currently 34 dislikes but only 26 likes.)
Even so-called anonymous data, collected in good faith, may end up being anything but.
Possibly the most infamous, and outrageous, anonymity gaffe in recent history was perpetrated by AOL nearly five years ago. The company published some 20 million search terms – supposedly for web research purposes – with usernames replaced with arbitrary numbers.
The problem was that each username was replaced with the same number every time it appeared. The result ought to have been foreseen.
As you accumulate more and more search terms tied to specific individuals, you can make ever-more accurate deductions about their identities from the search terms alone.
After all, over months of searching, you probably give away multiple hints about your identity. You might narrow down where you live by repeatedly searching for businesses in your neighbourhood. You might search for cohorts from your school or college. You might check garbage collection dates in your street. You might even do a vanity search for your own name or property, which, in the AOL data, would have been the privacy-erosion equivalent of “Bingo!”
Indeed, the New York Times famously traced Thelma Arnold, and her dog Dudley, right to her home in Georgia by reversing the AOL search data to remove her anonymity altogether.
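To see why consistent replacement numbers are a problem, here is a minimal Python sketch. It is not AOL's actual process: the function names, the starting number 1000 and the sample log entries are all invented for illustration. It simply shows that if every username always maps to the same number, anyone holding the released data can regroup the searches into one profile per person.

```python
from collections import defaultdict
from itertools import count


def pseudonymise(log):
    """Replace each username with an arbitrary but consistent number."""
    ids = {}
    counter = count(1000)  # arbitrary starting point, chosen for this sketch
    released = []
    for user, query in log:
        if user not in ids:
            ids[user] = next(counter)
        released.append((ids[user], query))
    return released


def profiles(released):
    """Regroup released queries by number, as any reader of the data could."""
    grouped = defaultdict(list)
    for pseudonym, query in released:
        grouped[pseudonym].append(query)
    return dict(grouped)


# Hypothetical log entries, invented for illustration only.
sample_log = [
    ("alice", "plumbers near elm street"),
    ("bob", "cheap flights"),
    ("alice", "garbage collection dates elm street"),
    ("alice", "alice example"),
]

released = pseudonymise(sample_log)
by_number = profiles(released)
```

In this sketch, the usernames are gone from `released`, yet `by_number[1000]` still holds all three of the first user's searches together, including the vanity search. The number has become just another name.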
Google, too, is no stranger to controversy over its definition of anonymised. Google is proud of the fact that it “anonymises” IP addresses in its search logs after nine months, even though this involves simply blanking out the bottom eight bits of your IP address.
This just about sneaks into the definition of anonymise given in my New Oxford American Dictionary, namely: to “remove identifying particulars from test results for statistical or other purposes”. But it might not meet your definition. You probably assume that an anonymised log entry can’t be connected with you at all.
Keeping the actual details of every search term – even ones which actually include your name, or your address, or some sort of personally identifiable information – isn’t really anonymous. Tying these searches together with an IP identifier which narrows you down to 1 in 256 people (at the very best – many /24 networks are only sparsely populated, after all), and which probably identifies your ISP, your suburb and your phone exchange, is even worse.
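The arithmetic behind that 1 in 256 figure can be sketched in a few lines of Python. This is only an illustration, assuming IPv4 addresses and using Python's standard `ipaddress` module. It is not Google's code, the function name `blank_low_bits` is made up for the example, and the address comes from a range reserved for documentation.

```python
import ipaddress


def blank_low_bits(ip, bits=8):
    """Zero the bottom `bits` bits of an IPv4 address (illustrative only)."""
    address = int(ipaddress.IPv4Address(ip))
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(address & mask))


# 203.0.113.0/24 is reserved for documentation, so this is nobody's real IP.
masked = blank_low_bits("203.0.113.77")

# Every address that collapses to the same masked value lives in one /24.
network = ipaddress.ip_network(masked + "/24")
candidates = network.num_addresses
```

Here `masked` is `203.0.113.0` and `candidates` is 256. The top 24 bits, which are the part most likely to reveal your ISP and rough location, survive untouched.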
So, be careful out there. Anonymised data may not be as anonymous as you thought. And anonymised data which you share with a vendor – such as your average speed across the Sydney Harbour Bridge, where you’re supposed to keep below 70km/h – might end up getting used for purposes you wouldn’t consider “anonymous”.
Unless you are absolutely certain what will be shared, and how, and for what purpose, I recommend that you turn such sharing features off. And if a product or service requires data sharing to work at all, don’t buy into it in the first place.
At the very least, before enabling any “share data with vendor” option, ask yourself, and the vendor, what’s in it for you – in other words, work out the best result you can ever expect from the sharing. Contrast that value with what’s in it for the vendor, or for the intelligence services and law enforcement authorities in that vendor’s jurisdiction.
Make sure there is an obvious positive balance in your favour.
If there isn’t, then the vendor simply isn’t paying you enough for your data. It really is a commercial transaction!