World’s Most Wired
Arvind Narayanan’s business card is an exercise in brevity. It contains no data except his name and the words “Google me,” a fitting calling card for an academic who specializes in privacy and anonymity research. When you do Google him, his online footprint is robust, but highly selective and pruned. There’s a website for his post-doctoral research at Stanford University, where he’s currently based, an online journal of semi-personal musings (like the time he fell asleep jet-lagged and awoke with complete amnesia about not just who he was but what he was – animal, vegetable, mineral?), a Google Scholar page indicating his work has been cited 849 times, and news articles about high-profile projects he’s worked on. There are also various social networking accounts (Facebook, Google+) that paint a picture of a precise and scientifically calculating, but whimsical, personality – one whose music tastes run the gamut from Queen to Qawwali (Sufi devotional music), and who prefers mind-bending films like Memento and Inception to mind-numbing superhero flicks.
What you won’t find online about Narayanan are party snapshots of him caught in a drunken stupor or inadvisable tweets later deleted on second thought. There’s little about him on the web that he doesn’t specifically want there, and he’s careful to use browser tools to control the digital trail his online activities leave behind. But as a data scientist, Narayanan knows there’s a lot he can’t control — his own work shows that often the steps he and others take to protect themselves online can be easily undone.
Narayanan isn’t much known outside the insular world of data privacy, but he’s likely to be a name that you’ll be seeing more and more, particularly as he’ll be heading to Princeton University next year to join the well-regarded Center for Information Technology Policy, led by computer scientist Ed Felten. In the age of Big Data, where bulk supplies of information about your browsing and other online activities are bought and sold instantaneously in marketplaces each day, and where Target can know your teenage daughter is pregnant before you do, Narayanan is one of the leading hands-on thinkers in exploring how traditional notions of privacy are radically fractured by the collision of big data and cheap analytics. Take, for example, his now-famous Netflix study.
In 2006, Narayanan and a colleague dug into “anonymized” Netflix customer information and showed how little data it took to unmask an anonymized person’s identity. Netflix, as part of a public contest to devise a better movie-recommendation algorithm, released a data set of 100 million movie ratings made by 480,000 of its customers. The online DVD provider anonymized the data before releasing it to contestants, replacing names with random unique identifying numbers to protect the privacy of its customers. But Narayanan and Vitaly Shmatikov were able to unmask some Netflix users simply by taking the anonymized movie ratings – along with timestamps showing when customers submitted them – and comparing them against non-anonymized movie ratings posted at the Internet Movie Database web site. “Even before we looked at the data, we knew right away that this issue was going to exist,” Narayanan says. The research led to a privacy lawsuit against Netflix and a 2010 settlement that scuttled the company’s plans for a second contest that would have involved using even more customer data. Since that study, he and colleagues have produced four other major papers making similar points in different contexts.
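The matching idea described above can be illustrated with a toy sketch. This is not the researchers’ actual algorithm, which the article does not spell out; the function names, the one-star and 14-day tolerances, and the sample records are all hypothetical, chosen only to show how ratings plus timestamps can link an “anonymous” ID to a public profile.

```python
from datetime import date

def match_score(anon_record, public_record, max_days=14):
    """Count movies whose rating and date roughly agree in both records.

    Each record maps a movie title to a (stars, date) pair.
    The tolerances (1 star, max_days) are illustrative assumptions.
    """
    score = 0
    for movie, (stars, when) in public_record.items():
        if movie in anon_record:
            anon_stars, anon_when = anon_record[movie]
            close_rating = abs(anon_stars - stars) <= 1
            close_date = abs((anon_when - when).days) <= max_days
            if close_rating and close_date:
                score += 1
    return score

def best_match(anon_db, public_record, min_score=2):
    """Return the anonymous ID that best fits the public record, or None.

    A match is claimed only if the top candidate reaches min_score
    and is strictly ahead of the runner-up.
    """
    scores = {uid: match_score(rec, public_record) for uid, rec in anon_db.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_id, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if top >= min_score and top > runner_up:
        return top_id
    return None

# Hypothetical "anonymized" release: random IDs instead of names.
anon_db = {
    1001: {"Memento": (5, date(2005, 3, 1)),
           "Adaptation": (4, date(2005, 3, 9)),
           "Fight Club": (5, date(2005, 4, 2))},
    1002: {"Memento": (3, date(2005, 1, 15)),
           "Fight Club": (2, date(2005, 6, 20))},
    1003: {"Adaptation": (4, date(2005, 8, 1))},
}

# Hypothetical public reviews posted under a real name elsewhere.
public_reviews = {"Memento": (5, date(2005, 3, 3)),
                  "Adaptation": (5, date(2005, 3, 10))}

print(best_match(anon_db, public_reviews))
```

Even in this tiny example, two roughly dated ratings are enough to single out one record, and once the record is linked, every other rating in it (here, Fight Club) is exposed as well.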
Earlier this year, he and colleagues at Stanford and the University of California at Berkeley published a study about an algorithm designed to unmask “anonymous” internet authors simply by analyzing their word choice and writing styles and comparing these against online content written by writers who published under named bylines. Prior research had looked at making the same connections among a few hundred people, but Narayanan’s study scaled that out to make matches among some 100,000 authors. Now he’s working on a potentially new landmark study around DNA and de-anonymization. But he’s reluctant to discuss it before it’s peer-reviewed.
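To make the idea of matching authors by writing style concrete, here is a minimal sketch. It is an assumption-laden toy, not the study’s method: the article does not say which features or classifier the researchers used, so this version simply compares how often a short, hypothetical list of common function words appears in each text, using cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "but", "i", "it"]

def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_author(anonymous_text, known_texts):
    """Return the named author whose style vector is most similar."""
    target = style_vector(anonymous_text)
    return max(known_texts,
               key=lambda name: cosine(target, style_vector(known_texts[name])))

# Made-up writing samples published under named bylines.
known_texts = {
    "alice": "I think that it is fine, but I doubt that it will last. "
             "I know that it matters.",
    "bob": "The king of the hill and the queen of the valley went to "
           "the edge of the sea.",
}

anonymous_post = "I hope that it works, but I fear that it will not."

print(closest_author(anonymous_post, known_texts))
```

With two candidates the task is trivial; the hard part the study tackled is keeping such matches meaningful when the pool grows to some 100,000 authors, which would need far richer features than this word list.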
“This project is less about what’s happening here and now but kind of about what the world is going to look like probably ten years from now,” he notes. Pressed for more details, he dances carefully around the question. “[With DNA] it’s just a completely new domain of data with new characteristics,” he says. “The data here is very unique, and the connections between people are unique. And the particular de-anonymization threat that I’m considering . . . is very different from any of my past projects. . . . In terms of named versus anonymous samples, think about just pieces of hair at a train station. Is that named, or anonymous? That’s kind of what I mean by considering a different threat model.” In general, he says by way of elaborating, his work has not been about distinguishing between 1 out of 5 people with very high accuracy but, rather, about distinguishing between 100,000 people, possibly with much less accuracy. “At least it could serve as a first step for an adversary, or some party, to further narrow down the list of possibilities, and then use some other technique to identify the individual,” he says.
For a guy so focused on privacy and anonymity, Narayanan has a strange hobby that at first glance might appear to focus on violating privacy. It involves photographing other people’s license plates. He says he has a collection of about 500 of them and snaps the pics in parking lots only when no one is around or in the car. “There are so many interesting vanity plates, especially in Palo Alto,” he says, mentioning the wealthy town where Stanford University is located. His interest in plates began when he was a shy child and “sort of socially maladjusted.” The hustle and bustle of the world gave him cognitive overload, he says, and looking for patterns in the letters and numbers on license plates helped him focus. His interest in plates continues to this day, though it sometimes makes for awkward conversation with women he’s dating when he has to explain that he knows their plates out of a habit of memorizing them, not because he’s interested in stalking them. He realized how bizarre his habit appeared to others when he was driving with a friend one day and noted the plate on the car in front of them. “Dude!” he told his friend. “That car has every letter and number either equal to the ones on your plate, or one letter off.” His friend turned to him and they looked silently at one another for several seconds as the words were absorbed. “And the expression on his face – there’s only one way to describe it, which was WTF?” Narayanan recalls. Then they both burst out laughing. “In Silicon Valley . . . you might know a person really well, but everyone has this hidden weirdness; I guess he was thinking, ‘So that’s yours,’” Narayanan says. That focus on data and patterns might seem weird to others, but it has served him well in his research.
- We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
Narayanan chooses his projects based on what he feels he can bring to the research, and whether or not it will produce something valuable to the public or to policy makers. “But the more important criteria for me is that it’s technologically novel, and it’s something that people could not have realized before, without actually doing the work,” he says. “In almost every one of the data anonymization projects that I’ve done, there were at least some people who looked at that before we did it and said, ‘Huh, I don’t think that’s possible.’ So that’s really what gets me going.”
The Netflix study began with a simple question: What happens to customer data when companies anonymize it in good faith but then pass it to third parties who might combine it with additional data? Could the mere marriage of datasets undo the anonymity of customers? It’s a problem that isn’t limited to Netflix or even the online world. Many companies involved in the collection of customer data or online behavioral tracking insist that it’s okay to collect and share data about customers as long as the data is anonymous at the time it’s collected or is anonymized afterward. But Narayanan thinks that’s naive at best and disingenuous at worst.
7 Favorite Movies
Adaptation – The deeply self-referential elements in this movie left me in awe.
Memento – No film makes you viscerally appreciate the fact that we are our memories better than this one.
Eternal Sunshine of the Spotless Mind – I think it’s possible that in a few decades, technology will force us to think about the questions this movie raises.
Inception – I take the possibility reasonably seriously that we have cognitive abilities in our dreams that we don’t when awake, so this movie appealed to me at … ahem … more than one level.
The Usual Suspects – In terms of twist endings and unreliable narration, this one is unsurpassed.
Fight Club – Another unreliable-narrator film. What’s not to like? A twist ending, artistic violence, deep themes and phenomenal acting.
The Prestige – This movie shook me. The idea of being devoted to one’s art more than one’s life is fascinating and terrifying at the same time.
World’s Most Wired
Wired is putting a spotlight on the brightest geniuses you’ve never heard of – the entrepreneurs, scientists, artists and designers who are quietly shaping the future behind the scenes. They’re the World’s Most Wired, and we’ll be profiling one of them bi-weekly through the end of the year.