Identification through Anonymous Social Networking Data

Anonymity is “not sufficient for privacy when dealing with social networks”: that is the conclusion of a study that de-anonymised large amounts of sanitised data from Twitter and Flickr.

The main lesson of this paper is that anonymity is not sufficient for privacy when dealing with social networks. […] Our experiments underestimate the extent of the privacy risks of anonymized social networks. The overlap between Twitter and Flickr membership at the time of our data collection was relatively small. […] As social networks grow larger and include a greater fraction of the population along with their relationships, the overlap increases. Therefore, we expect that our algorithm can achieve an even greater re-identification rate on larger networks.

The study has had some good coverage. This from BBC News:

The pair found that one third of those who are on both Flickr and Twitter can be identified from the completely anonymous Twitter graph. This is despite the fact that the overlap of members between the two services is thought to be about 15%.

This from Ars Technica:

It’s not just about Twitter, either. Twitter was a proof of concept, but the idea extends to any sort of social network: phone call records, healthcare records, academic sociological datasets, etc.

via Schneier