Algorithms promise to hunt specific targets while sparing digital bystanders
Computer scientists at the University of Pennsylvania have developed an algorithmic framework for conducting targeted surveillance of individuals within social networks while protecting the privacy of “untargeted” digital bystanders. As they explain in this week’s Proceedings of the National Academy of Sciences (PNAS), the tools could facilitate counterterrorism efforts and infectious disease tracking while being “provably privacy-preserving”.
“The tension between the useful or essential gathering and analysis of data about citizens and the privacy rights of those citizens is at an historical peak,” the researchers begin. “Perhaps the most striking and controversial recent example is the revelation that US intelligence agencies systemically engage in ‘bulk collection’ of civilian ‘metadata’ detailing telephonic and other types of communication and activities, with the alleged purpose of monitoring and thwarting terrorist activity.”
Other conflicts mentioned by the Penn group include issues around medical data and targeted advertising. In every case, the friction is between individual privacy and some larger purpose, whether it’s corporate profits, public health, or domestic security.
“There is a protected subpopulation that enjoys (either by law, policy, or choice) certain privacy guarantees,” the researchers write. “For instance, in the examples above, these protected individuals might be nonterrorists, or uninfected citizens (and perhaps informants and health care professionals). They are to be contrasted with the ‘unprotected’ or targeted subpopulation, which does not share those privacy assurances.”
“At the highest level,” the group writes, “one can think of our algorithms as outputting a list of confirmed targeted individuals discovered in the network, for whom any subsequent action (e.g., publication in a most-wanted list, further surveillance, or arrest in the case of terrorism; medical treatment or quarantine in the case of epidemics) will not compromise the privacy of the protected.”
The algorithms rest on a few basic ideas. The first is that every member of a network (a graph) comes with a sequence of bits indicating membership in targeted groups. If, say, the number-two bit were set in your personal privacy register, you might be part of the “terrorist” target population. An algorithm searching the network for targets doesn’t simply get to reveal every member’s bits. It has a budget of sorts: it can reveal only so many bits and no more. The algorithms optimise this scenario so that as many bits of interest as possible are revealed.
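To make the budget idea concrete, here is a minimal Python sketch, not the authors’ code, of a budgeted search over a networkx-style graph. The `is_target` callback, standing in for the paper’s costly “investigation” operation, and the budget mechanics are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of a budgeted
# target search. The algorithm may reveal the private target bit of
# at most `budget` individuals, so it must spend reveals carefully.
def budgeted_search(graph, seed, is_target, budget):
    """Starting from a confirmed target `seed`, reveal the target bits
    of nearby nodes until the budget runs out. `graph` is any
    networkx-style graph; `is_target` stands in for the costly
    'investigation' operation that infallibly reveals one bit."""
    confirmed = {seed}                 # targets found so far
    revealed = {seed}                  # everyone we've spent a reveal on
    frontier = set(graph.neighbors(seed))
    while budget > 0 and frontier:
        node = frontier.pop()          # naive pick; the SOP below does better
        revealed.add(node)
        budget -= 1                    # each reveal costs one unit of budget
        if is_target(node):
            confirmed.add(node)
            frontier |= set(graph.neighbors(node))
        frontier -= revealed           # never pay twice for the same bit
    return confirmed
```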
The algorithms handle this optimisation via a notion known as a statistic of proximity (SOP), a quantification of how close a given graph node is to a targeted group of nodes. This statistic is what guides the search.
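Plugging in an SOP changes which node receives the next reveal. In the hedged sketch below, the statistic is simply the count of a node’s already-confirmed target neighbours, and the search greedily spends each reveal on the highest-ranked frontier node; both choices are assumptions for illustration, as the paper analyses statistics of proximity more generally.

```python
# Sketch of an SOP-guided search. The particular statistic here
# (confirmed-target neighbour count) is an illustrative assumption.
def proximity(graph, node, confirmed):
    """Toy SOP: how many confirmed targets border this node?"""
    return sum(1 for nbr in graph.neighbors(node) if nbr in confirmed)

def sop_guided_search(graph, seed, is_target, budget):
    """Like budgeted_search above, but each reveal goes to the frontier
    node the SOP currently ranks as closest to the target group."""
    confirmed, revealed = {seed}, {seed}
    frontier = set(graph.neighbors(seed))
    while budget > 0 and frontier:
        node = max(frontier, key=lambda n: proximity(graph, n, confirmed))
        frontier.discard(node)
        revealed.add(node)
        budget -= 1
        if is_target(node):
            confirmed.add(node)
            frontier |= set(graph.neighbors(node)) - revealed
    return confirmed
```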
Using real social networks seeded with stochastically (randomly) generated, artificial target groups, the Penn team found that they could indeed search a network for targeted members without revealing information about individuals in the privacy-protected population.
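An illustrative toy experiment in that spirit, not a reproduction of the paper’s evaluation: plant a random connected target group in a synthetic stand-in network, then measure how many targets the `sop_guided_search` sketch above recovers under a fixed budget. The network generator, group size, and budget are all arbitrary assumptions.

```python
# Toy experiment: plant a random connected target group in a synthetic
# network, then count how many targets the SOP-guided sketch recovers.
import random
import networkx as nx

def plant_targets(graph, size, rng):
    """Grow a random connected target set by stochastic expansion."""
    targets = {rng.choice(list(graph.nodes))}
    while len(targets) < size:
        boundary = {n for t in targets for n in graph.neighbors(t)} - targets
        if not boundary:
            break
        targets.add(rng.choice(sorted(boundary)))
    return targets

rng = random.Random(0)
g = nx.barabasi_albert_graph(1000, 3)       # stand-in for a real network
targets = plant_targets(g, 25, rng)         # artificial target group
seed = next(iter(targets))                  # one known target to start from
found = sop_guided_search(g, seed, lambda n: n in targets, budget=100)
print(f"recovered {len(found)} of {len(targets)} targets")
```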
“Our work is of course not a complete solution to the practical problem, which can differ from our simple model in many ways,” the group concludes. “Here we highlight just one interesting modeling question for future work: Is it possible to give rigorous privacy guarantees to members of the protected population when membership in the targeted population is defined as a function of the individuals’ private data? In our model, we avoid this question by endowing the algorithm with a costly ‘investigation’ operation, which we assume can infallibly determine an individual’s targeted status—but it would be interesting to extend our style of analysis to situations in which this kind of investigation is not available.”