New Tools Developed for Protecting Privacy in Large Datasets


Guest_Jim_*

Big data is revolutionizing many areas of science, but before researchers can collect and use this information, they must first determine if and how they are going to protect the privacy of the individuals in the dataset. Previously, researchers would simply remove names, but it was then shown to be possible to re-identify people by comparing information within an anonymized dataset against public information. Now researchers supported by the NSF have developed a new tool that should be able to protect a person's privacy without compromising the data.

This tool utilizes differential privacy, a technique first described in the mid-2000s. Under this approach, identities are protected by adding noise to the answers to any queries of a dataset. When researchers request information from the dataset, the answer is only approximately accurate: accurate enough for the study, but not informative enough to identify anyone. There is enough randomization present that one cannot distinguish between the real world and one in which a given individual's data is absent from the dataset. If applied too naively, multiple queries could eventually reveal someone's identity, but by intelligently increasing the noise and correlating it across queries, this can be avoided.
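The noisy-query idea described above can be sketched with the Laplace mechanism, a standard way of achieving differential privacy for numeric queries. This is a minimal illustration only; the function names and counting-query setup are assumptions for the example, not details of the actual NSF-supported tool:

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. exponential variables with mean
    # `scale` is Laplace-distributed with that scale parameter.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(data, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one
    person's record changes the true count by at most 1, so noise
    drawn from Laplace(0, 1/epsilon) suffices.
    """
    true_count = sum(1 for record in data if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: roughly how many records have a value under 50?
data = list(range(100))
noisy_answer = private_count(data, lambda x: x < 50, epsilon=1.0)
```

Each individual answer is off by a small random amount, but averaged over a study the statistics remain useful, which is the trade-off the article describes. A smaller `epsilon` means stronger privacy and noisier answers.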

Not every dataset can have this differential privacy system applied, because there are cases where exact, individual-level information is required, such as searching for a matching organ donor. For studies that do not require such specific information, though, this approach could enable a number of currently private datasets to be publicly accessed.

Source: National Science Foundation
