New Tools Developed for Protecting Privacy in Large Datasets


Guest_Jim_*

Big data is revolutionizing many areas of science, but before researchers can collect and use this information, they must first determine if and how they are going to protect the privacy of the individuals in the dataset. Previously, researchers would simply remove names, but it was then shown to be possible to re-identify people by comparing information within an anonymized dataset against public information. Now researchers supported by the NSF have developed a new tool that should be able to protect a person's privacy without compromising the data.

This tool utilizes differential privacy, a technique first described in the mid-2000s. Under this approach, identities are protected by adding noise to the answers to any queries of a dataset. When researchers request information from the dataset, the answer is only approximately accurate: accurate enough for the study, but not informative enough to identify anyone. There is enough randomization present that one cannot distinguish between the real world and one in which a given individual's data is absent from the dataset. If applied too naively, multiple queries could eventually reveal someone's identity, but by intelligently increasing the noise and correlating it across queries, this can be avoided.
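The noisy-query idea described above can be sketched with the Laplace mechanism, a standard way of achieving differential privacy for numeric queries. This is a minimal illustration only; the function names and counting-query setup are assumptions for the example, not details of the actual NSF-supported tool:

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. exponential variables with mean
    # `scale` is Laplace-distributed with that scale parameter.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(data, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one
    person's record changes the true count by at most 1, so noise
    drawn from Laplace(0, 1/epsilon) suffices.
    """
    true_count = sum(1 for record in data if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: roughly how many records have a value under 50?
data = list(range(100))
noisy_answer = private_count(data, lambda x: x < 50, epsilon=1.0)
```

Each individual answer is off by a small random amount, but averaged over a study the statistics remain useful, which is the trade-off the article describes. A smaller `epsilon` means stronger privacy and noisier answers.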

Not every dataset can have this differential privacy system applied, because there are cases where exact, individual-level information is required, such as searching for a matching organ donor. For studies that do not require such specific information, though, this approach could enable a number of currently private datasets to be publicly accessed.

Source: National Science Foundation
