Measuring Re-identification Risk in Synthetic and Anonymized Data
A presentation on novel and highly accurate methods for measuring re-identification risk
The ability to measure the risk of re-identification is fundamental to creating and managing non-identifiable data. The processing of such data does not need to meet many of the obligations in contemporary privacy statutes.
Many of the re-identification risk measures in use today overestimate the true risk, either because they make simplifying assumptions or because the statistical approaches they use are not accurate. This conservatism has resulted in two undesirable patterns: (a) non-identifiable datasets that are severely distorted and have limited practical utility, and (b) shortcuts taken to avoid the first consequence (e.g., ignoring many quasi-identifiers), which result in datasets that still have a high risk of re-identification. To escape these patterns, some organizations do not use quantitative approaches at all and rely instead on qualitative or subjective assessments of re-identification risk. This is a third undesirable pattern, because such approaches are becoming harder to justify.
This underscores the importance of good risk measurement. In this presentation we will discuss a new re-identification risk estimator, which underlies all of our privacy models. We will demonstrate that the accuracy of this estimator is quite high on real health datasets, and better than that of other estimators that are currently available and in use. This new class of risk estimators can help move organizations away from the high-exposure patterns described above.
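The overestimation and shortcut patterns described above can be illustrated with a toy calculation. The sketch below is not the estimator discussed in this presentation. It assumes a deliberately naive measure (the average of 1 / equivalence-class size over the released records), and the records, the field names (age, sex, zip3), and the function name average_risk are all hypothetical. Counting class sizes in the released sample instead of in the population it was drawn from is used here as one example of a simplifying assumption.

```python
from collections import Counter


def average_risk(records, reference, quasi_identifiers):
    """Average of 1 / class size over `records`.

    Class sizes are counted in `reference`, grouping on the given
    quasi-identifiers. This is a naive illustrative measure only.
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    sizes = Counter(key(r) for r in reference)
    return sum(1 / sizes[key(r)] for r in records) / len(records)


# Hypothetical population of 8 people (toy data, not real).
population = [
    {"age": "30-39", "sex": "F", "zip3": "123"},
    {"age": "30-39", "sex": "F", "zip3": "123"},
    {"age": "30-39", "sex": "F", "zip3": "123"},
    {"age": "30-39", "sex": "F", "zip3": "123"},
    {"age": "40-49", "sex": "M", "zip3": "123"},
    {"age": "40-49", "sex": "M", "zip3": "123"},
    {"age": "40-49", "sex": "M", "zip3": "456"},
    {"age": "50-59", "sex": "F", "zip3": "456"},
]

# The released dataset is a sample of 4 of those people.
sample = [population[i] for i in (0, 4, 6, 7)]

all_qis = ["age", "sex", "zip3"]

# Naive: treat the sample as if it were the whole population.
naive = average_risk(sample, sample, all_qis)

# Reference: count class sizes in the population the sample came from.
reference = average_risk(sample, population, all_qis)

# Shortcut: keep the naive approach but ignore most quasi-identifiers.
shortcut = average_risk(sample, sample, ["sex"])

print(f"naive sample-based risk:      {naive:.4f}")
print(f"population-based risk:        {reference:.4f}")
print(f"sample-based risk, 'sex' only: {shortcut:.4f}")
```

In this toy data every sampled record is unique within the sample, so the naive figure is the maximum possible, while the population-based figure is lower because two of those records share their quasi-identifier values with other people in the population. Dropping quasi-identifiers pushes the measured figure below the population-based one, even though the ignored attributes remain in the data.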