16 research outputs found
A Game Theoretic Framework for Analyzing Re-Identification Risk
<div><p>Given the potential wealth of insights that large databases of personal data can provide, many organizations aim to share data while protecting privacy by sharing de-identified data, but they are concerned because various demonstrations show that such data can be re-identified. Yet these investigations focus on how attacks can be perpetrated, not on the likelihood that they will be realized. This paper introduces a game theoretic framework that enables a publisher to balance re-identification risk with the value of sharing data, leveraging the natural assumption that a recipient attempts re-identification only if its potential gains outweigh the costs. We apply the framework to a real case study, where the value of the data to the publisher is the actual grant funding dollar amounts from a national sponsor and the re-identification gain of the recipient is the fine paid to a regulator for violation of federal privacy rules. There are three notable findings: 1) it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk; 2) the zero-risk solution enables sharing much more data than a commonly invoked de-identification policy of the U.S. Health Insurance Portability and Accountability Act (HIPAA); and 3) a sensitivity analysis demonstrates that these findings are robust to order-of-magnitude changes in player losses and gains. In combination, these findings support the view that such a framework can enable pragmatic policy decisions about de-identified data sharing.</p></div>
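The core assumption in the abstract, that a recipient attacks only when the expected gain exceeds the cost, can be illustrated with a minimal sketch. This is not the paper's implementation: the `Strategy` class, the function names, and all numbers below are hypothetical and chosen only to show how a publisher might pick a sharing strategy by anticipating the recipient's best response for a single record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """One hypothetical way of sharing a record (all values are illustrative)."""
    name: str
    benefit: float    # value to the publisher of sharing at this level of detail
    reid_prob: float  # assumed probability that a re-identification attempt succeeds


def adversary_attacks(s: Strategy, gain: float, cost: float) -> bool:
    # The recipient attacks only if the expected gain outweighs the attack cost.
    return s.reid_prob * gain - cost > 0


def publisher_payoff(s: Strategy, gain: float, cost: float, loss: float) -> float:
    # The publisher loses an expected penalty only when the recipient attacks.
    if adversary_attacks(s, gain, cost):
        return s.benefit - s.reid_prob * loss
    return s.benefit


def best_strategy(strategies, gain, cost, loss):
    # The publisher moves first and anticipates the recipient's best response.
    return max(strategies, key=lambda s: publisher_payoff(s, gain, cost, loss))


def best_zero_risk_strategy(strategies, gain, cost):
    # Restrict to strategies under which attacking never pays off.
    safe = [s for s in strategies if not adversary_attacks(s, gain, cost)]
    return max(safe, key=lambda s: s.benefit) if safe else None


# Hypothetical strategies and payoff parameters, not taken from the case study.
strategies = [
    Strategy("raw", benefit=100.0, reid_prob=0.5),
    Strategy("coarse", benefit=90.0, reid_prob=0.05),
    Strategy("suppressed", benefit=0.0, reid_prob=0.0),
]
GAIN, COST, LOSS = 150.0, 10.0, 150.0

chosen = best_strategy(strategies, GAIN, COST, LOSS)
zero_risk = best_zero_risk_strategy(strategies, GAIN, COST)
```

With these made-up numbers the recipient would attack the `raw` release but not the `coarse` one, so the publisher's best choice and the best zero-risk choice coincide.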
A comparison of four de-identification policies for the case study on performance measures.
<p>SH: Safe Harbor. GI: Generalization Intensity.</p>
Payoff across strategies.
<p>Payoffs for the record ⟨48, Asian, Female, 38363⟩ across all strategies.</p>
Scatter-plot of Payoff Differences.
<p>Detailed distributions of the publisher’s payoff differences (left) and the adversary’s payoff differences (right) between games and HIPAA Safe Harbor (SH).</p>
DGH for Age.
<p>The Domain Generalization Hierarchy (DGH) for the attribute Age in the case study.</p>
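As a rough illustration of how a hierarchy over Age can be applied, the sketch below maps an exact age to progressively coarser ranges. The interval widths and the function name `generalize_age` are assumptions made for this example; the actual levels used in the case study are defined in the paper's figure and are not reproduced here.

```python
# Hypothetical interval widths per hierarchy level (level 0 = exact age).
# These are illustrative only and are not the levels from the case study.
AGE_WIDTHS = [1, 5, 10, 20]


def generalize_age(age: int, level: int) -> str:
    """Return the generalized value of `age` at the given hierarchy level."""
    if level < 0:
        raise ValueError("level must be non-negative")
    if level >= len(AGE_WIDTHS):
        return "*"  # top of the hierarchy: the value is fully suppressed
    width = AGE_WIDTHS[level]
    if width == 1:
        return str(age)
    low = (age // width) * width  # lower bound of the enclosing interval
    return f"[{low}-{low + width - 1}]"


# Walk the age from the example record up the hypothetical hierarchy.
levels_for_48 = [generalize_age(48, lvl) for lvl in range(5)]
```

Each step up the hierarchy trades detail for protection, which is the dimension along which the publisher's strategies differ.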
DGH for Race.
<p>The Domain Generalization Hierarchy (DGH) for the attribute Race in the case study.</p>
Data used to fit the LME model after triage
This dataset contains the data used in the paper's statistical analysis. It was obtained by applying the triage described in the paper to the dataset titled "Publication and dbGaP datasets mapping before triage".
Recent notable HIPAA breach violation cases as reported by the U.S. Department of Health and Human Services.
<p>Recent notable HIPAA breach violation cases as reported by the U.S. Department of Health and Human Services.</p>
Histogram of Payoff Differences.
<p>Distributions of the publisher’s payoff differences (left) and the adversary’s payoff differences (right) between games and HIPAA Safe Harbor (SH).</p>
A performance comparison of the de-identification game solving approaches.
<p>BIS: Backward Induction Search. LBS: Lattice-Based Search. Payoff difference is the absolute difference in payoff for one record between a heuristic-driven approach and the baseline BIS approach.</p>