16 research outputs found

    A Game Theoretic Framework for Analyzing Re-Identification Risk

    Given the potential wealth of insights in personal data the big databases can provide, many organizations aim to share data while protecting privacy by sharing de-identified data, but are concerned because various demonstrations show such data can be re-identified. Yet these investigations focus on how attacks can be perpetrated, not the likelihood they will be realized. This paper introduces a game theoretic framework that enables a publisher to balance re-identification risk with the value of sharing data, leveraging a natural assumption that a recipient only attempts re-identification if its potential gains outweigh the costs. We apply the framework to a real case study, where the value of the data to the publisher is the actual grant funding dollar amounts from a national sponsor and the re-identification gain of the recipient is the fine paid to a regulator for violation of federal privacy rules. There are three notable findings: 1) it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk; 2) the zero-risk solution enables sharing much more data than a commonly invoked de-identification policy of the U.S. Health Insurance Portability and Accountability Act (HIPAA); and 3) a sensitivity analysis demonstrates these findings are robust to order-of-magnitude changes in player losses and gains. In combination, these findings provide support that such a framework can enable pragmatic policy decisions about de-identified data sharing.
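The publisher-versus-recipient reasoning in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Policy` fields, the payoff formulas, and all numbers below are hypothetical, chosen only to show the idea that the publisher moves first and the recipient attacks only when the expected gain exceeds the attack cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical de-identification policy (all fields are assumptions)."""
    name: str
    data_value: float  # publisher's benefit from sharing under this policy
    reid_prob: float   # chance that a re-identification attempt succeeds


def recipient_attacks(policy, gain, attack_cost):
    # The recipient attempts re-identification only if the expected
    # gain strictly exceeds the cost of mounting the attack.
    return policy.reid_prob * gain > attack_cost


def publisher_payoff(policy, gain, attack_cost, publisher_loss):
    # The publisher anticipates the recipient's best response and
    # subtracts the expected loss only when an attack would occur.
    if recipient_attacks(policy, gain, attack_cost):
        return policy.data_value - policy.reid_prob * publisher_loss
    return policy.data_value


def best_policy(policies, gain, attack_cost, publisher_loss):
    # Exhaustive search over the publisher's options (leader moves first).
    return max(
        policies,
        key=lambda p: publisher_payoff(p, gain, attack_cost, publisher_loss),
    )


def best_zero_risk_policy(policies, gain, attack_cost):
    # Restrict to policies under which the recipient never attacks.
    safe = [p for p in policies if not recipient_attacks(p, gain, attack_cost)]
    return max(safe, key=lambda p: p.data_value) if safe else None


# Hypothetical policies and parameters, for illustration only.
POLICIES = [
    Policy("raw", data_value=100.0, reid_prob=0.6),
    Policy("light", data_value=95.0, reid_prob=0.12),
    Policy("moderate", data_value=90.0, reid_prob=0.05),
    Policy("strict", data_value=40.0, reid_prob=0.001),
]
GAIN, ATTACK_COST, PUBLISHER_LOSS = 1000.0, 100.0, 20.0

if __name__ == "__main__":
    print(best_policy(POLICIES, GAIN, ATTACK_COST, PUBLISHER_LOSS).name)
    print(best_zero_risk_policy(POLICIES, GAIN, ATTACK_COST).name)
```

With these made-up numbers, the unconstrained optimum tolerates a small amount of risk, while the best zero-risk policy shares only slightly less value, which mirrors the shape of the first finding without reproducing any of the paper's results.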

    A comparison of four de-identification policies for the case study on performance measures.

    SH: Safe Harbor. GI: Generalization Intensity.

    Scatter-plot of Payoff Differences.

    Detailed distributions of the publisher’s payoff differences (left) and the adversary’s payoff differences (right) between games and HIPAA Safe Harbor (SH).

    Data used to fit the LME model after triage

    This dataset contains the data used in the paper's statistical analysis. It was obtained by applying the triage described in the paper to the dataset titled "Publication and dbGaP datasets mapping before triage".

    Histogram of Payoff Differences.

    Distributions of the publisher’s payoff differences (left) and the adversary’s payoff differences (right) between games and HIPAA Safe Harbor (SH).

    A performance comparison of the de-identification game solving approaches.

    BIS: Backward Induction Search. LBS: Lattice-Based Search. Payoff difference means the absolute difference in payoff for one record between a heuristic-driven approach and the baseline BIS approach.