25 research outputs found

    A Systematic Review of Re-Identification Attacks on Health Data

    Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have significant implications for how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast them with re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these demonstrate weaknesses in current de-identification methods. Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26, 95% CI 0.046-0.478), and 0.34 for attacks on health data (95% CI 0-0.744). There was considerable uncertainty around the proportions, as evidenced by the wide confidence intervals, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of the fourteen attacks were performed on data that had been de-identified using existing standards. Only one of these attacks was on health data, and it had a success rate of 0.00013. The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.

    Building an Anonymization Pipeline


    The five safes of risk-based anonymization

    The sharing of data for the purposes of data analysis and research can have many benefits. At the same time, concerns and controversies about data ownership and data privacy elicit significant debate. So how do we utilize data in a way that protects individual privacy but still ensures that the data are of sufficient granularity that the analytics will be useful and meaningful? Data anonymization (also called de-identification, depending on the jurisdiction) is the process of removing detail in the data or adding other controls to reduce re-identification risk. Good anonymization should mitigate exposure and allow you to easily demonstrate that you have taken your responsibility toward data subjects seriously

    B: A systematic review of re-identification attacks on health data. PLoS One 2011

    Background: Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have significant implications for how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast them with re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these demonstrate weaknesses in current de-identification methods

    Secure surveillance of antimicrobial resistant organism colonization or infection in Ontario long term care homes.

    BACKGROUND: There is stigma attached to the identification of residents carrying antimicrobial resistant organisms (ARO) in long term care homes, yet there is a need to collect data about their prevalence for public health surveillance and intervention purposes. OBJECTIVE: We conducted a point prevalence study to assess ARO rates in long term care homes in Ontario using a secure data collection system. METHODS: All long term care homes in the province were asked to provide colonization or infection counts for methicillin-resistant Staphylococcus aureus (MRSA), vancomycin-resistant enterococci (VRE), and extended-spectrum beta-lactamase (ESBL) as recorded in their electronic medical records, together with the number of current residents. Data were collected online during October-November 2011 using a Paillier cryptosystem, which allows computation on encrypted data. RESULTS: A provably secure data collection system was implemented. Overall, 82% of the homes in the province responded. MRSA was the most frequent ARO identified, at 3 cases per 100 residents, followed by ESBL at 0.83 per 100 residents and VRE at 0.56 per 100 residents. The microbiological findings and their distribution were consistent with available provincial laboratory data reporting test results for AROs in hospitals. CONCLUSIONS: We describe an ARO point prevalence study that demonstrated the feasibility of collecting data from long term care homes securely across the province and providing strong privacy and confidentiality assurances, while obtaining high response rates.
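The abstract above does not describe the study's actual protocol, so the following is only a toy sketch of the property it relies on: under the Paillier cryptosystem, multiplying ciphertexts yields an encryption of the sum of the plaintexts, so per-home counts could in principle be totalled without decrypting any single submission. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`), the tiny hard-coded primes, and the example counts are all assumptions made for illustration. This is not secure and is not the system used in the study.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                # common simple choice of generator
    mu = pow(lam, -1, n)     # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, ciphertexts):
    """Multiply ciphertexts, which adds the underlying plaintexts mod n."""
    n, _ = pub
    total = 1
    for c in ciphertexts:
        total = (total * c) % (n * n)
    return total


# Hypothetical counts from three homes; real keys would use primes of 1024+ bits.
pub, priv = keygen(1009, 1013)
encrypted_counts = [encrypt(pub, m) for m in [3, 0, 5]]
encrypted_total = add_encrypted(pub, encrypted_counts)
print(decrypt(pub, priv, encrypted_total))
```

In a sketch like this, only the holder of the private key learns the aggregate, and individual counts stay encrypted throughout. A production system would also need large keys and a vetted cryptographic library.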