8,057 research outputs found

    Multiple imputation for sharing precise geographies in public use data

    Full text link
    When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS506 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Mathematical techniques for the protection of patient's privacy in medical databases

    Get PDF
    In modern society, keeping the balance between privacy and public access to information is becoming a widespread problem more and more often. Valid data is crucial for many kinds of research, but the public good should not be achieved at the expense of individuals. While creating a central database of patients, the CSIOZ wishes to provide statistical information for selected institutions. However, there are some plans to extend the access by providing the statistics to researchers or even to citizens. This might pose a significant risk of disclosure of some private, sensitive information about individuals. This report proposes some methods to prevent data leaks. One category of suggestions is based on the idea of modifying statistics, so that they would maintain importance for statisticians and at the same time guarantee the protection of patient's privacy. Another group of proposed mechanisms, though sometimes difficult to implement, enables one to obtain precise statistics, while restricting such queries which might reveal sensitive information
    • …
    corecore