    Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics

    We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau’s Quarterly Workforce Indicators can provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.

    Distribution-Preserving Statistical Disclosure Limitation

    One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.
    Keywords: statistical disclosure limitation; confidentiality; privacy; multiple imputation; partially synthetic data

    Confidentiality considerations for use of social-spatial data on the social determinants of health: Sexual and reproductive health case study

    Understanding whether and how the places where people live, work, and play are associated with health behaviors and health is essential to understanding the social determinants of health. However, social-spatial data which link a person and their attributes to a geographic location (e.g., home address) create potential confidentiality risks. Despite the growing body of literature describing approaches to protect individual confidentiality when utilizing social-spatial data, peer-reviewed manuscripts displaying identifiable individual point data or quasi-identifiers (attributes associated with the individual or disease that narrow identification) in maps persist, suggesting that knowledge has not been effectively translated into public health research practices. Using sexual and reproductive health as a case study, we explore the extent to which maps appearing in recent peer-reviewed publications risk participant confidentiality. Our scoping review of sexual and reproductive health literature published and indexed in PubMed between January 1, 2013 and September 1, 2015 identified 45 manuscripts displaying participant data in maps as points or small-population geographic units, spanning 26 journals and representing studies conducted in 20 countries. Notably, 56% (13/23) of publications presenting point data on maps either did not describe approaches used to mask data or masked data inadequately. Furthermore, 18% (4/22) of publications displaying data using small-population geographic units included at least two quasi-identifiers. These findings highlight the need for heightened education for researchers, reviewers, and editorial teams. We aim to provide readers with a primer on key confidentiality considerations when utilizing linked social-spatial data for visualizing results. 
    Given the widespread availability of place-based data and the ease of creating maps, it is critically important to raise awareness of when social-spatial data constitute protected health information, of best practices for masking geographic identifiers, and of methods for balancing disclosure risk and scientific utility. We conclude with recommendations to support the preservation of confidentiality when disseminating results.

    Exploring Confidentiality Issues in Hyperledger Fabric Business Applications

    The rise of Bitcoin and other cryptocurrencies over the last decade has brought their underlying technology (blockchain) into the spotlight. A blockchain is a secure ledger of linked records called blocks. These records are cryptographically immutable, and any tampering with a block is evident through a change in the block's cryptographic signature. Among the blockchains deployed in practice today, Hyperledger Fabric is a platform that allows businesses to make use of blockchains in their applications. However, confidentiality issues arise with respect to the blocks in this blockchain network because blocks might contain sensitive information accessible to all peers with a copy of the blockchain. In this work, we aim to address the confidentiality issue present in the current Hyperledger Fabric. Our current approach consists of leveraging cryptographic techniques to ensure the confidentiality of the shared data in the blockchain, along with crafted access control policies so that only authorized peers can access the otherwise concealed data. This becomes a crucial requirement, especially for business models that require their transaction information to be concealed. Recent results show that the use of encryption along with interesting access control policies allows obfuscation of data for desired outside entities, although more work is required.

    Sustaining Engineering Education Research: Sharing Qualitative Research Data For Secondary Analysis

    The need for secondary data analysis practices emerges from multiple sources. Qualitative researchers often have rich data sets that far exceed the time available for data analysis, and many of us wish that someone could spend more time with the data. We also recognize that local data sets would benefit from further analysis that links our data with related data collected in different contexts. Many also grapple with increasing data sharing requirements from funding agencies that raise concerns about participant confidentiality and data integrity. This workshop provides a chance to explore potential responses to these concerns through a robust dialogue around secondary data analysis practices and pitfalls.

    Public-use linked mortality file

    Updated March 2020. The National Center for Health Statistics (NCHS) has linked data collected from several NCHS population surveys with death certificate records from the National Death Index (NDI). Due to requirements to protect the confidentiality of the NCHS survey participants, restricted-use versions of the linked mortality files are made available only through the NCHS Research Data Center (RDC). To complement the restricted-use files and increase data access, NCHS developed public-use versions of the linked mortality files for the 1986-2014 National Health Interview Survey (NHIS), 1999-2014 National Health and Nutrition Examination Survey (NHANES) and NHANES III. The public-use linked mortality files include a limited set of variables for adult participants only. The public-use versions of the NCHS linked mortality files were subjected to data perturbation techniques to reduce the participant disclosure risk. Synthetic data were substituted for follow-up time and underlying cause of death for select records. Information regarding vital status was not perturbed. The public-use linked mortality file provides mortality follow-up data from the date of survey participation through December 31, 2015.

    Alcohol, assault and licensed premises in inner-city areas

    This report contains eight linked feasibility studies conducted in Cairns during 2010. These exploratory studies examine the complex challenges of compiling and sharing information about incidents of person-to-person violence in a late night entertainment precinct (LNEP). The challenges were methodological as well as logistical and ethical. The studies look at how information can be usefully shared, while preserving the confidentiality of those involved. They also examine how information can be compiled from routinely collected sources with little or no additional resources, and then shared by the agencies that are providing and using the information. Although the studies are linked, they are also stand-alone and so can be published in peer-reviewed literature. Some have already been published, or are ‘in press’ or have been submitted for review. Others require the NDLERF board’s permission to be published as they include data related more directly to policing, or they include information provided by police. The studies are incorporated into the document under section headings. In each section, they are introduced and then presented in their final draft form. The final published form of each paper, however, is likely to be different from the draft because of journal and reviewer requirements. The content, results and implications of each study are discussed in summaries included in each section. Funded by the National Drug Law Enforcement Research Fund, an initiative of the National Drug Strategy.
    Authors: Alan R Clough (PhD), School of Public Health, Tropical Medicine and Rehabilitation Sciences, James Cook University; Charmaine S Hayes-Jonkers (BPsy, BSocSci (Hon1)), James Cook University, Cairns; Edward S Pointing (BPsych), James Cook University, Cairns

    A Taxonomy of Privacy-Preserving Record Linkage Techniques

    The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because personal identifying data, such as names, addresses and dates of birth of individuals, are commonly used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Various such techniques, known as 'privacy-preserving record linkage' (PPRL), have been developed. We present a taxonomy that characterizes PPRL techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research.