    Distribution-Preserving Statistical Disclosure Limitation

    One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database. Keywords: statistical disclosure limitation; confidentiality; privacy; multiple imputation; partially synthetic data

    USING LINKED EMPLOYER-EMPLOYEE DATA TO UNDERSTAND LABOR MARKETS AND IMPROVE DATA PRODUCTS

    This thesis comprises three chapters. The first chapter (joint with John Haltiwanger, Julia Lane, and Kevin McKinney) explores a new way of capturing dynamics: following clusters of workers as they move across administrative entities. Information on firm dynamics is critical to understanding economic activity, yet fundamentally difficult to measure. The worker flow approach is shown to improve linkages across firms in longitudinal business databases. The approach also provides conceptual insights into the changing structure of businesses and employer-employee relationships. Many worker-cluster flows involve changes in industry, particularly movements into and out of personnel supply firms. Another finding, that a nontrivial fraction of firm entry is associated with such flows, suggests that one path for firm entry is a group of workers at an existing firm starting a new firm. The second chapter uses linked employer-employee data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program, matched to data on business acquisitions from the Federal Trade Commission, to examine labor market outcomes of employees at firms undergoing mergers. Earnings and employment can be observed over time for workers at both the acquired firm and the acquiring firm. The findings suggest that while wages tend to be about the same or higher for workers at these restructuring firms, turnover is significantly higher, and the costs of job loss are large and long-lasting. The third chapter (joint with John Abowd and Martha Stinson) provides technical documentation for a project undertaken by the U.S. Census Bureau, the Social Security Administration (SSA), and the Internal Revenue Service (IRS) to explore a potential method of providing the public with a valuable new dataset without compromising confidentiality. The underlying database was created by merging the respondents from the Census Bureau's own SIPP with administrative data on earnings and benefits from the IRS and SSA. The administrative variables, combined with the detailed survey responses from the SIPP, offer the potential for interesting research, especially in the areas of retirement, benefits, and lifetime earnings; however, they also add extensive new information that malicious data users could use to reidentify SIPP respondents. This final chapter develops a new technique for providing a micro-dataset that looks, in structure, just like the underlying confidential data. This "partially synthetic" database aims to preserve as many of the complex covariate relationships in the confidential data as possible without posing any significant new disclosure risk.

    Partially-synthetic linked employer-employee data

    The workshop was held at the Census Bureau’s Headquarters in Suitland, MD. A workshop that brought together university-based researchers, members of the international official statistical community, and other interested user communities was held on July 31, 2009, the Friday before the 2009 Joint Statistical Meetings in Washington, DC, at the U.S. Census Bureau's Headquarters building in Suitland, MD. The purpose of the workshop was to discuss and critique newly created public-use micro-data files that are based on the concepts of “synthetic data” and “partially synthetic data.” Funding for the conference and its preparation was provided by National Science Foundation (NSF) Grant SES-0922494, the U.S. Census Bureau's Center for Economic Studies, the Internal Revenue Service (IRS), and the Edmund Ezra Day Professorship at Cornell University.

    Summary of Methods and Preliminary Assessment of the SIPP Synthetic Beta

    The workshop was held at the Census Bureau’s Headquarters in Suitland, MD. A workshop that brought together university-based researchers, members of the international official statistical community, and other interested user communities was held on July 31, 2009, the Friday before the 2009 Joint Statistical Meetings in Washington, DC, at the U.S. Census Bureau's Headquarters building in Suitland, MD. The purpose of the workshop was to discuss and critique newly created public-use micro-data files that are based on the concepts of “synthetic data” and “partially synthetic data.” Funding for the conference and its preparation was provided by National Science Foundation (NSF) Grant SES-0922494, the U.S. Census Bureau's Center for Economic Studies, the Internal Revenue Service (IRS), and the Edmund Ezra Day Professorship at Cornell University.

    Ethical considerations regarding animal experimentation

    Animal experimentation is widely used around the world to identify the root causes of various diseases in humans and animals and to explore treatment options. Among the several animal species, rats, mice and purpose-bred birds comprise almost 90% of the animals that are used for research purposes. However, growing awareness of the sentience of animals and their experience of pain and suffering has led to strong opposition to animal research among many scientists and the general public. In addition, the usefulness of extrapolating animal data to humans has been questioned. This has led to Ethical Committees’ adoption of the ‘four Rs’ principles (Reduction, Refinement, Replacement and Responsibility) as a guide when making decisions regarding animal experimentation. Some of the essential considerations for humane animal experimentation are presented in this review along with the requirement for investigator training. Due to the ethical issues surrounding the use of animals in experimentation, their use is declining in those research areas where alternative in vitro or in silico methods are available. However, so far it has not been possible to dispense with experimental animals completely, and further research is needed to provide a road map to robust alternatives before their use can be fully discontinued.

    Methodology for clinical research

    Clinical research requires a systematic approach with diligent planning, execution and sampling in order to obtain reliable and validated results, and an understanding of each research methodology is essential for researchers. Indeed, selecting an inappropriate study type, an error that cannot be corrected after a study has begun, results in flawed methodology. The results of clinical research studies enhance the body of knowledge regarding the pathogenicity of a disease, an existing or newly discovered medication, a surgical or diagnostic procedure, or a medical device. Medical research can be divided into primary and secondary research, where primary research involves conducting studies and collecting raw data, which is then analysed and evaluated in secondary research. The successful deployment of clinical research methodology depends upon several factors. These include the type of study, the objectives, the population, the study design, the methodology and techniques, and the sampling and statistical procedures used. Clinical studies can be classified as descriptive or analytical, and further categorized as observational or experimental. Finally, pre-clinical studies are also of utmost importance, as they are the stepping stone to clinical trials. It is therefore important to understand the types of method available for clinical research. This review focuses on various aspects of the methodology and describes the crucial steps of the conceptual and executive stages.

    Targeting a Newly Established Spontaneous Feline Fibrosarcoma Cell Line by Gene Transfer

    Fibrosarcoma is a deadly disease in cats and is significantly more often located at classical vaccine injection sites. Rarer forms of spontaneous non-vaccination site (NVS) fibrosarcomas have been described and have been found to be associated with genetic alterations. The purpose of this study was to compare the efficacy of adenoviral gene transfer in NVS fibrosarcoma. We isolated and characterized an NVS fibrosarcoma cell line (Cocca-6A) from a spontaneous fibrosarcoma that occurred in a domestic calico cat. The feline cells were karyotyped and their chromosome number was counted using Giemsa staining. Adenoviral gene transfer was verified by western blot analysis. Flow cytometry assay and Annexin-V were used to study cell-cycle changes and cell death of transduced cells. Cocca-6A fibrosarcoma cells were morphologically and cytogenetically characterized. Giemsa block staining of metaphase spreads of the Cocca-6A cells showed deletion of one of the E1 chromosomes, where feline p53 maps. Semi-quantitative PCR demonstrated reduction of p53 genomic DNA in the Cocca-6A cells. Adenoviral gene transfer had a remarkable effect on the viability and growth of the Cocca-6A cells following single transduction with adenoviruses carrying Mda-7/IL-24 or IFN-γ, or various combinations of RB/p105, Ras-DN, IFN-γ, and Mda-7 gene transfer. Therapy for feline fibrosarcomas is often insufficient for long-lasting tumor eradication. More gene transfer studies should be conducted in order to understand whether these viral vectors could be applicable regardless of the origin (spontaneous vs. vaccine-induced) of feline fibrosarcomas.