2,659 research outputs found

    Synthetic Establishment Microdata Around the World

    In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business microdata is due to the skewed and sparse distributions that characterize business data. Synthetic data are simulated data generated from statistical models. We organized sessions at the 2015 World Statistical Congress and the 2015 Joint Statistical Meetings, highlighting work on synthetic establishment microdata. This overview situates those papers, published in this issue, within the broader literature.

    Remote processing of firm microdata at the Bank of Italy

    Allowing users to run personalised econometric/statistical analyses on the appropriate data sets by remote processing gives greater flexibility in the production of economic information. Business survey data are subject to binding confidentiality requirements. The Bank of Italy's infrastructure allows its business survey data to be exploited while preserving the anonymity of individual data. The system is based on the LISSY platform, which has already been adopted by the Luxembourg Income Study (LIS) and other research centres. Firms' privacy is safeguarded by forbidding potentially confidentiality-breaking programme statements and by denying the visualisation of individual data. Data confidentiality is protected by removing key identifiers from the database and by trimming data in the right tail of the distribution. The platform provides its services through plain-text e-mails. The authorised user sends an e-mail containing an identifying header followed by a statistical programme to a predetermined address. The system checks the validity of the header, strips out the code and submits it as a batch job to one of the available econometric/statistical packages (SAS and Stata). The outputs are mailed back to the user after passing an array of automatic and manual checks.
    Keywords: microdata, confidentiality, remote access

    Avoiding disclosure of individually identifiable health information: a literature review

    Achieving data and information dissemination without harming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists regarding how best to achieve the appropriate balance between data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods with emphasis on health care data. The authors also discuss the benefits and limitations of the most common access methods. Although there is abundant theoretical and empirical research, their review reveals a lack of consensus on fundamental questions for empirical practice: how to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss.
    Keywords: public use files, disclosure avoidance, reidentification, de-identification, data utility

    Proceedings from the Synthetic LBD International Seminar

    On May 9, 2017, we hosted a seminar to discuss the conditions necessary to implement the SynLBD approach with interested parties, with the goal of providing a straightforward toolkit to implement the same procedure on other data. The proceedings summarize the discussions during the workshop.

    Administrative Transaction Data

    The value of administrative transaction data, such as financial transactions, credit card purchases, telephone calls, and retail store scanning data, to study social behaviour has long been recognised. Now new types of transactions data made possible by advances in cyber-technology have the potential to further expand social scientists’ research frontier. This chapter discusses the potential for such data to be included in the scientific infrastructure. It discusses new approaches to data dissemination, as well as the privacy and confidentiality issues raised by such data collection. It also discusses the characteristics of an optimal infrastructure to support the scientific analysis of transactions data.
    Keywords: transactions data; administrative data; cybertechnology; privacy and confidentiality; virtual organizations

    Data protection and statistics – a dynamic and tension-filled relationship

    New statistical methods have been developed for the longer-term storage of microdata. These methods must comply, however, with the fundamental right to informational self-determination and the legal regulations imposed by the Federal Constitutional Court. Thus it is crucial to develop effective and coherent methods for protecting personal data collected for statistical purposes. Recent decisions by the Federal Constitutional Court are likely to result in the outlawing of comprehensive, permanent statistical compilations comprised of microdata from a wide range of sources and updated regularly. However, aside from such comprehensive methods, there are certainly other ways of using microdata that cannot be dismissed from the outset as violating constitutional legal norms. Internet access to statistical microdata is likely to take on increased importance for scientific research in the near future. Yet this would radically change the entire landscape of data protection: the vast amount of additional information now available on the Internet makes it almost impossible to judge whether individuals can be rendered identifiable. In view of this almost unlimited information, individual data can only be offered over the Internet if the absolute anonymity of the data can be guaranteed.
    Keywords: right to informational self-determination, census ruling of December 15, 1983, longer-term storage of microdata, primary statistics, secondary statistics, statistical confidentiality, absolute anonymisation, de facto anonymisation, additional information, pseudonymisation, personal data profiles

    Studying Innovation in Businesses: New Research Possibilities

    The rapid pace of globalization and technological change has created demand for more and better analysis to answer key policy questions about the role of businesses in innovation. This demand was codified into law in the America COMPETES Act. However, existing business datasets are not adequate to create an empirically based foundation for policy decisions. This paper argues that the existing IRS data infrastructure could be used in a number of ways to respond to the national imperative. It describes the legal framework within which such a response could take place, and outlines the organizational features that would be required to establish an IRS/researcher partnership. It concludes with a discussion of the role for the research policy community.
    Keywords: business microdata, innovation, confidentiality, researcher access, tax policy

    Access to and Documentation of Publicly Financed Survey Data

    The topic of this paper is access to and documentation of survey data financed through public funds. We distinguish between four types of publicly financed survey data: (1) academic survey data from the national or international research infrastructures; (2) data from DFG projects or similarly funded projects; (3) survey data collected in research projects funded by the Federal State and the Länder (Ressortforschung); (4) population and household surveys from national and international statistical agencies. For each of these types of data we describe the current situation and present recommendations for future developments.
    Keywords: survey data, data access, data documentation, data archive

    Balancing Access to Data And Privacy. A review of the issues and approaches for the future

    Access to sensitive microdata should be provided using remote access data enclaves. These enclaves should be built to facilitate the productive, high-quality usage of microdata. In other words, they should support a collaborative environment that facilitates the development and exchange of knowledge about data among data producers and consumers. The experience of the physical and life sciences has shown that it is possible to develop a research community and a knowledge infrastructure around both research questions and the different types of data necessary to answer policy questions. In sum, establishing a virtual organization approach would provide the research community with the ability to move away from individual, or artisan, science towards the more generally accepted community-based approach. Enclaves should include a number of features: metadata documentation capacity so that knowledge about data can be shared; capacity to add data so that the data infrastructure can be augmented; communication capacity, such as wikis, blogs and discussion groups, so that knowledge about the data can be deepened; and incentives for information sharing so that a community of practice can be built. The opportunity to transform microdata-based research through such an organizational infrastructure could potentially be as far-reaching as the changes that have taken place in the biological and astronomical sciences. It is, however, an open research question how such an organization should be established: whether the approach should be centralized or decentralized. Similarly, it is an open research question as to the appropriate metrics of success, and the best incentives to put in place to achieve success.
    Keywords: Methodology for Collecting, Estimating, Organizing Microeconomic Data