
    The U.S. Census Bureau Adopts Differential Privacy

    The U.S. Census Bureau announced, via its Scientific Advisory Committee, that it would protect the publications of the 2018 End-to-End Census Test (E2E) using differential privacy. The E2E test is a dress rehearsal for the 2020 Census, the constitutionally mandated enumeration of the population used to reapportion the House of Representatives and redraw every legislative district in the country. Systems that perform successfully in the E2E test are then used in the production of the 2020 Census. Motivation: The Census Bureau conducted internal research that confirmed that the statistical disclosure limitation systems used for the 2000 and 2010 Censuses had serious vulnerabilities that were exposed by the Dinur and Nissim (2003) database reconstruction theorem. We designed a differentially private publication system that directly addressed these vulnerabilities while preserving the fitness for use of the core statistical products. Problem statement: Designing and engineering production differential privacy systems requires two primary components: (1) inventing and constructing algorithms that deliver maximum accuracy for a given privacy-loss budget and (2) ensuring that the privacy-loss budget can be directly controlled by the policy-makers who must choose an appropriate point on the accuracy-privacy-loss tradeoff. The first problem lies in the domain of computer science. The second lies in the domain of economics. Approach: The algorithms under development for the 2020 Census focus on the data used to draw legislative districts and to enforce the 1965 Voting Rights Act (VRA). These algorithms efficiently distribute the noise injected by differential privacy. The Data Stewardship Executive Policy Committee selects the privacy-loss parameter after reviewing accuracy-privacy-loss graphs.
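
    A minimal sketch of the kind of noise injection differential privacy entails (a textbook Laplace mechanism in Python, not the Bureau's production algorithm; the function name and parameters are ours):

        import numpy as np

        rng = np.random.default_rng()

        def dp_count(true_count: int, epsilon: float) -> float:
            # Answer a counting query (sensitivity 1) with epsilon-differential
            # privacy by adding Laplace noise of scale 1/epsilon; a smaller
            # epsilon (privacy-loss budget) means stronger privacy and a
            # noisier answer.
            return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

        # A total privacy-loss budget spent over k queries can be split,
        # e.g., epsilon/k per query under basic composition.
        print(dp_count(1234, epsilon=0.5))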

    Science, Confidentiality, and the Public Interest

    We describe the benefits of providing data to public agencies and how those agencies navigate the narrow path between disclosing too much information on the one hand and releasing useful information on the other.

    The NBER Immigration, Trade, and Labor Markets Data Files

    The NBER Immigration, Trade, and Labor Markets Data Files were developed from public data sources to facilitate industry-based and area-based research on the effects of international trade and immigration on labor markets in the United States. The industry data files contain shipments, a shipments deflator, value added, employment, payroll, hours, real capital stock, imports, exports, unionization, and immigrant ratios for 450 four-digit (1972 Standard Industrial Classification) manufacturing industries. The primary source of the industry production and factor use data is the Annual Survey of Manufactures. The primary source of the international trade data is the defunct BLS Trade Monitoring System (1972 to 1981), which was extended to earlier and later years using U.S. Commodity Exports and Imports as Related to Output, U.S. Department of Commerce Official Statistics, and the Annual Survey of Manufactures. The primary source of the unionization data is the Current Population Survey (1973 to 1984), which cannot be extended to earlier years. The primary source of the immigrant ratio data is the Census of Population (1960, 1970, and 1980). The area data files contain information on immigrants in the work force by state and major SMSA from the Census of Population (1970 and 1980). The data are available from the author on floppy disk (Stata or ASCII format), computer tape (SAS format), or by electronic mail.

    Testimony of John M. Abowd Before the House Committee on Energy and Commerce, Subcommittee on Commerce, Manufacturing and Trade, United States House of Representatives

    We focus attention on gross flows in the labor market and their role in economic reallocation. Economists distinguish between movements of individuals (gross worker flows) and those associated with businesses (gross job flows). The gross worker flows are accessions (hiring and recalls) and separations (quits, layoffs, retirements, and firings). The gross job flows are creations (increases in the employment of a given business establishment) and destructions (decreases in the employment of a given business establishment). In our testimony, we discuss the different flows and the regional variation therein over the last recession.
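
    As a minimal illustration of the job-flow definitions (hypothetical figures, not from the testimony), creations and destructions can be computed from establishment-level employment changes:

        # Hypothetical establishment employment at the start and end of a quarter.
        employment = {
            "A": (100, 120),  # grew by 20
            "B": (50, 35),    # shrank by 15
            "C": (0, 10),     # entrant: creates 10 jobs
        }

        # Gross job creations: total employment gains at growing establishments.
        creations = sum(max(end - start, 0) for start, end in employment.values())
        # Gross job destructions: total losses at shrinking establishments.
        destructions = sum(max(start - end, 0) for start, end in employment.values())

        print(creations, destructions, creations - destructions)  # 30 15 15

    Gross worker flows (accessions and separations) are at least as large as the corresponding job flows, because hiring and separations occur even at establishments with no net employment change.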

    Presentation: Did the Housing Price Bubble Clobber Local Labor Market Job and Worker Flows When It Burst?

    We integrate local labor market data on worker flows, job flows, employment levels, and earnings with MSA-level data on housing prices and local area unemployment to study the local labor market dynamics associated with the U.S. housing price bubble of the late 2000s. We proceed to study the magnitude and timing of the relation between the changes in local housing prices and local worker and job flows, and local labor market earnings. In addition to the unique contribution of using both local labor and housing market data, the paper also considers the contributions of the aggregate movements in the worker and job flows to the heterogeneous local labor market outcomes.

    Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods

    This paper has been replaced with http://digitalcommons.ilr.cornell.edu/ldi/37. We consider the problem of the public release of statistical information about a population, explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner’s problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner’s problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial.
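
    A stylized illustration of such a production possibilities frontier (ours, using the simplest ε-differentially private mechanism rather than Private Multiplicative Weights): a counting query of sensitivity 1 answered with Laplace noise satisfies

        \[
        \tilde{q}(D) = q(D) + \mathrm{Lap}(1/\varepsilon),
        \qquad
        \Pr\bigl[\,|\tilde{q}(D) - q(D)| \ge \alpha\,\bigr] = e^{-\alpha\varepsilon},
        \]

    so achieving error at most α with failure probability β requires ε ≥ ln(1/β)/α: reducing privacy loss necessarily reduces accuracy, which is exactly the tradeoff over which the planner’s problem optimizes.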

    A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data

    Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or numbers of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau’s Quarterly Workforce Indicators.
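
    A minimal sketch of the posterior-predictive synthetic-data idea in Python (the zero-inflated Poisson and the stand-in posterior draws are our illustrative assumptions, not the paper's BGLMM):

        import numpy as np

        rng = np.random.default_rng(0)

        # Stand-ins for posterior draws for one small cell: pi is the
        # structural-zero probability, lam the Poisson rate. A privacy-
        # preserving prior that shrinks variance components toward zero
        # would pull these draws toward global values and away from the
        # cell's own data.
        pi_draws = rng.beta(2.0, 8.0, size=1000)
        lam_draws = rng.gamma(4.0, 0.5, size=1000)

        # Synthetic values: one posterior-predictive draw per posterior sample.
        structural_zero = rng.random(1000) < pi_draws
        synthetic = np.where(structural_zero, 0, rng.poisson(lam_draws))

        print(synthetic[:10], synthetic.mean())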

    An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

    Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.
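
    A hedged sketch of the allocation rule described (notation ours, not necessarily the paper's): with I(ε) the accuracy delivered by an efficient differentially private algorithm at privacy loss ε, B(·) the social benefit of accuracy, and C(·) the social cost of privacy loss, the planner solves

        \[
        \max_{\varepsilon}\; B\bigl(I(\varepsilon)\bigr) - C(\varepsilon)
        \quad\Longrightarrow\quad
        B'\bigl(I(\varepsilon^{*})\bigr)\, I'(\varepsilon^{*}) = C'(\varepsilon^{*}),
        \]

    so at the optimum the marginal benefit of the accuracy bought by one more unit of privacy loss equals its marginal cost.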

    Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics

    We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau’s Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.
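
    A minimal sketch of the multiplicative noise-infusion idea in Python (the distribution and parameters are illustrative assumptions, not the QWI production settings): each entity receives a permanent distortion factor bounded away from 1, so no statistic uses unmodified data, while cells aggregating many entities remain accurate.

        import numpy as np

        rng = np.random.default_rng(42)

        # Every entity is distorted by at least a% and at most b% (illustrative).
        a, b = 0.05, 0.15

        def noise_factor() -> float:
            # Permanent multiplicative factor in [1-b, 1-a] or [1+a, 1+b],
            # so every input deviates from truth by a minimum percentage.
            m = rng.uniform(a, b)
            return 1.0 + m if rng.random() < 0.5 else 1.0 - m

        # Factors are drawn once per entity and reused, so releases based on
        # the same inputs are identical.
        employment = np.array([120.0, 45.0, 300.0, 12.0, 80.0])
        factors = np.array([noise_factor() for _ in employment])

        protected_total = float(np.sum(employment * factors))
        print(float(employment.sum()), protected_total)

    As more entities contribute to a cell, the upward and downward distortions average out and the protected statistic approaches the unprotected value.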