The U.S. Census Bureau Adopts Differential Privacy
The U.S. Census Bureau announced, via its Scientific Advisory Committee, that it would protect the publications of the 2018 End-to-End Census Test (E2E) using differential privacy. The E2E test is a dress rehearsal for the 2020 Census, the constitutionally mandated enumeration of the population used to reapportion the House of Representatives and redraw every legislative district in the country. Systems that perform successfully in the E2E test are then used in the production of the 2020 Census. Motivation: The Census Bureau conducted internal research that confirmed that the statistical disclosure limitation systems used for the 2000 and 2010 Censuses had serious vulnerabilities that were exposed by the Dinur and Nissim (2003) database reconstruction theorem. We designed a differentially private publication system that directly addressed these vulnerabilities while preserving the fitness for use of the core statistical products. Problem statement: Designing and engineering production differential privacy systems requires two primary components: (1) inventing and constructing algorithms that deliver maximum accuracy for a given privacy-loss budget and (2) ensuring that the privacy-loss budget can be directly controlled by the policy-makers who must choose an appropriate point on the accuracy-privacy-loss tradeoff. The first problem lies in the domain of computer science. The second lies in the domain of economics. Approach: The algorithms under development for the 2020 Census focus on the data used to draw legislative districts and to enforce the 1965 Voting Rights Act (VRA). These algorithms efficiently distribute the noise injected by differential privacy. The Data Stewardship Executive Policy Committee selects the privacy-loss parameter after reviewing accuracy-privacy-loss graphs
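The accuracy-privacy-loss tradeoff mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook Laplace mechanism applied to a single count, which is not the Census Bureau's production algorithm; the function names, the sensitivity of 1, and the epsilon grid are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    Assumption: one individual changes the count by at most `sensitivity`.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def expected_abs_error(epsilon: float, sensitivity: float = 1.0) -> float:
    """Mean absolute error of the Laplace mechanism, which equals its scale."""
    return sensitivity / epsilon


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed so the illustration is repeatable
    # Each row is one point on a (hypothetical) accuracy-privacy-loss curve.
    for eps in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(eps, expected_abs_error(eps), noisy_count(100, eps, rng))
```

Under these assumptions a larger privacy-loss parameter gives a smaller expected error, which is the kind of curve a policy body would weigh when choosing a point on the tradeoff.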
The NBER Immigration, Trade, and Labor Markets Data Files
The NBER Immigration, Trade, and Labor Markets Data Files were developed from public data sources to facilitate industry-based and area-based research on the effects of international trade and immigration on labor markets in the United States. The industry data files contain shipments, a shipments deflator, value added, employment, payroll, hours, real capital stock, imports, exports, unionization, and immigrant ratios for 450 four-digit (1972 Standard Industrial Classification) manufacturing industries. The primary source of the industry production and factor use data is the Annual Survey of Manufactures. The primary source of the international trade data is the defunct BLS Trade Monitoring System (1972 to 1981), which was extended to earlier and later years using U.S. Commodity Exports and Imports as Related to Output, U.S. Department of Commerce Official Statistics, and the Annual Survey of Manufactures. The primary source of the unionization data is the Current Population Survey (1973 to 1984), which cannot be extended to earlier years. The primary source of the immigrant ratio data is the Census of Population (1960, 1970, and 1980). The area data files contain information on immigrants in the work force by state and major SMSA from the Census of Population 1970 and 1980. The data are available from the author on floppy disk (Stata or ASCII format), computer tape (SAS format) or by electronic mail.
Presentation: Did the Housing Price Bubble Clobber Local Labor Market Job and Worker Flows When It Burst?
We integrate local labor market data on worker flows, job flows, employment levels, and earnings with MSA-level data on housing prices and local area unemployment, to study the local labor market dynamics associated with the U.S. housing price bubble of the late 2000s. We proceed to study the magnitude and timing of the relation between the changes in local housing prices and local worker and job flows, and local labor market earnings. In addition to the unique contribution of using both local labor and housing market data, the paper also considers the contributions of the aggregate movements in the worker and job flows to the heterogeneous local labor market outcomes
Testimony of John M. Abowd Before the House Committee on Energy and Commerce, Subcommittee on Commerce, Manufacturing and Trade, United States House of Representatives
We focus attention on gross flows in the labor market and their role in economic reallocation. Economists distinguish between movements of individuals (gross worker flows) and those associated with businesses (gross job flows). The gross worker flows are accessions (hiring and recalls) and separations (quits, layoffs, retirements, and firings). The gross job flows are creations (increases in the employment of a given business establishment) and destructions (decreases in employment of a given business establishment). In our testimony, we discuss the different flows and the regional variation therein over the last recession
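The definitions of gross job flows in this abstract can be sketched in a few lines. The establishment identifiers and employment figures below are hypothetical, and treating an establishment that is absent from one period as having zero employment there is an assumption of this sketch, not something stated in the testimony.

```python
from typing import Dict, Tuple


def gross_job_flows(start: Dict[str, int], end: Dict[str, int]) -> Tuple[int, int]:
    """Return (creations, destructions) summed over establishments.

    Creations add up employment increases at growing establishments;
    destructions add up employment decreases at shrinking ones.
    Assumption: an establishment missing from a period has zero employment.
    """
    creations = 0
    destructions = 0
    for establishment in set(start) | set(end):
        change = end.get(establishment, 0) - start.get(establishment, 0)
        if change > 0:
            creations += change
        elif change < 0:
            destructions += -change
    return creations, destructions


# Hypothetical employment counts at three establishments in two periods.
start_employment = {"A": 10, "B": 20, "C": 5}
end_employment = {"A": 14, "B": 15, "C": 5}

creations, destructions = gross_job_flows(start_employment, end_employment)
net_change = creations - destructions
```

In this toy example establishment A creates 4 jobs and B destroys 5, so net employment falls by 1 even though 9 jobs were reallocated in gross terms.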
Revisiting the Economics of Privacy: Population Statistics and Confidentiality Protection as Public Goods
This paper has been replaced with http://digitalcommons.ilr.cornell.edu/ldi/37.
We consider the problem of the public release of statistical information about a population, explicitly accounting for the public-good properties of both data accuracy and privacy loss. We first consider the implications of adding the public-good component to recently published models of private data publication under differential privacy guarantees using a Vickrey-Clarke-Groves mechanism and a Lindahl mechanism. We show that data quality will be inefficiently under-supplied. Next, we develop a standard social planner's problem using the technology set implied by (ε, δ)-differential privacy with (α, β)-accuracy for the Private Multiplicative Weights query release mechanism to study the properties of optimal provision of data accuracy and privacy loss when both are public goods. Using the production possibilities frontier implied by this technology, explicitly parameterized interdependent preferences, and the social welfare function, we display properties of the solution to the social planner's problem. Our results directly quantify the optimal choice of data accuracy and privacy loss as functions of the technology and preference parameters. Some of these properties can be quantified using population statistics on marginal preferences and correlations between income, data accuracy preferences, and privacy loss preferences that are available from survey data. Our results show that government data custodians should publish more accurate statistics with weaker privacy guarantees than would occur with purely private data publishing. Our statistical results using the General Social Survey and the Cornell National Social Survey indicate that the welfare losses from under-providing data accuracy while over-providing privacy protection can be substantial
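For readers unfamiliar with the two guarantees named in this abstract, the standard textbook definitions are sketched here. The notation (mechanism M, neighboring databases D and D', output set S, query class Q, released answers a_q) is an assumption of this sketch and may differ from the paper's own.

```latex
% (\varepsilon, \delta)-differential privacy: for all neighboring
% databases D, D' and all measurable output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta .
\]

% (\alpha, \beta)-accuracy for a query class Q: the released answers a_q
% are all within \alpha of the truth with probability at least 1 - \beta,
\[
  \Pr\Bigl[\, \max_{q \in Q} \bigl| q(D) - a_q \bigr| \le \alpha \Bigr] \;\ge\; 1 - \beta .
\]
```

Smaller ε and δ mean less privacy loss, while smaller α and β mean more accurate releases; the two pairs of parameters together describe the technology set that the abstract refers to.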
An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices
Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy
Economic Analysis and Statistical Disclosure Limitation
This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignorable. When it is not ignorable, we consider the effects of statistical disclosure limitation for a variety of research designs common in applied economic research. Because statistical agencies do not always report the methods they use to protect confidentiality, we also characterize settings in which statistical disclosure limitation methods are discoverable; that is, they can be learned from the released data. We conclude with advice for researchers, journal editors, and statistical agencies
A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data
Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau's Quarterly Workforce Indicators