7 research outputs found
Analyzing the Differentially Private Theil-Sen Estimator for Simple Linear Regression
In this paper, we study differentially private point and confidence interval
estimators for simple linear regression. Motivated by recent work that
highlights the strong empirical performance of an algorithm based on robust
statistics, DPTheilSen, we provide a rigorous, finite-sample analysis of its
privacy and accuracy properties, offer guidance on setting hyperparameters, and
show how to produce differentially private confidence intervals to accompany
its point estimates.
Comment: Extended abstract presented at the 2021 workshop on Theory and Practice of Differential Privacy.
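To make the approach concrete, here is a minimal sketch of a differentially private Theil-Sen-style slope estimate: compute all pairwise slopes, then select an approximate median with the exponential mechanism. This is an illustrative simplification, not the paper's exact DPTheilSen algorithm; in particular, it naively treats each pairwise slope as an independent record, whereas a single data point actually affects up to n-1 slopes (the paper's variants account for this, e.g. by restricting to disjoint matchings or splitting the privacy budget). The bounds `lo`/`hi` and the grid size are hyperparameters of this sketch.

```python
import numpy as np

def pairwise_slopes(x, y):
    """All pairwise slopes (y_j - y_i) / (x_j - x_i) for i < j, skipping ties in x."""
    slopes = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[j] != x[i]:
                slopes.append((y[j] - y[i]) / (x[j] - x[i]))
    return np.array(slopes)

def dp_median_exponential(values, lo, hi, epsilon, grid_size=1000, rng=None):
    """Approximate DP median via the exponential mechanism over a grid.

    Utility of a candidate c is -|rank(c) - n/2|; assuming each value is one
    record, the utility has sensitivity 1, so candidates are weighted by
    exp(epsilon * utility / 2).
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.clip(values, lo, hi)
    grid = np.linspace(lo, hi, grid_size)
    ranks = np.searchsorted(np.sort(values), grid)
    utility = -np.abs(ranks - len(values) / 2)
    weights = np.exp(epsilon * utility / 2)
    weights /= weights.sum()
    return float(rng.choice(grid, p=weights))

def dp_theil_sen_slope(x, y, lo, hi, epsilon, rng=None):
    """DP slope estimate: a DP median of the pairwise slopes.

    NOTE: naive sensitivity accounting -- one data record changes many
    pairwise slopes, which this sketch ignores for clarity.
    """
    return dp_median_exponential(pairwise_slopes(x, y), lo, hi, epsilon, rng=rng)
```

With moderate epsilon the estimate concentrates near the true median slope, at the cost of the discretization introduced by the candidate grid.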
Differentially private simple linear regression
Economics and social science research often requires analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study regression algorithms that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with side information about the dataset. Motivated by the Opportunity Atlas, a high-profile, small-area analysis tool in economics research, we perform a thorough experimental evaluation of differentially private algorithms for simple linear regression on small datasets with tens to hundreds of records, a particularly challenging regime for differential privacy. In contrast, prior work on differentially private linear regression focused on multivariate linear regression on large datasets or on asymptotic analysis. Through a range of experiments, we identify key factors that affect the relative performance of the algorithms. We find that algorithms based on robust estimators, in particular the median-based estimator of Theil and Sen, perform best on small datasets (e.g., hundreds of datapoints), while algorithms based on Ordinary Least Squares or Gradient Descent perform better for large datasets. However, we also discuss regimes in which this general finding does not hold. Notably, the differentially private analogues of Theil-Sen (one of which was suggested in a theoretical work of Dwork and Lei) have not been studied in any prior experimental work on differentially private linear regression.
Published version
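The robustness that makes the Theil-Sen estimator attractive in this setting is easy to demonstrate. The following sketch (not code from the paper) compares the non-private Theil-Sen slope, the median of all pairwise slopes, against ordinary least squares on data with a single corrupted point:

```python
import numpy as np

def theil_sen_slope(x, y):
    """Theil-Sen estimator: the median of all pairwise slopes."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x))
              for j in range(i + 1, len(x))
              if x[j] != x[i]]
    return float(np.median(slopes))

def ols_slope(x, y):
    """Ordinary least squares slope, for comparison."""
    return float(np.polyfit(x, y, 1)[0])

# A single outlier barely moves the Theil-Sen estimate but drags OLS far off.
x = np.arange(10.0)
y = 3.0 * x
y[-1] += 100.0  # corrupt one point
robust = theil_sen_slope(x, y)  # stays near the true slope of 3
lsq = ols_slope(x, y)           # pulled well away from 3
```

This insensitivity to individual records is also what makes median-based estimators a natural fit for differential privacy: the quantity being released changes little when one data point changes.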
Controlling privacy loss in sampling schemes: an analysis of stratified and cluster sampling
Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. A fundamental result in differential privacy is that a differentially private mechanism run on a simple random sample of a population provides stronger privacy guarantees than the same algorithm run on the entire population. However, in practice, sampling designs are often more complex than the simple, data-independent sampling schemes that are addressed in prior work. In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes. We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation. We analyze the privacy implications of the pervasive cluster sampling and stratified sampling paradigms, as well as provide some insight into the study of more general sampling designs.
CB20ADR0160001 - U.S. Census Bureau
https://drops.dagstuhl.de/opus/volltexte/2022/16524/pdf/LIPIcs-FORC-2022-1.pdf
Published version
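The baseline amplification result for simple, data-independent sampling can be stated in one line. The sketch below computes the standard amplification-by-subsampling bound (this is the classic result the abstract refers to, not an analysis from the paper; the paper's point is that stratified and cluster sampling need not satisfy any such bound):

```python
import math

def amplified_epsilon(epsilon, q):
    """Amplification by subsampling: an epsilon-DP mechanism run on a
    Poisson subsample (each record kept independently with probability q)
    satisfies epsilon' = ln(1 + q * (exp(epsilon) - 1)) <= epsilon.

    log1p/expm1 keep the computation accurate for small q and epsilon.
    """
    return math.log1p(q * math.expm1(epsilon))
```

For small sampling rates the bound behaves like q * (e^epsilon - 1), so aggressive subsampling yields a proportionally smaller effective privacy loss; the paper shows this intuition breaks down for data-dependent designs such as cluster and stratified sampling.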
Non-parametric differentially private confidence intervals for the median
https://arxiv.org/abs/2106.1033