17 research outputs found
Not All Learnable Distribution Classes are Privately Learnable
We give an example of a class of distributions that is learnable in total
variation distance with a finite number of samples, but not learnable under
(ε, δ)-differential privacy. This refutes a conjecture of Ashtiani.
Comment: To appear in ALT 2024. Added a minor clarification to the
construction and an acknowledgement of the Fields Institute
A Polynomial Time, Pure Differentially Private Estimator for Binary Product Distributions
We present the first ε-differentially private, computationally
efficient algorithm that estimates the means of product distributions over
{0,1}^d accurately in total-variation distance, while attaining the
optimal sample complexity to within polylogarithmic factors. Prior work had
either solved this problem efficiently and optimally under weaker notions of
privacy, or solved it optimally while having exponential running times.
Better and Simpler Lower Bounds for Differentially Private Statistical Estimation
We provide improved lower bounds for two well-known high-dimensional private
estimation tasks. First, we prove that for estimating the covariance of a
Gaussian up to spectral error α with approximate differential privacy,
one needs Ω̃(d^(3/2)/(αε) + d/α^2) samples for any α ≤ O(1), which is tight up
to logarithmic factors. This improves over previous work, which established this
for α ≤ O(1/√d), and is also simpler. Next, we prove that for estimating the
mean of a heavy-tailed distribution with bounded k-th moments with approximate
differential privacy, one needs Ω̃(d/(α^(k/(k-1)) ε) + d/α^2) samples. This
matches known upper bounds and improves over the best known lower bounds for
this problem, which only hold for pure differential privacy, or when
k = 2. Our techniques follow the method of
fingerprinting and are generally quite simple. Our lower bound for heavy-tailed
estimation is based on a black-box reduction from privately estimating
identity-covariance Gaussians. Our lower bound for covariance estimation
utilizes a Bayesian approach to show that, under an Inverse Wishart prior
distribution for the covariance matrix, no private estimator can be accurate
even in expectation, without sufficiently many samples.
Comment: 23 pages
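The fingerprinting method mentioned above can be illustrated numerically: an
accurate estimator must correlate with the sample points it was computed from,
and that correlation is what the lower-bound arguments exploit. Below is a
small sketch with hypothetical parameter choices, using the classic setup of
mean estimation for a product distribution over {-1,1}^d (a simplification,
not the paper's construction).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 50
# Random mean vector, then i.i.d. +/-1 data with E[X_ij] = p_j
p = rng.uniform(-1, 1, size=d)
X = np.where(rng.random((n, d)) < (1 + p) / 2, 1.0, -1.0)
est = X.mean(axis=0)  # non-private empirical-mean estimator

# Fingerprinting statistic: correlation of a point with the estimator's error.
# In-sample points correlate strongly with the estimator built from them;
# fresh points drawn from the same distribution do not.
in_sample = (X - p) @ (est - p)
fresh = np.where(rng.random((n, d)) < (1 + p) / 2, 1.0, -1.0)
out_sample = (fresh - p) @ (est - p)
```

The gap between the two averages is the "memorization" that differential
privacy forbids, which is how such statistics yield sample-complexity lower
bounds.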
Privacy Preserving Adaptive Experiment Design
Adaptive experiments are widely used to estimate the conditional average
treatment effect (CATE) in clinical trials and many other scenarios. While the
primary goal of an experiment is to maximize estimation accuracy, the
imperative of social welfare also makes it crucial to provide treatments with
superior outcomes to patients, which is measured by regret in the contextual
bandit framework. These two objectives often lead to contrasting optimal
allocation mechanisms. Furthermore, privacy concerns arise in clinical
scenarios involving sensitive data such as patients' health records, so it is
essential for the treatment allocation mechanism to incorporate robust privacy
protection measures. In this paper, we investigate the tradeoff between loss
of social welfare and statistical power in contextual bandit experiments. We
establish matched upper and lower bounds for the multi-objective optimization
problem, and then adopt the concept of Pareto optimality to mathematically
characterize the optimality condition. Furthermore, we propose differentially
private algorithms that still match the lower bound, showing that privacy is
"almost free". Additionally, we derive the asymptotic normality of the
estimator, which is essential for statistical inference and hypothesis
testing.
Comment: Add a table
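To make "almost free" privacy concrete at a toy scale, here is a sketch of
the standard ingredient such algorithms build on: releasing each arm's
estimated mean through the Laplace mechanism, so that any one patient's data
barely shifts the allocation decision. The explore-then-commit setup, arm
means, and parameters below are all hypothetical assumptions, not the paper's
algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def private_mean(rewards, eps, rng):
    """eps-DP release of a mean of [0,1]-bounded rewards via the Laplace
    mechanism: one participant's reward shifts the mean by at most 1/n."""
    n = len(rewards)
    return float(np.mean(rewards) + rng.laplace(scale=1.0 / (n * eps)))

# Hypothetical two-arm explore-then-commit experiment
true_means = [0.5, 0.7]
n_explore = 1000
estimates = []
for mu in true_means:
    rewards = rng.binomial(1, mu, size=n_explore)
    estimates.append(private_mean(rewards, eps=1.0, rng=rng))
best_arm = int(np.argmax(estimates))
```

With n_explore samples per arm, the Laplace noise scale 1/(nε) is far smaller
than the sampling error, which is the intuition behind privacy costing almost
nothing in this regime.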
CoinPress: Practical Private Mean and Covariance Estimation
We present simple differentially private estimators for the mean and
covariance of multivariate sub-Gaussian data that are accurate at small sample
sizes. We demonstrate the effectiveness of our algorithms both theoretically
and empirically using synthetic and real-world datasets---showing that their
asymptotic error rates match the state-of-the-art theoretical bounds, and that
they concretely outperform all previous methods. Specifically, previous
estimators either have weak empirical accuracy at small sample sizes, perform
poorly for multivariate data, or require the user to provide strong a priori
estimates for the parameters.
Comment: Code is available at https://github.com/twistedcubic/coin-pres
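The iterative idea behind such estimators can be sketched in one dimension:
start from a crude interval known to contain the mean, clip the data to it,
release a noisy clipped mean, and use that release to shrink the interval
before repeating. The function below is a simplified illustration in this
spirit; the step count, slack factors, and zCDP budgeting are assumptions,
not CoinPress's exact calibration.

```python
import numpy as np

rng = np.random.default_rng(2)

def coinpress_style_mean(x, center, radius, sigma_data, rho, steps=3, rng=rng):
    """Iterative private 1-D mean estimation in the spirit of CoinPress:
    clip to the current interval, average, add Gaussian noise, shrink.
    Each step spends rho/steps of the zCDP budget. Simplified sketch."""
    n = len(x)
    for _ in range(steps):
        # Points within `radius` of the mean stay unclipped w.h.p.
        clip_r = radius + 3 * sigma_data
        clipped = np.clip(x, center - clip_r, center + clip_r)
        sens = 2 * clip_r / n                         # sensitivity of clipped mean
        noise_sd = sens / np.sqrt(2 * (rho / steps))  # Gaussian mechanism (zCDP)
        center = clipped.mean() + rng.normal(scale=noise_sd)
        # New radius: sampling error plus noise, with slack
        radius = 3 * (sigma_data / np.sqrt(n) + noise_sd)
    return center

# Usage with a deliberately crude initial interval
x = rng.normal(5.0, 1.0, size=2000)
est = coinpress_style_mean(x, center=0.0, radius=50.0, sigma_data=1.0, rho=0.5)
```

The key effect, which the paper demonstrates at scale, is that a very weak
a-priori interval costs little: each iteration shrinks it geometrically, so
the noise added in the final step is calibrated to a tight range.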