Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data
We study stochastic convex optimization with heavy-tailed data under the
constraint of differential privacy (DP). Most prior work on this problem is
restricted to the case where the loss function is Lipschitz. Instead, as
introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general
convex loss functions with the assumption that the distribution of gradients
has bounded $k$-th moments. We provide improved upper bounds on the excess
population risk under concentrated DP for convex and strongly convex loss
functions. Along the way, we derive new algorithms for private mean estimation
of heavy-tailed distributions, under both pure and concentrated DP. Finally, we
prove nearly-matching lower bounds for private stochastic convex optimization
with strongly convex losses and mean estimation, showing new separations
between pure and concentrated DP.
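The clip-and-noise primitive underlying private mean estimation of heavy-tailed data can be sketched as follows. This is only an illustrative baseline under concentrated DP, not the paper's estimator; the function name, the symmetric clipping interval, and the zCDP budget parameterization `rho` are assumptions for the sketch.

```python
import numpy as np

def private_clipped_mean(x, clip, rho, rng=None):
    """Illustrative clip-and-noise mean estimator under rho-zCDP.

    Clipping each sample to [-clip, clip] bounds the swap-neighbor
    sensitivity of the empirical mean at 2*clip/n, so Gaussian noise with
    standard deviation sensitivity / sqrt(2*rho) satisfies rho-zCDP
    (standard Gaussian-mechanism calibration; constants are illustrative).
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    clipped = np.clip(x, -clip, clip)
    sensitivity = 2.0 * clip / n
    sigma = sensitivity / np.sqrt(2.0 * rho)
    return clipped.mean() + rng.normal(0.0, sigma)
```

For heavy-tailed data the choice of the clipping radius trades bias against noise, which is exactly where the bounded-moment assumption enters the analysis.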
Mean estimation in the add-remove model of differential privacy
Differential privacy is often studied under two different models of
neighboring datasets: the add-remove model and the swap model. While the swap
model is frequently used in the academic literature to simplify analysis, many
practical applications rely on the more conservative add-remove model, where
obtaining tight results can be difficult. Here, we study the problem of
one-dimensional mean estimation under the add-remove model. We propose a new
algorithm and show that it is min-max optimal, achieving the best possible
constant in the leading term of the mean squared error for all $\epsilon$, and
that this constant is the same as the optimal algorithm under the swap model.
These results show that the add-remove and swap models give nearly identical
errors for mean estimation, even though the add-remove model cannot treat the
size of the dataset as public information. We also demonstrate empirically that
our proposed algorithm yields at least a factor of two improvement in mean
squared error over algorithms frequently used in practice. One of our main
technical contributions is a new hour-glass mechanism, which might be of
independent interest in other scenarios.
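To see why the add-remove model is harder, note that adding or removing a record changes both the sum and the count, so the dataset size cannot be treated as public. A common baseline, sketched below, privatizes a noisy sum and a noisy count separately and divides; this is the kind of approach the paper improves on, not the proposed hour-glass mechanism, and the even budget split is an assumption of the sketch.

```python
import numpy as np

def addremove_private_mean(x, bound, epsilon, rng=None):
    """Noisy-sum / noisy-count mean baseline in the add-remove model.

    Under add-remove neighbors, the sum has sensitivity `bound` (after
    clipping) and the count has sensitivity 1. Each query gets half the
    Laplace budget; the ratio of the two releases estimates the mean.
    """
    rng = np.random.default_rng(rng)
    clipped = np.clip(x, -bound, bound)
    noisy_sum = clipped.sum() + rng.laplace(0.0, bound / (epsilon / 2))
    noisy_count = len(x) + rng.laplace(0.0, 1.0 / (epsilon / 2))
    return noisy_sum / max(noisy_count, 1.0)
```

The abstract's claim is that a better mechanism closes most of the gap between this conservative model and the swap model, where the count is known exactly.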
Efficient Private SCO for Heavy-Tailed Data via Clipping
We consider stochastic convex optimization for heavy-tailed data with the
guarantee of being differentially private (DP). Prior work on this problem is
restricted to the gradient descent (GD) method, which is inefficient for
large-scale problems. In this paper, we resolve this issue and derive the first
high-probability bounds for the private stochastic method with clipping. For
general convex problems, we derive excess population risks
$\tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n\epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and
$\tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$
under the bounded and unbounded domain assumptions, respectively
(here $n$ is the sample size, $d$ is the dimension of the data,
$\beta$ is the confidence level, and $\epsilon$ is the privacy level). Then, we
extend our analysis to the strongly convex case and non-smooth case (which
works for generalized smooth objectives with Hölder-continuous
gradients). We establish new excess risk bounds without bounded domain
assumption. The results above achieve lower excess risks and gradient
complexities than existing methods in their corresponding cases. Numerical
experiments are conducted to justify the theoretical improvement.
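The private stochastic method with clipping described above can be sketched as a per-sample clipped, noised gradient step. This is a generic DP-SGD-style loop for illustration only; the names `grad_fn`, `clip`, and `sigma` are assumptions, and the paper's noise calibration for high-probability heavy-tailed guarantees is more involved.

```python
import numpy as np

def dp_sgd_clipped(grad_fn, w0, data, clip, sigma, lr, rng=None):
    """Minimal sketch of private stochastic gradient descent with clipping.

    Each per-sample gradient is rescaled to L2 norm at most `clip`, which
    bounds the sensitivity of the update, then Gaussian noise with std
    sigma * clip is added before the step.
    """
    rng = np.random.default_rng(rng)
    w = np.array(w0, dtype=float)
    for z in data:
        g = np.asarray(grad_fn(w, z), dtype=float)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)  # per-sample gradient clipping
        noise = rng.normal(0.0, sigma * clip, size=w.shape)
        w = w - lr * (g + noise)
    return w
```

Clipping is what makes a single stochastic gradient usable despite heavy tails: the update's sensitivity no longer depends on the (possibly unbounded) raw gradient norm.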
Instance-Specific Asymmetric Sensitivity in Differential Privacy
We provide a new algorithmic framework for differentially private estimation
of general functions that adapts to the hardness of the underlying dataset. We
build upon previous work that gives a paradigm for selecting an output through
the exponential mechanism based upon closeness of the inverse to the underlying
dataset, termed the inverse sensitivity mechanism. Our framework slightly
modifies the closeness metric and instead gives a simple and efficient application
of the sparse vector technique. While the inverse sensitivity mechanism was
shown to be instance optimal, it was only with respect to a class of unbiased
mechanisms such that the most likely outcome matches the underlying data. We
break this assumption in order to more naturally navigate the bias-variance
tradeoff, which will also critically allow for extending our method to
unbounded data. In consideration of this tradeoff, we provide strong intuition
and empirical validation that our technique will be particularly effective when
the distances to the underlying dataset are asymmetric. This asymmetry is
inherent to a range of important problems including fundamental statistics such
as variance, as well as commonly used machine learning performance metrics for
both classification and regression tasks. We instantiate our method efficiently
for these problems and empirically show that our techniques give
substantially improved differentially private estimates.
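The sparse vector technique that the framework applies is, at its core, the classic AboveThreshold primitive sketched below. This is the textbook version with standard Laplace scales, not the paper's modified closeness-metric instantiation; all parameter names are illustrative.

```python
import numpy as np

def above_threshold(queries, threshold, epsilon, sensitivity=1.0, rng=None):
    """Textbook AboveThreshold instance of the sparse vector technique.

    Noises the threshold once, then noises each query; returns the index of
    the first query whose noisy value exceeds the noisy threshold, or None.
    Releasing only that index (not the values) is what keeps the privacy
    cost independent of the number of queries tested.
    """
    rng = np.random.default_rng(rng)
    noisy_t = threshold + rng.laplace(0.0, 2.0 * sensitivity / epsilon)
    for i, q in enumerate(queries):
        if q + rng.laplace(0.0, 4.0 * sensitivity / epsilon) >= noisy_t:
            return i
    return None
```

In the inverse-sensitivity setting, the queries would measure closeness of candidate outputs to the underlying dataset, and the asymmetry of those distances is what the framework exploits.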
CoinPress: Practical Private Mean and Covariance Estimation
We present simple differentially private estimators for the mean and
covariance of multivariate sub-Gaussian data that are accurate at small sample
sizes. We demonstrate the effectiveness of our algorithms both theoretically
and empirically using synthetic and real-world datasets---showing that their
asymptotic error rates match the state-of-the-art theoretical bounds, and that
they concretely outperform all previous methods. Specifically, previous
estimators either have weak empirical accuracy at small sample sizes, perform
poorly for multivariate data, or require the user to provide strong a priori
estimates for the parameters. Code is available at https://github.com/twistedcubic/coin-press.
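The iterate-and-shrink idea behind such estimators can be sketched in one dimension: repeatedly clip the data to a confidence interval, release a noised mean, and tighten the interval around it. The constants below (the `4/sqrt(n)` width heuristic, the per-step budget split) are illustrative assumptions for unit-variance data, not the paper's calibrated choices.

```python
import numpy as np

def iterative_private_mean(x, center, radius, rho_per_step, steps=3, rng=None):
    """Sketch of iterative confidence-interval-refining private mean
    estimation (1-d, zCDP accounting, illustrative constants).

    Each round: clip to the current interval, release a Gaussian-noised
    mean under budget rho_per_step, then shrink the interval around the
    new estimate. Early rounds tolerate a crude a priori interval, which
    is why this style of estimator works at small sample sizes.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    for _ in range(steps):
        clipped = np.clip(x, center - radius, center + radius)
        sensitivity = 2.0 * radius / n  # swap sensitivity of the clipped mean
        sigma = sensitivity / np.sqrt(2.0 * rho_per_step)
        center = clipped.mean() + rng.normal(0.0, sigma)
        # heuristic shrink: sampling width plus a multiple of the noise scale
        radius = max(4.0 / np.sqrt(n) + 4.0 * sigma, 1e-6)
    return center
```

The key property the abstract emphasizes is that a very loose initial interval costs little: each refinement round rapidly reduces the clipping radius, and with it the noise.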