Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLM) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. By using Stein's lemma and its variants, we first show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions), if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Then, with high probability, the sample complexity of the public and private data, for the algorithm to achieve an estimation error $\alpha$ (in $\ell_\infty$-norm), is $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small ({\em i.e.,} $\alpha \geq \Omega(1/\sqrt{p})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasi-polynomial (in $\alpha^{-1}$) or exponential (in $p$) sample complexity of GLM with no public data. Our algorithm can also answer multiple GLM queries with the same sample complexities as in the single-query case, with at least constant probability. We then extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.
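As background on how Stein's lemma enables such estimators (a sketch in our own notation, not taken verbatim from the paper): for $x \sim \mathcal{N}(0, I_p)$ and any differentiable $g$ with integrable gradient, Stein's lemma states $\mathbb{E}[x\,g(x)] = \mathbb{E}[\nabla g(x)]$. Applying it to a GLM with $\mathbb{E}[y \mid x] = f(\langle x, w^* \rangle)$ gives
$$\mathbb{E}[x\,y] = \mathbb{E}\big[f'(\langle x, w^*\rangle)\big]\, w^*,$$
so the parameter $w^*$ is, up to a scalar, the moment $\mathbb{E}[x\,y]$, which the server can estimate from the privatized labeled data; the public unlabeled data is used to estimate quantities that depend only on the marginal distribution of $x$ (e.g., its covariance), which is what removes the need for interaction.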
Estimating smooth GLM in non-interactive local differential privacy model with public unlabeled data
In this paper, we study the problem of estimating smooth Generalized Linear Models (GLM) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. Firstly, motivated by Stein's lemma, we show that if each data record is i.i.d. sampled from a zero-mean Gaussian distribution, then there exists an $(\epsilon, \delta)$-NLDP algorithm for GLM. The sample complexity of the public and private data, for the algorithm to achieve an estimation error $\alpha$ (in $\ell_2$-norm) with high probability, is $O(p\alpha^{-2})$ and $O(p^3\alpha^{-2}\epsilon^{-2})$, respectively. This is a significant improvement over the previously known exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$, sample complexity of GLM with no public data. Then, by a variant of Stein's lemma, we show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions), if each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. The sample complexity of the public and private data, for the algorithm to achieve an estimation error $\alpha$ (in $\ell_\infty$-norm) with high probability, is $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small ({\em i.e.,} $\alpha \geq \Omega(1/\sqrt{p})$), where $p$ is the dimensionality of the data. We also extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.
http://proceedings.mlr.press/v132/wang21a/wang21a.pd
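For readers new to the NLDP model, the following minimal Python sketch illustrates its one-shot structure: each user privatizes their own record locally with a Gaussian mechanism and sends a single report, with no further rounds of interaction. The clipping radius and privacy parameters are illustrative assumptions; this is not the paper's algorithm.

    import numpy as np

    def local_randomizer(x, clip_norm, epsilon, delta, rng):
        # One-shot (epsilon, delta)-LDP report of a feature vector:
        # clip to l2-norm clip_norm, then add Gaussian noise calibrated
        # to the l2-sensitivity 2*clip_norm via the standard Gaussian
        # mechanism (valid for epsilon <= 1).
        x = np.asarray(x, dtype=float)
        x = x * min(1.0, clip_norm / (np.linalg.norm(x) + 1e-12))
        sigma = 2.0 * clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
        return x + rng.normal(0.0, sigma, size=x.shape)

    # Non-interactive protocol: every user sends exactly one noisy report,
    # and the server aggregates them without talking to the users again.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 5))                      # users' raw records
    reports = [local_randomizer(x, 1.0, 1.0, 1e-5, rng) for x in data]
    mean_estimate = np.mean(reports, axis=0)              # server-side aggregate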
On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data
In this paper, we study the problem of PAC learning halfspaces in the
non-interactive local differential privacy model (NLDP). To breach the barrier
of exponential sample complexity, previous results studied a relaxed setting
where the server has access to some additional public but unlabeled data. We
continue in this direction. Specifically, we consider the problem under the
standard setting instead of the large margin setting studied before. Under
different mild assumptions on the underlying data distribution, we propose two
approaches that are based on the Massart noise model and self-supervised
learning and show that it is possible to achieve sample complexities that are
only linear in the dimension and polynomial in other terms for both private and
public data, which significantly improve the previous results. Our methods
could also be used for other private PAC learning problems.
Comment: To appear in The 14th Asian Conference on Machine Learning (ACML 2022).
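For reference, the Massart noise model mentioned above is the standard one (this is general background, not a contribution of the paper): each example $x$ receives the label $\mathrm{sign}(\langle w^*, x \rangle)$ flipped independently with probability $\eta(x)$, where
$$\Pr[y \neq \mathrm{sign}(\langle w^*, x\rangle) \mid x] = \eta(x) \leq \eta < \tfrac{1}{2}.$$
It sits between random classification noise ($\eta(x)$ constant) and fully agnostic noise ($\eta(x)$ arbitrary, possibly above $1/2$).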
Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data
We study stochastic convex optimization with heavy-tailed data under the
constraint of differential privacy (DP). Most prior work on this problem is
restricted to the case where the loss function is Lipschitz. Instead, as
introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general
convex loss functions with the assumption that the distribution of gradients
has bounded $k$-th moments. We provide improved upper bounds on the excess
population risk under concentrated DP for convex and strongly convex loss
functions. Along the way, we derive new algorithms for private mean estimation
of heavy-tailed distributions, under both pure and concentrated DP. Finally, we
prove nearly-matching lower bounds for private stochastic convex optimization
with strongly convex losses and mean estimation, showing new separations
between pure and concentrated DP.
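To make the mean-estimation subroutine concrete, here is a generic truncate-and-noise private mean estimator under pure DP. The truncation radius and the use of the Laplace mechanism are our illustrative choices, not the estimators developed in the paper.

    import numpy as np

    def private_mean_1d(samples, radius, epsilon, rng):
        # Truncating each sample to [-radius, radius] bounds the
        # sensitivity of the empirical mean by 2*radius/n, so Laplace
        # noise of that scale divided by epsilon gives pure epsilon-DP.
        # Truncation biases the estimate; analyses of heavy-tailed
        # private mean estimation pick radius (using the bounded-moment
        # assumption) to trade this bias against the injected noise.
        x = np.clip(np.asarray(samples, dtype=float), -radius, radius)
        noise = rng.laplace(0.0, (2.0 * radius / len(x)) / epsilon)
        return x.mean() + noise

    rng = np.random.default_rng(0)
    data = rng.standard_t(df=3, size=10_000)   # heavy-tailed sample
    print(private_mean_1d(data, radius=10.0, epsilon=1.0, rng=rng))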
Efficient Private SCO for Heavy-Tailed Data via Clipping
We consider stochastic convex optimization for heavy-tailed data with the
guarantee of being differentially private (DP). Prior work on this problem is
restricted to the gradient descent (GD) method, which is inefficient for
large-scale problems. In this paper, we resolve this issue and derive the first
high-probability bounds for the private stochastic method with clipping. For
general convex problems, we derive excess population risks $\Tilde{O}\left(\frac{d^{1/7}\sqrt{\ln\frac{(n\epsilon)^2}{\beta d}}}{(n\epsilon)^{2/7}}\right)$ and $\Tilde{O}\left(\frac{d^{1/7}\ln\frac{(n\epsilon)^2}{\beta d}}{(n\epsilon)^{2/7}}\right)$ under the bounded or unbounded domain assumption, respectively (here $n$ is the sample size, $d$ is the dimension of the data, $\beta$ is the confidence level, and $\epsilon$ is the privacy level). Then, we
extend our analysis to the strongly convex case and the non-smooth case (which works for generalized smooth objectives with Hölder-continuous gradients). We establish new excess risk bounds without the bounded domain
assumption. The results above achieve lower excess risks and gradient
complexities than existing methods in their corresponding cases. Numerical
experiments are conducted to justify the theoretical improvement.
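As a reference point for what a private stochastic method with clipping looks like, here is a generic per-sample clipped, noise-injected SGD step (shown for squared loss). The step size, clipping threshold, and noise multiplier are assumptions for illustration; calibrating sigma to a target privacy level requires a privacy accountant, and this is not the paper's exact algorithm or analysis.

    import numpy as np

    def clipped_noisy_sgd(X, y, clip, sigma, lr, epochs, rng):
        # Template behind private SGD for heavy-tailed data: clipping each
        # per-sample gradient to l2-norm `clip` bounds any single sample's
        # influence, and Gaussian noise of std sigma*clip masks that
        # influence. For heavy-tailed gradients, clipping also acts as a
        # bias/variance trade-off controlled by `clip`.
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):
                g = (X[i] @ w - y[i]) * X[i]              # squared-loss gradient
                g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))
                w -= lr * (g + rng.normal(0.0, sigma * clip, size=d))
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = X @ np.ones(5) + rng.standard_t(df=3, size=500)   # heavy-tailed noise
    w_hat = clipped_noisy_sgd(X, y, clip=1.0, sigma=0.5, lr=0.05, epochs=3, rng=rng)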
Practical Differentially Private and Byzantine-resilient Federated Learning
Privacy and Byzantine resilience are two indispensable requirements for a
federated learning (FL) system. Although there have been extensive studies on
privacy and Byzantine security in their own track, solutions that consider both
remain sparse. This is due to difficulties in reconciling privacy-preserving
and Byzantine-resilient algorithms.
In this work, we propose a solution to this two-fold issue. We use our version of the differentially private stochastic gradient descent (DP-SGD) algorithm to preserve privacy and then apply our Byzantine-resilient
algorithms. We note that while existing works follow this general approach, an
in-depth analysis on the interplay between DP and Byzantine resilience has been
ignored, leading to unsatisfactory performance. Specifically, for the random
noise introduced by DP, previous works strive to reduce its impact on the
Byzantine aggregation. In contrast, we leverage the random noise to construct
an aggregation that effectively rejects many existing Byzantine attacks.
We provide both theoretical proof and empirical experiments to show our
protocol is effective: retaining high accuracy while preserving the DP
guarantee and Byzantine resilience. Compared with the previous work, our
protocol 1) achieves significantly higher accuracy even in a high privacy
regime; 2) works well even when up to 90% of the distributed workers are Byzantine.
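For intuition about Byzantine-resilient aggregation in this setting, here is a standard robust baseline, the coordinate-wise median of client updates. It is a generic illustration; the paper's contribution is a different, noise-aware aggregation, which this sketch does not reproduce.

    import numpy as np

    def aggregate_updates(updates):
        # Coordinate-wise median of the clients' (noisy) gradient updates.
        # Unlike the mean, each coordinate of the median stays within the
        # range of honest values as long as honest clients form a strict
        # majority, so arbitrary Byzantine vectors cannot drag it away.
        return np.median(np.stack(updates), axis=0)

    rng = np.random.default_rng(0)
    honest = [rng.normal(1.0, 0.1, size=4) for _ in range(7)]   # near true update
    byzantine = [np.full(4, 1e6) for _ in range(3)]             # adversarial clients
    print(aggregate_updates(honest + byzantine))                # stays near 1.0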
LIPIcs, Volume 251, ITCS 2023, Complete Volume