Heavy Hitters and the Structure of Local Privacy
We present a new locally differentially private algorithm for the heavy
hitters problem which achieves optimal worst-case error as a function of all
standardly considered parameters. Prior work obtained error rates which depend
optimally on the number of users, the size of the domain, and the privacy
parameter, but depend sub-optimally on the failure probability.
We strengthen existing lower bounds on the error to incorporate the failure
probability, and show that our new upper bound is tight with respect to this
parameter as well. Our lower bound is based on a new understanding of the
structure of locally private protocols. We further develop these ideas to
obtain the following general results beyond heavy hitters.
Advanced Grouposition: In the local model, group privacy for k
users degrades proportionally to √k, instead of linearly in k
as in the central model. Stronger group privacy yields improved max-information
guarantees, as well as stronger lower bounds (via "packing arguments"), over
the central model.
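In symbols, a rough restatement of the grouposition claim above (suppressing constants, the failure parameter δ, and the regime of ε in which the bound applies):

\[
\varepsilon_{\mathrm{central}}(k) \approx k\,\varepsilon
\qquad \text{versus} \qquad
\varepsilon_{\mathrm{local}}(k) \approx \sqrt{k}\,\varepsilon .
\]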
Building on a transformation of Bassily and Smith (STOC 2015), we
give a generic transformation from any non-interactive approximate-private
local protocol into a pure-private local protocol. Again in contrast with the
central model, this shows that we cannot obtain more accurate algorithms by
moving from pure to approximate local privacy.
Differentially Private Heatmaps
We consider the task of producing heatmaps from users' aggregated data while
protecting their privacy. We give a differentially private (DP) algorithm for
this task and demonstrate its advantages over previous algorithms on real-world
datasets.
Our core algorithmic primitive is a DP procedure that takes in a set of
distributions and produces an output that is close in Earth Mover's Distance to
the average of the inputs. We prove theoretical bounds on the error of our
algorithm under a certain sparsity assumption and show that these bounds are near-optimal.
Comment: To appear in AAAI 202
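As a point of reference only, a minimal sketch of a naive baseline for this task, averaging users' normalized heatmap grids and adding per-cell Laplace noise, is below. This is not the paper's EMD-based primitive (whose aggregation is designed to do better than per-cell noise); the function name and calibration are illustrative.

    import numpy as np

    def naive_dp_heatmap(user_grids, epsilon):
        # user_grids: list of non-negative arrays of equal shape, each summing to 1.
        # Replacing one user's grid changes the average by at most 2/n in L1 norm,
        # so per-cell Laplace noise with scale 2 / (n * epsilon) gives epsilon-DP.
        n = len(user_grids)
        avg = np.mean(user_grids, axis=0)
        noise = np.random.laplace(scale=2.0 / (n * epsilon), size=avg.shape)
        return np.clip(avg + noise, 0.0, None)  # clip negatives for display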
Unbounded Differentially Private Quantile and Maximum Estimation
In this work we consider the problem of differentially private computation of
quantiles for the data, especially the highest quantiles such as maximum, but
with an unbounded range for the dataset. We show that this can be done
efficiently through a simple invocation of AboveThreshold, a
subroutine that is iteratively called in the fundamental Sparse Vector
Technique, even when there is no upper bound on the data. In particular, we
show that this procedure can give more accurate and robust estimates on the
highest quantiles with applications towards clipping that is essential for
differentially private sum and mean estimation. In addition, we show how two
invocations can handle the fully unbounded data setting. Within our study, we
show that an improved analysis of AboveThreshold can improve the
privacy guarantees for the widely used Sparse Vector Technique, which is of
independent interest. We give a more general characterization of the privacy loss
of AboveThreshold, which we immediately apply to our method for
improved privacy guarantees. Our algorithm only requires one pass
through the data, which can be unsorted, and each subsequent query takes O(1)
time. We empirically compare our unbounded algorithm with the state-of-the-art
algorithms in the bounded setting. For inner quantiles, we find that our method
often performs better on non-synthetic datasets. For the maximal quantiles,
which we apply to differentially private sum computation, we find that our
method performs significantly better.
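For reference, a minimal sketch of the standard AboveThreshold subroutine (in its textbook Sparse Vector Technique form with the usual Laplace noise scales; the function and parameter names are illustrative, and this is not the paper's improved parameterization):

    import numpy as np

    def above_threshold(queries, data, threshold, epsilon):
        # Report the index of the first sensitivity-1 query whose noisy value
        # exceeds a noisy threshold; the whole run is epsilon-DP.
        noisy_threshold = threshold + np.random.laplace(scale=2.0 / epsilon)
        for i, q in enumerate(queries):
            if q(data) + np.random.laplace(scale=4.0 / epsilon) >= noisy_threshold:
                return i
        return None  # no query cleared the threshold

One natural way to use this for an unbounded maximum (a hypothetical instantiation, not necessarily the paper's exact one) is to sweep an increasing grid of candidates b_1 < b_2 < ... and ask, for each, how many data points are at most b_i; halting once the noisy count essentially reaches the dataset size yields a private estimate of the maximum.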
Empirical Risk Minimization in the Non-interactive Local Model of Differential Privacy
In this paper, we study the Empirical Risk Minimization (ERM) problem in the
non-interactive Local Differential Privacy (LDP) model. Previous research on
this problem (Smith et al., 2017) indicates that the sample complexity,
to achieve error α, needs to depend exponentially on the
dimensionality p for general loss functions. In this paper, we make two
attempts to resolve this issue by investigating conditions on the loss
functions that allow us to remove such a limit. In our first attempt, we show
that if the loss function is (∞, T)-smooth, by using the Bernstein
polynomial approximation we can avoid the exponential dependency in the term of
α. We then propose player-efficient algorithms with 1-bit
communication complexity and O(1) computation cost for each player. The error
bound of these algorithms is asymptotically the same as the original one. With
some additional assumptions, we also give an algorithm which is more efficient
for the server. In our second attempt, we show that for any 1-Lipschitz
generalized linear convex loss function, there is an (ε, δ)-LDP
algorithm whose sample complexity for achieving error α is only linear
in the dimensionality p. Our results use a polynomial-of-inner-product
approximation technique. Finally, motivated by the idea of using polynomial
approximation and based on different types of polynomial approximations, we
propose (efficient) non-interactive locally differentially private algorithms
for learning the set of k-way marginal queries and the set of smooth queries.
Comment: Appeared at Journal of Machine Learning Research. The journal version
of arXiv:1802.04085, fixed a bug in arXiv:1812.0682
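For reference, the univariate degree-k Bernstein polynomial approximation of a function f on [0, 1], the standard form of the approximation that the first attempt's construction builds on, is

\[
B_k(f; x) \;=\; \sum_{j=0}^{k} f\!\left(\tfrac{j}{k}\right) \binom{k}{j}\, x^{j} (1 - x)^{k - j},
\qquad x \in [0, 1].
\]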