On the Risks of Collecting Multidimensional Data Under Local Differential Privacy
The private collection of multiple statistics from a population is a
fundamental statistical problem. One possible approach to realize this is to
rely on the local model of differential privacy (LDP). Numerous LDP protocols
have been developed for the task of frequency estimation of single and multiple
attributes. These studies mainly focused on improving the utility of the
algorithms to ensure the server performs the estimations accurately. In this
paper, we investigate privacy threats (re-identification and attribute
inference attacks) against LDP protocols for multidimensional data following
two state-of-the-art solutions for frequency estimation of multiple attributes.
To broaden the scope of our study, we also experimentally assess five
widely used LDP protocols, namely generalized randomized response, optimal
local hashing, subset selection, RAPPOR, and optimal unary encoding. Finally, we
propose a countermeasure that improves both utility and robustness
against the identified threats. Our contributions can help practitioners aiming
to collect users' statistics privately to decide which LDP mechanism best fits
their needs.
Comment: Accepted at VLDB 202
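For readers unfamiliar with the protocols named in this abstract, the following is a minimal sketch of generalized randomized response (GRR) for a single attribute, written from the standard textbook definition rather than taken from the paper. The function names `grr_perturb` and `grr_estimate`, the integer encoding of values as 0..k-1, and the demo parameters are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def grr_perturb(value, k, epsilon, rng):
    """Client side: report the true value with probability p, otherwise
    report one of the other k - 1 values chosen uniformly at random.
    Values are assumed to be integers in range(k) (illustrative encoding)."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    if rng.random() < p:
        return value
    # Draw uniformly from the k - 1 values different from `value`.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def grr_estimate(reports, k, epsilon):
    """Server side: unbiased frequency estimate for each value in range(k),
    obtained by inverting the known perturbation probabilities."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)   # probability of reporting the true value
    q = 1 / (e + k - 1)   # probability of reporting any specific other value
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(k)]


def demo(n=20000, k=4, epsilon=1.0, seed=0):
    """Hypothetical population: half the users hold value 0, the rest are
    spread evenly over values 1..k-1."""
    rng = random.Random(seed)
    true_values = [0 if i % 2 == 0 else 1 + (i // 2) % (k - 1) for i in range(n)]
    reports = [grr_perturb(v, k, epsilon, rng) for v in true_values]
    return grr_estimate(reports, k, epsilon)
```

Calling `demo()` returns one estimated frequency per value. Individual estimates are noisy and can fall slightly below zero, but they always sum to one because the estimator is a linear rescaling of the observed report frequencies.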
Marginal Release Under Local Differential Privacy
Many analysis and machine learning tasks require the availability of marginal
statistics on multidimensional datasets while providing strong privacy
guarantees for the data subjects. Applications for these statistics range from
finding correlations in the data to fitting sophisticated prediction models. In
this paper, we provide a set of algorithms for materializing marginal
statistics under the strong model of local differential privacy. We prove the
first tight theoretical bounds on the accuracy of marginals compiled under each
approach, perform empirical evaluation to confirm these bounds, and evaluate
them for tasks such as modeling and correlation testing. Our results show that
releasing information based on (local) Fourier transformations of the input is
preferable to alternatives based directly on (local) marginals.