Search CORE

6,118 research outputs found

Optimal locally private estimation under $\ell_p$ loss for $1\le p\le 2$

Author: Barg Alexander
Ye Min
Publication venue
Publication date: 16/10/2018
Field of study

We consider the minimax estimation problem of a discrete distribution with support size

k

under locally differential privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number

\epsilon

measures the privacy level of a privatization scheme. In our previous work (IEEE Trans. Inform. Theory, 2018), we proposed a family of new privatization schemes and the corresponding estimator. We also proved that our scheme and estimator are order optimal in the regime

e^{\epsilon} \ll k

under both

\ell_2^2

(mean square) and

\ell_1

loss. In this paper, we sharpen this result by showing asymptotic optimality of the proposed scheme under the

\ell_p^p

loss for all

1\le p\le 2.

More precisely, we show that for any

p\in[1,2]

and any

k

and

\epsilon,

the ratio between the worst-case

\ell_p^p

estimation loss of our scheme and the optimal value approaches

1

as the number of samples tends to infinity. The lower bound on the minimax risk of private estimation that we establish as a part of the proof is valid for any loss function

\ell_p^p, p\ge 1.

Comment: This paper generalizes the optimality results of the preprint arXiv:1708.00059 from

ell_2

to a broader class of loss functions. The new approach taken here also results in a much shorter proo

arXiv.org e-Print Archive

Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters

Author: Acharya Jayadev
Sun Ziteng
Publication venue
Publication date: 28/05/2019
Field of study

We consider the problems of distribution estimation and heavy hitter (frequency) estimation under privacy and communication constraints. While these constraints have been studied separately, optimal schemes for one are sub-optimal for the other. We propose a sample-optimal

\varepsilon

-locally differentially private (LDP) scheme for distribution estimation, where each user communicates only one bit, and requires no public randomness. We show that Hadamard Response, a recently proposed scheme for

\varepsilon

-LDP distribution estimation is also utility-optimal for heavy hitter estimation. Finally, we show that unlike distribution estimation, without public randomness where only one bit suffices, any heavy hitter estimation algorithm that communicates

o(\min \{\log n, \log k\})

bits from each user cannot be optimal.Comment: ICML 201

arXiv.org e-Print Archive

Locally Differentially Private Naive Bayes Classification

Author: Al-Rubaie Mohammad
Chang J. Morris
Yilmaz Emre
Publication venue
Publication date: 03/05/2019
Field of study

In machine learning, classification models need to be trained in order to predict class labels. When the training data contains personal information about individuals, collecting training data becomes difficult due to privacy concerns. Local differential privacy is a definition to measure the individual privacy when there is no trusted data curator. Individuals interact with an untrusted data aggregator who obtains statistical information about the population without learning personal data. In order to train a Naive Bayes classifier in an untrusted setting, we propose to use methods satisfying local differential privacy. Individuals send their perturbed inputs that keep the relationship between the feature values and class labels. The data aggregator estimates all probabilities needed by the Naive Bayes classifier. Then, new instances can be classified based on the estimated probabilities. We propose solutions for both discrete and continuous data. In order to eliminate high amount of noise and decrease communication cost in multi-dimensional data, we propose utilizing dimensionality reduction techniques which can be applied by individuals before perturbing their inputs. Our experimental results show that the accuracy of the Naive Bayes classifier is maintained even when the individual privacy is guaranteed under local differential privacy, and that using dimensionality reduction enhances the accuracy

arXiv.org e-Print Archive

Lower Bounds for Locally Private Estimation via Communication Complexity

Author: Duchi John
Rogers Ryan
Publication venue
Publication date: 05/05/2019
Field of study

We develop lower bounds for estimation under local privacy constraints---including differential privacy and its relaxations to approximate or R\'{e}nyi differential privacy---by showing an equivalence between private estimation and communication-restricted estimation problems. Our results apply to arbitrarily interactive privacy mechanisms, and they also give sharp lower bounds for all levels of differential privacy protections, that is, privacy mechanisms with privacy levels

\varepsilon \in [0, \infty)

. As a particular consequence of our results, we show that the minimax mean-squared error for estimating the mean of a bounded or Gaussian random vector in

d

dimensions scales as

\frac{d}{n} \cdot \frac{d}{ \min\{\varepsilon, \varepsilon^2\}}

.Comment: To appear in Conference on Learning Theory 201

arXiv.org e-Print Archive

Successive Refinement of Privacy

Author: Chaudhuri Kamalika
Data Deepesh
Diggavi Suhas
Fragouli Christina
Girgis Antonious M.
Publication venue
Publication date: 24/05/2020
Field of study

This work examines a novel question: how much randomness is needed to achieve local differential privacy (LDP)? A motivating scenario is providing {\em multiple levels of privacy} to multiple analysts, either for distribution or for heavy-hitter estimation, using the \emph{same} (randomized) output. We call this setting \emph{successive refinement of privacy}, as it provides hierarchical access to the raw data with different privacy levels. For example, the same randomized output could enable one analyst to reconstruct the input, while another can only estimate the distribution subject to LDP requirements. This extends the classical Shannon (wiretap) security setting to local differential privacy. We provide (order-wise) tight characterizations of privacy-utility-randomness trade-offs in several cases for distribution estimation, including the standard LDP setting under a randomness constraint. We also provide a non-trivial privacy mechanism for multi-level privacy. Furthermore, we show that we cannot reuse random keys over time while preserving privacy of each user

arXiv.org e-Print Archive

Protection Against Reconstruction and Its Applications in Private Federated Learning

Author: Bhowmick Abhishek
Duchi John
Freudiger Julien
Kapoor Gaurav
Rogers Ryan
Publication venue
Publication date: 03/06/2019
Field of study

In large-scale statistical learning, data collection and model fitting are moving increasingly toward peripheral devices---phones, watches, fitness trackers---away from centralized data collection. Concomitant with this rise in decentralized data are increasing challenges of maintaining privacy while allowing enough information to fit accurate, useful statistical models. This motivates local notions of privacy---most significantly, local differential privacy, which provides strong protections against sensitive data disclosures---where data is obfuscated before a statistician or learner can even observe it, providing strong protections to individuals' data. Yet local privacy as traditionally employed may prove too stringent for practical use, especially in modern high-dimensional statistical and machine learning problems. Consequently, we revisit the types of disclosures and adversaries against which we provide protections, considering adversaries with limited prior information and ensuring that with high probability, ensuring they cannot reconstruct an individual's data within useful tolerances. By reconceptualizing these protections, we allow more useful data release---large privacy parameters in local differential privacy---and we design new (minimax) optimal locally differentially private mechanisms for statistical learning problems for \emph{all} privacy levels. We thus present practicable approaches to large-scale locally private model training that were previously impossible, showing theoretically and empirically that we can fit large-scale image classification and language models with little degradation in utility

arXiv.org e-Print Archive

Differentially Private Testing of Identity and Closeness of Discrete Distributions

Author: Acharya Jayadev
Sun Ziteng
Zhang Huanyu
Publication venue
Publication date: 31/10/2017
Field of study

We study the fundamental problems of identity testing (goodness of fit), and closeness testing (two sample test) of distributions over

k

elements, under differential privacy. While the problems have a long history in statistics, finite sample bounds for these problems have only been established recently. In this work, we derive upper and lower bounds on the sample complexity of both the problems under

(\varepsilon, \delta)

-differential privacy. We provide optimal sample complexity algorithms for identity testing problem for all parameter ranges, and the first results for closeness testing. Our closeness testing bounds are optimal in the sparse regime where the number of samples is at most

k

. Our upper bounds are obtained by privatizing non-private estimators for these problems. The non-private estimators are chosen to have small sensitivity. We propose a general framework to establish lower bounds on the sample complexity of statistical tasks under differential privacy. We show a bound on differentially private algorithms in terms of a coupling between the two hypothesis classes we aim to test. By constructing carefully chosen priors over the hypothesis classes, and using Le Cam's two point theorem we provide a general mechanism for proving lower bounds. We believe that the framework can be used to obtain strong lower bounds for other statistical tasks under privacy

arXiv.org e-Print Archive

Minimax Optimal Procedures for Locally Private Estimation

Author: Duchi John
Jordan Michael
Wainwright Martin
Publication venue
Publication date: 14/11/2017
Field of study

Working under a model of privacy in which data remains private even from the statistician, we study the tradeoff between privacy guarantees and the risk of the resulting statistical estimators. We develop private versions of classical information-theoretic bounds, in particular those due to Le Cam, Fano, and Assouad. These inequalities allow for a precise characterization of statistical rates under local privacy constraints and the development of provably (minimax) optimal estimation procedures. We provide a treatment of several canonical families of problems: mean estimation and median estimation, generalized linear models, and nonparametric density estimation. For all of these families, we provide lower and upper bounds that match up to constant factors, and exhibit new (optimal) privacy-preserving mechanisms and computationally efficient estimators that achieve the bounds. Additionally, we present a variety of experimental results for estimation problems involving sensitive data, including salaries, censored blog posts and articles, and drug abuse; these experiments demonstrate the importance of deriving optimal procedures.Comment: 64 pages, 8 figures. arXiv admin note: substantial text overlap with arXiv:1302.320

arXiv.org e-Print Archive

Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication

Author: Acharya Jayadev
Sun Ziteng
Zhang Huanyu
Publication venue
Publication date: 27/06/2018
Field of study

We study the problem of estimating

k

-ary distributions under

\varepsilon

-local differential privacy.

n

samples are distributed across users who send privatized versions of their sample to a central server. All previously known sample optimal algorithms require linear (in

k

) communication from each user in the high privacy regime

(\varepsilon=O(1))

, and run in time that grows as

n\cdot k

, which can be prohibitive for large domain size

k

. We propose Hadamard Response (HR}, a local privatization scheme that requires no shared randomness and is symmetric with respect to the users. Our scheme has order optimal sample complexity for all

\varepsilon

, a communication of at most

\log k+2

bits per user, and nearly linear running time of

\tilde{O}(n + k)

. Our encoding and decoding are based on Hadamard matrices, and are simple to implement. The statistical performance relies on the coding theoretic aspects of Hadamard matrices, ie, the large Hamming distance between the rows. An efficient implementation of the algorithm using the Fast Walsh-Hadamard transform gives the computational gains. We compare our approach with Randomized Response (RR), RAPPOR, and subset-selection mechanisms (SS), both theoretically, and experimentally. For

k=10000

, our algorithm runs about 100x faster than SS, and RAPPOR

arXiv.org e-Print Archive

Context-Aware Local Differential Privacy

Author: Acharya Jayadev
Bonawitz Keith
Kairouz Peter
Ramage Daniel
Sun Ziteng
Publication venue
Publication date: 27/07/2020
Field of study

Local differential privacy (LDP) is a strong notion of privacy for individual users that often comes at the expense of a significant drop in utility. The classical definition of LDP assumes that all elements in the data domain are equally sensitive. However, in many applications, some symbols are more sensitive than others. This work proposes a context-aware framework of local differential privacy that allows a privacy designer to incorporate the application's context into the privacy definition. For binary data domains, we provide a universally optimal privatization scheme and highlight its connections to Warner's randomized response (RR) and Mangat's improved response. Motivated by geolocation and web search applications, for

k

-ary data domains, we consider two special cases of context-aware LDP: block-structured LDP and high-low LDP. We study discrete distribution estimation and provide communication-efficient, sample-optimal schemes and information-theoretic lower bounds for both models. We show that using contextual information can require fewer samples than classical LDP to achieve the same accuracy

arXiv.org e-Print Archive