
30,552 research outputs found

Differentially Private Release and Learning of Threshold Functions

Author: Bun Mark
Nissim Kobbi
Stemmer Uri
Vadhan Salil
Publication date: 28/04/2015

We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\epsilon,\delta)$ differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^*|X|)$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $n \le 2^{(1+ o(1))\log^*|X|}$ samples. This improves the previous best upper bound of $8^{(1 + o(1))\log^*|X|}$ (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $(\epsilon,\delta)$ differential privacy and learning without privacy. For properly learning thresholds in $\ell$ dimensions, this lower bound extends to $n \ge \Omega(\ell \cdot \log^*|X|)$. To obtain our results, we give reductions in both directions between releasing and properly learning thresholds and the simpler interior point problem. Given a database $D$ of elements from $X$, the interior point problem asks for an element between the smallest and largest elements in $D$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy. Comment: 43 page
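The definitions in this abstract are small enough to write down directly. The following is a minimal, non-private sketch, assuming an integer domain; the names `threshold`, `is_interior_point`, and `log_star` are illustrative and do not come from the paper. It shows the threshold function $c_x$, the success condition of the interior point problem, and the iterated logarithm $\log^*$ that appears in the bounds.

```python
import math


def threshold(x):
    """Return the threshold function c_x, where c_x(y) = 1 if y <= x and 0 otherwise."""
    def c(y):
        return 1 if y <= x else 0
    return c


def is_interior_point(candidate, database):
    """Check the interior point condition: min(D) <= candidate <= max(D).

    This only verifies an answer; it is not a differentially private solver.
    """
    return min(database) <= candidate <= max(database)


def log_star(n):
    """Iterated base-2 logarithm: how many times log2 is applied before the value drops to <= 1."""
    count = 0
    value = n
    while value > 1:
        value = math.log2(value)
        count += 1
    return count
```

Under these assumptions, `log_star` grows extremely slowly with the domain size, which is why a lower bound of order $\log^*|X|$ is weak in magnitude yet still rules out any fixed sample size over an infinite domain.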

arXiv.org e-Print Archive

Crossref

Differential Privacy and the Fat-Shattering Dimension of Linear Queries

Author: A. Beimel
A. Blum
C. Dwork
K. Nissim
M.J. Kearns
N. Alon
P.L. Bartlett
Publication date: 01/01/2010

In this paper, we consider the task of answering linear queries under the constraint of differential privacy. This is a general and well-studied class of queries that captures other commonly studied classes, including predicate queries and histogram queries. We show that the accuracy to which a set of linear queries can be answered is closely related to its fat-shattering dimension, a property that characterizes the learnability of real-valued functions in the agnostic-learning setting. Comment: Appears in APPROX 201
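To make the query classes named above concrete, here is a small non-private sketch. It assumes one common convention, which the abstract itself does not spell out: a linear query is a function from records to $[0, 1]$, and its answer is the average of that function over the database. The helper names are hypothetical.

```python
def linear_query(q, database):
    """Answer a linear query: the average of q(record) over the database.

    Assumes q maps each record into [0, 1]. No privacy noise is added here.
    """
    if not database:
        raise ValueError("database must be non-empty")
    return sum(q(record) for record in database) / len(database)


def predicate_query(predicate):
    """A predicate query is the special case where q takes only the values 0 and 1."""
    return lambda record: 1.0 if predicate(record) else 0.0


def histogram_queries(bins):
    """A histogram is a family of predicate queries, one per bin."""
    return [predicate_query(lambda record, b=b: record == b) for b in bins]
```

For example, on the database `[1, 2, 2, 3]` the predicate query "record is at least 2" evaluates to 0.75, and the histogram over bins 1, 2, 3 evaluates to 0.25, 0.5, 0.25.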

arXiv.org e-Print Archive

CiteSeerX

Crossref

Order-Revealing Encryption and the Hardness of Private Learning

Author: A Beimel
A Blum
A Boldyreva
A Gupta
B Chor
C Dwork
D Boneh
J Groth
J Thaler
J Ullman
L Pitt
LG Valiant
M Kearns
M Kharitonov
O Goldreich
O Pandey
RA Servedio
S Garg
S Goldwasser
SP Kasiviswanathan
T Graepel
Z Brakerski
Publication date: 01/01/2015

An order-revealing encryption scheme gives a public procedure by which two ciphertexts can be compared to reveal the ordering of their underlying plaintexts. We show how to use order-revealing encryption to separate computationally efficient PAC learning from efficient $(\epsilon, \delta)$-differentially private PAC learning. That is, we construct a concept class that is efficiently PAC learnable, but for which every efficient learner fails to be differentially private. This answers a question of Kasiviswanathan et al. (FOCS '08, SIAM J. Comput. '11). To prove our result, we give a generic transformation from an order-revealing encryption scheme into one with strongly correct comparison, which enables the consistent comparison of ciphertexts that are not obtained as the valid encryption of any message. We believe this construction may be of independent interest. Comment: 28 page
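The interface described in the first sentence can be illustrated with a toy. The sketch below is deliberately insecure and is not the construction from the paper: it uses a secret increasing affine map as the "encryption", so it only shows the shape of the API, namely a keyed `encrypt` and a public `compare` that needs no key. The class and function names are hypothetical.

```python
import random


class ToyOrderRevealingScheme:
    """Insecure illustration only: ciphertext = a * message + b with a secret a >= 1."""

    def __init__(self, rng):
        # Secret key: a positive slope and an offset, so encryption is strictly increasing.
        self._a = rng.randrange(1, 1000)
        self._b = rng.randrange(0, 1000)

    def encrypt(self, message):
        """Encrypt an integer message under the secret key."""
        return self._a * message + self._b


def compare(ct_left, ct_right):
    """Public comparison: returns -1, 0, or 1 according to the order of the plaintexts.

    It uses only the two ciphertexts, never the secret key.
    """
    return (ct_left > ct_right) - (ct_left < ct_right)
```

Anyone holding two ciphertexts can call `compare` and learn which plaintext is larger, which is exactly the leakage an order-revealing scheme permits by design.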

arXiv.org e-Print Archive

CiteSeerX

Crossref

Sample Complexity Bounds on Differentially Private Learning via Communication Complexity

Author: Feldman Vitaly
Xiao David
Publication date: 13/09/2015

In this work we analyze the sample complexity of classification by differentially private algorithms. Differential privacy is a strong and well-studied notion of privacy introduced by Dwork et al. (2006) that ensures that the output of an algorithm leaks little information about the data point provided by any of the participating individuals. Sample complexity of private PAC and agnostic learning was studied in a number of prior works starting with (Kasiviswanathan et al., 2008) but a number of basic questions still remain open, most notably whether learning with privacy requires more samples than learning without privacy. We show that the sample complexity of learning with (pure) differential privacy can be arbitrarily higher than the sample complexity of learning without the privacy constraint or the sample complexity of learning with approximate differential privacy. Our second contribution and the main tool is an equivalence between the sample complexity of (pure) differentially private learning of a concept class $C$ (or $SCDP(C)$) and the randomized one-way communication complexity of the evaluation problem for concepts from $C$. Using this equivalence we prove the following bounds: 1. $SCDP(C) = \Omega(LDim(C))$, where $LDim(C)$ is Littlestone's (1987) dimension characterizing the number of mistakes in the online-mistake-bound learning model. Known bounds on $LDim(C)$ then imply that $SCDP(C)$ can be much higher than the VC-dimension of $C$. 2. For any $t$, there exists a class $C$ such that $LDim(C)=2$ but $SCDP(C) \geq t$. 3. For any $t$, there exists a class $C$ such that the sample complexity of (pure) $\alpha$-differentially private PAC learning is $\Omega(t/\alpha)$ but the sample complexity of the relaxed $(\alpha,\beta)$-differentially private PAC learning is $O(\log(1/\beta)/\alpha)$. This resolves an open problem of Beimel et al. (2013b). Comment: Extended abstract appears in Conference on Learning Theory (COLT) 201
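The gap between the Littlestone dimension and the VC-dimension mentioned in the first bound can be checked by brute force on a tiny class. The sketch below is an illustration under stated assumptions, not code from the paper: concepts are encoded as 0/1 tuples over a finite domain, the example class is thresholds on `range(m)`, and the function names are hypothetical. Both routines are exponential-time and only suitable for very small classes.

```python
from functools import lru_cache
from itertools import combinations


def threshold_class(m):
    """All thresholds c_x on the domain range(m), each encoded as a tuple of labels."""
    return frozenset(tuple(1 if y <= x else 0 for y in range(m)) for x in range(m))


def vc_dimension(concepts):
    """Size of the largest set of points on which a non-empty class realizes every labeling."""
    concepts = list(concepts)
    m = len(concepts[0])
    best = 0
    for k in range(1, m + 1):
        shattered = any(
            len({tuple(c[i] for i in points) for c in concepts}) == 2 ** k
            for points in combinations(range(m), k)
        )
        if not shattered:
            break  # shattering is closed under subsets, so larger sets cannot be shattered
        best = k
    return best


@lru_cache(maxsize=None)
def littlestone_dimension(concepts):
    """Depth of the deepest mistake tree for a non-empty class, given as a frozenset of tuples."""
    if len(concepts) <= 1:
        return 0
    m = len(next(iter(concepts)))
    best = 0
    for i in range(m):
        zeros = frozenset(c for c in concepts if c[i] == 0)
        ones = concepts - zeros
        if zeros and ones:
            # The adversary picks point i; the learner errs on whichever branch is worse for it.
            depth = 1 + min(littlestone_dimension(zeros), littlestone_dimension(ones))
            best = max(best, depth)
    return best
```

For thresholds the VC-dimension stays at 1 while the Littlestone dimension grows logarithmically with the number of thresholds, which is one standard example of the separation the abstract relies on.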

arXiv.org e-Print Archive

CiteSeerX

Characterizing the Sample Complexity of Private Learners

Author: Beimel Amos
Nissim Kobbi
Stemmer Uri
Publication date: 01/01/2013

In 2008, Kasiviswanathan et al. defined private learning as a combination of PAC learning and differential privacy. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private learners for (finite) concept classes, with sample complexity logarithmic in the size of the concept class. This sample complexity is higher than what is needed for non-private learners, hence leaving open the possibility that the sample complexity of private learning may sometimes be significantly higher than that of non-private learning. We give a combinatorial characterization of the sample size sufficient and necessary to privately learn a class of concepts. This characterization is analogous to the well-known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of probabilistic representation of a concept class, and our new complexity measure RepDim corresponds to the size of the smallest probabilistic representation of the concept class. We show that any private learning algorithm for a concept class C with sample complexity m implies RepDim(C)=O(m), and that there exists a private learning algorithm with sample complexity m=O(RepDim(C)). We further demonstrate that a similar characterization holds for the database size needed for privately computing a large class of optimization problems and also for the well-studied problem of private data release.
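The generic construction for finite concept classes mentioned above is usually presented via the exponential mechanism: score each hypothesis by its number of empirical errors and sample one with probability decaying exponentially in that score. The following is a minimal sketch in that spirit, assuming hypotheses are Python callables and the sample is a list of `(point, label)` pairs; the function names and the `rng` parameter are illustrative, and the sketch makes no claim to match the original paper's exact parameters.

```python
import math
import random


def empirical_errors(hypothesis, sample):
    """Number of labeled examples (point, label) that the hypothesis misclassifies."""
    return sum(1 for point, label in sample if hypothesis(point) != label)


def generic_private_learner(hypotheses, sample, epsilon, rng=None):
    """Pick the index of a hypothesis via the exponential mechanism.

    Replacing one example changes each error count by at most 1, so sampling
    with weight exp(-epsilon * errors / 2) is the standard epsilon-DP choice.
    """
    rng = rng or random.Random()
    errors = [empirical_errors(h, sample) for h in hypotheses]
    best = min(errors)
    # Subtracting the minimum rescales all weights equally and avoids underflow.
    weights = [math.exp(-epsilon * (e - best) / 2.0) for e in errors]
    return rng.choices(range(len(hypotheses)), weights=weights, k=1)[0]
```

A possible use, with a hypothetical class of eight thresholds: build `hypotheses = [lambda y, x=x: 1 if y <= x else 0 for x in range(8)]`, label a sample with one of them, and call `generic_private_learner(hypotheses, sample, epsilon=1.0)`. With more samples or larger `epsilon`, low-error hypotheses dominate the sampling distribution.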

arXiv.org e-Print Archive

Crossref