Sample Complexity Bounds on Differentially Private Learning via Communication Complexity
In this work we analyze the sample complexity of classification by
differentially private algorithms. Differential privacy is a strong and
well-studied notion of privacy introduced by Dwork et al. (2006) that ensures
that the output of an algorithm leaks little information about the data point
provided by any of the participating individuals. Sample complexity of private
PAC and agnostic learning was studied in a number of prior works starting with
(Kasiviswanathan et al., 2008) but a number of basic questions still remain
open, most notably whether learning with privacy requires more samples than
learning without privacy.
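To make the differential privacy guarantee concrete, here is a minimal sketch (not from the paper) of the classical Laplace mechanism applied to a counting query; the function names and example data are illustrative:

```python
import math
import random

def sample_laplace(scale):
    # Draw from Laplace(0, scale) via inverse-CDF sampling.
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    # Counting queries have sensitivity 1: changing one individual's
    # record shifts the true count by at most 1, so Laplace noise of
    # scale 1/epsilon yields an epsilon-differentially private release.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + sample_laplace(1.0 / epsilon)
```

The noise scale shrinks as the privacy parameter epsilon grows, so a very large epsilon recovers the true count almost exactly.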
We show that the sample complexity of learning with (pure) differential
privacy can be arbitrarily higher than the sample complexity of learning
without the privacy constraint or the sample complexity of learning with
approximate differential privacy. Our second contribution and the main tool is
an equivalence between the sample complexity of (pure) differentially private
learning of a concept class C (denoted SCDP(C)) and the randomized one-way
communication complexity of the evaluation problem for concepts from C. Using
this equivalence we prove the following bounds:
1. SCDP(C) = Ω(LDim(C)), where LDim(C) is the Littlestone's (1987)
dimension characterizing the number of mistakes in the online-mistake-bound
learning model. Known bounds on LDim(C) then imply that SCDP(C) can be much
higher than the VC-dimension of C.
2. For any t, there exists a class C such that LDim(C) = 2 but SCDP(C) ≥ t.
3. For any t, there exists a class C such that the sample complexity of
(pure) ε-differentially private PAC learning of C is Ω(t/ε) but
the sample complexity of the relaxed (ε, δ)-differentially private
PAC learning is O(log(1/δ)/ε). This resolves an open problem of
Beimel et al. (2013b).
Comment: Extended abstract appears in Conference on Learning Theory (COLT) 2014.
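As a toy illustration of the one-way communication model underlying this equivalence (not the paper's protocol), consider evaluating threshold concepts over {0, ..., N-1}: Alice, who holds the concept, sends a single message, and Bob, who holds the input point, must output the concept's value. The naive deterministic sketch below simply transmits the threshold; the paper's equivalence concerns the message length needed by randomized protocols. All names here are illustrative:

```python
import math

def alice_message(t, N):
    # Alice holds the threshold concept c_t over {0, ..., N-1}.
    # Naive deterministic protocol: encode t in ceil(log2 N) bits.
    bits = max(1, math.ceil(math.log2(N)))
    return format(t, "0{}b".format(bits))

def bob_evaluate(message, y):
    # Bob holds the point y and must output c_t(y) = 1 iff y <= t.
    t = int(message, 2)
    return 1 if y <= t else 0
```

The message length of the best randomized one-way protocol, rather than of this naive one, is what tracks the pure differentially private sample complexity.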
The Role of Interactivity in Local Differential Privacy
We study the power of interactivity in local differential privacy. First, we
focus on the difference between fully interactive and sequentially interactive
protocols. Sequentially interactive protocols may query users adaptively in
sequence, but they cannot return to previously queried users. The vast majority
of existing lower bounds for local differential privacy apply only to
sequentially interactive protocols, and before this paper it was not known
whether fully interactive protocols were more powerful. We resolve this
question. First, we classify locally private protocols by their
compositionality, the multiplicative factor by which the sum of a
protocol's single-round privacy parameters exceeds its overall privacy
guarantee. We then show how to efficiently transform any fully interactive
k-compositional protocol into an equivalent sequentially interactive protocol
with an O(k) blowup in sample complexity. Next, we show that our reduction is
tight by exhibiting a family of problems such that for any k, there is a
fully interactive k-compositional protocol which solves the problem, while no
sequentially interactive protocol can solve the problem without at least an
Ω̃(k) factor more examples. We then turn our attention to
hypothesis testing problems. We show that for a large class of compound
hypothesis testing problems, which include all simple hypothesis testing
problems as a special case, a simple noninteractive test is optimal among
the class of all (possibly fully interactive) tests.
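The notion of compositionality defined above is just a ratio of privacy parameters; a minimal sketch (function name illustrative):

```python
def compositionality(round_epsilons, overall_epsilon):
    # The multiplicative factor by which the sum of a protocol's
    # single-round privacy parameters exceeds its overall guarantee.
    return sum(round_epsilons) / overall_epsilon
```

For example, a ten-round protocol whose rounds are each 0.1-private but which satisfies 0.5-differential privacy overall is 2-compositional.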
Differentially Private Release and Learning of Threshold Functions
We prove new upper and lower bounds on the sample complexity of differentially private algorithms for releasing approximate answers to
threshold functions. A threshold function c_x over a totally ordered domain X
evaluates to 1 on input y if y ≤ x, and evaluates to 0 otherwise. We
give the first nontrivial lower bound for releasing thresholds with
(ε, δ)-differential privacy, showing that the task is impossible
over an infinite domain X, and moreover requires sample complexity
n ≥ Ω(log*|X|), which grows with the size of the domain. Inspired by the
techniques used to prove this lower bound, we give an algorithm for releasing
thresholds with n ≤ 2^{(1+o(1)) log*|X|} samples. This improves the
previous best upper bound of 8^{(1+o(1)) log*|X|} (Beimel et al., RANDOM
'13).
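The bounds above are stated in terms of the iterated logarithm log*|X|, an extremely slowly growing function; a minimal sketch (base-2 convention assumed, which matches the bounds as written here):

```python
import math

def log_star(n):
    # Iterated logarithm: the number of times log2 must be applied
    # to n before the result drops to at most 1.
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

Even for astronomically large domains log* stays in single digits, which is why a lower bound growing with log*|X| suffices to rule out infinite domains while leaving finite ones feasible.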
Our sample complexity upper and lower bounds also apply to the tasks of
learning distributions with respect to Kolmogorov distance and of properly PAC
learning thresholds with differential privacy. The lower bound gives the first
separation between the sample complexity of properly learning a concept class
with (ε, δ)-differential privacy and learning without privacy. For
properly learning thresholds in ℓ dimensions, this lower bound extends to
n ≥ Ω(ℓ · log*|X|).
To obtain our results, we give reductions in both directions between releasing
and properly learning thresholds and the simpler interior point problem. Given
a database D of elements from X, the interior point problem asks for an
element between the smallest and largest elements in D. We introduce new
recursive constructions for bounding the sample complexity of the interior
point problem, as well as further reductions and techniques for proving
impossibility results for other basic problems in differential privacy.
Comment: 43 pages.
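A small sketch (illustrative, not the papers' algorithm) of what the interior point problem asks for; the difficulty in these works is solving it under differential privacy, where simply outputting an element of the database is not an option:

```python
def is_interior_point(z, database):
    # z is a valid answer iff it lies between the smallest and
    # largest elements of the database (inclusive).
    return min(database) <= z <= max(database)

def nonprivate_interior_point(database):
    # Without privacy the problem is trivial: any element of the
    # database works, e.g. a median element.  The results above show
    # that a differentially private solver over a domain X needs
    # sample complexity growing with log*|X|.
    return sorted(database)[len(database) // 2]
```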
What Can We Learn Privately?
Learning problems form an important category of computational tasks that
generalizes many of the computations researchers apply to large real-life data
sets. We ask: what concept classes can be learned privately, namely, by an
algorithm whose output does not depend too heavily on any one input or specific
training example? More precisely, we investigate learning algorithms that
satisfy differential privacy, a notion that provides strong confidentiality
guarantees in contexts where aggregate information is released about a database
containing sensitive information about individuals. We demonstrate that,
ignoring computational constraints, it is possible to privately agnostically
learn any concept class using a sample size approximately logarithmic in the
cardinality of the concept class. Therefore, almost anything learnable is
learnable privately: specifically, if a concept class is learnable by a
(non-private) algorithm with polynomial sample complexity and output size, then
it can be learned privately using a polynomial number of samples. We also
present a computationally efficient private PAC learner for the class of parity
functions. Local (or randomized response) algorithms are a practical class of
private algorithms that have received extensive investigation. We provide a
precise characterization of local private learning algorithms. We show that a
concept class is learnable by a local algorithm if and only if it is learnable
in the statistical query (SQ) model. Finally, we present a separation between
the power of interactive and noninteractive local learning algorithms.
Comment: 35 pages, 2 figures.
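A minimal sketch of classical randomized response, the canonical local algorithm the abstract alludes to (the debiasing wrapper and function names are illustrative):

```python
import math
import random

def randomize_bit(b, epsilon):
    # epsilon-DP local randomizer: report the true bit with
    # probability e^eps / (1 + e^eps), the flipped bit otherwise.
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    return b if random.random() < p else 1 - b

def estimate_fraction(bits, epsilon):
    # Each user randomizes locally; the server only sees noisy reports
    # and debiases them using E[report] = (1 - p) + b * (2p - 1).
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    noisy = sum(randomize_bit(b, epsilon) for b in bits) / len(bits)
    return (noisy - (1 - p)) / (2 * p - 1)
```

Because the server never sees raw bits, privacy holds even against the data collector, which is the defining feature of the local model.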