Sample Complexity Bounds on Differentially Private Learning via Communication Complexity
In this work we analyze the sample complexity of classification by
differentially private algorithms. Differential privacy is a strong and
well-studied notion of privacy introduced by Dwork et al. (2006) that ensures
that the output of an algorithm leaks little information about the data point
provided by any of the participating individuals. The sample complexity of private
PAC and agnostic learning was studied in a number of prior works, starting with
Kasiviswanathan et al. (2008), but several basic questions remain open, most
notably whether learning with privacy requires more samples than learning
without privacy.
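For reference, the standard definitions from Dwork et al. (2006) that the later results contrast (the symbols below are generic notation, not taken from this abstract): a randomized algorithm $A$ is $\epsilon$-differentially private (pure DP) if for every pair of datasets $S$, $S'$ differing in the data of a single individual and every set of outputs $T$,

$$\Pr[A(S) \in T] \le e^{\epsilon} \cdot \Pr[A(S') \in T],$$

and $(\epsilon,\delta)$-differentially private (approximate DP) if

$$\Pr[A(S) \in T] \le e^{\epsilon} \cdot \Pr[A(S') \in T] + \delta.$$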
We show that the sample complexity of learning with (pure) differential
privacy can be arbitrarily higher than the sample complexity of learning
without the privacy constraint or the sample complexity of learning with
approximate differential privacy. Our second contribution, and the main tool, is
an equivalence between the sample complexity of (pure) differentially private
learning of a concept class $C$ (or $SCDP(C)$) and the randomized one-way
communication complexity of the evaluation problem for concepts from $C$. Using
this equivalence we prove the following bounds (an informal statement of the
evaluation problem and the equivalence follows the list):
1. $SCDP(C) = \Omega(LDim(C))$, where $LDim(C)$ is Littlestone's (1987)
dimension characterizing the number of mistakes in the online mistake-bound
learning model. Known bounds on $LDim(C)$ then imply that $SCDP(C)$ can be much
higher than the VC-dimension of $C$.
2. For any $t$, there exists a class $C$ such that $LDim(C) = 2$ but $SCDP(C) \geq t$.
3. For any $t$, there exists a class $C$ such that the sample complexity of
(pure) $\epsilon$-differentially private PAC learning of $C$ is at least $t$,
but the sample complexity of the relaxed $(\epsilon,\delta)$-differentially
private PAC learning of $C$ is bounded independently of $t$. This resolves an open problem of
Beimel et al. (2013b).
Comment: Extended abstract appears in Conference on Learning Theory (COLT) 2014
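To make the main tool concrete: in the evaluation problem for a concept class $C$, Alice holds a concept $f \in C$, Bob holds a point $x$, Alice may send a single message, and Bob must output $f(x)$ correctly with high probability. The equivalence above can then be read, loosely and with the dependence on the accuracy and privacy parameters suppressed, as

$$SCDP(C) \approx R^{\rightarrow}(\mathrm{Eval}_C),$$

where $R^{\rightarrow}(\mathrm{Eval}_C)$ denotes the randomized one-way communication complexity of this game; the notation $\mathrm{Eval}_C$ and $R^{\rightarrow}$ is introduced here for illustration and is not taken from the abstract.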
Locally Differentially Private Gradient Tracking for Distributed Online Learning over Directed Graphs
Distributed online learning has proven extremely effective in solving
large-scale machine learning problems over streaming data. However, information
sharing between learners in distributed learning also raises concerns about the
potential leakage of individual learners' sensitive data. To mitigate this
risk, differential privacy, which is widely regarded as the "gold standard" for
privacy protection, has been employed in many existing results on
distributed online learning. These results, however, often face a fundamental
tradeoff between learning accuracy and privacy. In this paper, we propose a
locally differentially private gradient tracking based distributed online
learning algorithm that successfully circumvents this tradeoff. We prove that
the proposed algorithm converges in mean square to the exact optimal solution
while ensuring rigorous local differential privacy, with the cumulative privacy
budget guaranteed to be finite even when the number of iterations tends to
infinity. The algorithm is applicable even when the communication graph among
learners is directed. To the best of our knowledge, this is the first result
that simultaneously ensures learning accuracy and rigorous local differential
privacy in distributed online learning over directed graphs. We evaluate our
algorithm's performance on multiple benchmark machine-learning
applications, including logistic regression on the "Mushrooms" dataset and
CNN-based image classification on the "MNIST" and "CIFAR-10" datasets.
The experimental results confirm that the proposed algorithm
outperforms existing counterparts in both training and testing accuracy.
Comment: 21 pages, 4 figures
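As a rough, generic illustration of the local-differential-privacy ingredient only (this is not the authors' algorithm; their gradient-tracking updates and noise schedule are specified in the paper, and all names and constants below are illustrative): each learner can privatize a gradient message before sharing it with neighbors by clipping it and adding Laplace noise, with a per-iteration privacy budget that decays fast enough for the cumulative budget to stay finite.

import numpy as np

def clip_l1(v, c):
    # Rescale v so its L1 norm is at most c; this bounds how much any
    # single data point can move the shared message.
    norm = np.linalg.norm(v, ord=1)
    return v if norm <= c else v * (c / norm)

def per_step_epsilon(k, eps_total):
    # Geometrically decaying per-iteration budget: summing
    # eps_total * 2**(-k) over k >= 1 gives eps_total, so the cumulative
    # privacy loss stays finite even as the number of iterations grows.
    return eps_total * 2.0 ** (-k)

def private_message(grad, clip_bound, eps_k, rng):
    # Laplace mechanism: after clipping, replacing one data point changes
    # the message by at most 2 * clip_bound in L1 norm, so Laplace noise
    # with scale 2 * clip_bound / eps_k makes this single message
    # eps_k-locally differentially private.
    g = clip_l1(np.asarray(grad, dtype=float), clip_bound)
    noise = rng.laplace(scale=2.0 * clip_bound / eps_k, size=g.shape)
    return g + noise

# Example: learner i privatizes its gradient at iteration k before sending
# it to its out-neighbors in the directed communication graph.
rng = np.random.default_rng(0)
grad_k = np.array([0.3, -1.2, 0.7])
msg = private_message(grad_k, clip_bound=1.0,
                      eps_k=per_step_epsilon(k=1, eps_total=1.0), rng=rng)

Decaying the per-step budget is one simple way to keep the total budget finite; the paper's contribution is showing that this can be reconciled with convergence to the exact optimal solution via gradient tracking, which the sketch above does not attempt to reproduce.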