Privacy-Preserving Distributed Optimization via Subspace Perturbation: A General Framework
As the modern world becomes increasingly digitized and interconnected,
distributed signal processing has proven to be effective in processing its
large volume of data. However, a main challenge limiting the broad use of
distributed signal processing techniques is the issue of privacy in handling
sensitive data. To address this privacy issue, we propose a novel yet general
subspace perturbation method for privacy-preserving distributed optimization,
which allows each node to obtain the desired solution while protecting its
private data. In particular, we show that the dual variables introduced in each
distributed optimizer will not converge in a certain subspace determined by the
graph topology. Additionally, the optimization variable is ensured to converge
to the desired solution, because it is orthogonal to this non-convergent
subspace. We therefore propose to insert noise in the non-convergent subspace
through the dual variable such that the private data are protected, and the
accuracy of the desired solution is completely unaffected. Moreover, the
proposed method is shown to be secure under two widely-used adversary models:
passive and eavesdropping. Furthermore, we consider several distributed
optimizers such as ADMM and PDMM to demonstrate the general applicability of
the proposed method. Finally, we test the performance through a set of
applications. Numerical tests indicate that the proposed method outperforms
existing methods in terms of estimation accuracy, privacy level, communication
cost, and convergence rate.
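The core construction is the noise subspace itself. As a minimal sketch (with a toy incidence-style matrix standing in for the optimizer-dependent constraint matrix, which is not the paper's exact construction), noise can be drawn and then projected onto the orthogonal complement of that matrix's column space, i.e., the non-convergent subspace:

```python
import numpy as np

def orthogonal_complement_noise(A, scale=10.0, rng=None):
    """Draw noise confined to the orthogonal complement of range(A).

    range(A) stands in for the subspace in which the dual variable converges;
    noise restricted to its orthogonal complement (the non-convergent subspace)
    masks private data without disturbing the primal solution.
    """
    rng = np.random.default_rng() if rng is None else rng
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    U = U[:, s > 1e-12 * s.max()]            # orthonormal basis of range(A)
    noise = scale * rng.standard_normal(A.shape[0])
    return noise - U @ (U.T @ noise)          # strip the range(A) component

# Toy stand-in for the graph-dependent constraint matrix (not the paper's exact
# matrix): incidence-style matrix of a 4-node ring, one row per edge constraint.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = np.zeros((len(edges), 4))
for e, (i, j) in enumerate(edges):
    A[e, i], A[e, j] = 1.0, -1.0

p = orthogonal_complement_noise(A, rng=np.random.default_rng(0))
proj = A @ np.linalg.lstsq(A, p, rcond=None)[0]   # projection of p onto range(A)
print("noise norm:", np.linalg.norm(p), " range(A) component:", np.linalg.norm(proj))
```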
Near-Optimal Algorithms for Differentially-Private Principal Components
Principal components analysis (PCA) is a standard tool for identifying good
low-dimensional approximations to data in high dimension. Many data sets of
interest contain private or sensitive information about individuals. Algorithms
which operate on such data should be sensitive to the privacy risks in
publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we
investigate the theory and empirical performance of differentially private
approximations to PCA and propose a new method which explicitly optimizes the
utility of the output. We show that the sample complexity of the proposed
method differs from the existing procedure in the scaling with the data
dimension, and that our method is nearly optimal in terms of this scaling. We
furthermore illustrate our results, showing that on real data there is a large
performance gap between the existing method and our method.
Comment: 37 pages, 8 figures; final version to appear in the Journal of
Machine Learning Research, preliminary version was at NIPS 201
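For orientation, a much simpler baseline than the paper's near-optimal algorithm is input perturbation: add symmetric Gaussian noise to the empirical second-moment matrix and take its top-k eigenvectors. The sketch below assumes unit-norm data rows and a placeholder noise scale sigma that would have to be calibrated to the desired (epsilon, delta):

```python
import numpy as np

def dp_pca_input_perturbation(X, k, sigma, rng=None):
    """Differentially private top-k PCA via a noisy covariance matrix
    (a simple baseline, not the near-optimal algorithm of the paper).

    Assumes each row of X has L2 norm at most 1, so the empirical second-moment
    matrix has bounded sensitivity; sigma must be calibrated to the desired
    (epsilon, delta) for the Gaussian mechanism.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    A = (X.T @ X) / n                          # empirical second-moment matrix
    E = rng.standard_normal((d, d)) * sigma
    E = (E + E.T) / np.sqrt(2)                 # symmetric Gaussian noise
    eigvals, eigvecs = np.linalg.eigh(A + E)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k noisy eigenvectors

# Toy usage: 500 unit-norm samples in 10 dimensions, recover a 2-D subspace.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))
X /= np.linalg.norm(X, axis=1, keepdims=True)
V = dp_pca_input_perturbation(X, k=2, sigma=0.05, rng=rng)
print(V.shape)   # (10, 2)
```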
Brave: Byzantine-Resilient and Privacy-Preserving Peer-to-Peer Federated Learning
Federated learning (FL) enables multiple participants to train a global
machine learning model without sharing their private training data.
Peer-to-peer (P2P) FL advances existing centralized FL paradigms by eliminating
the server that aggregates local models from participants and then updates the
global model. However, P2P FL is vulnerable to (i) honest-but-curious
participants whose objective is to infer private training data of other
participants, and (ii) Byzantine participants who can transmit arbitrarily
manipulated local models to corrupt the learning process. P2P FL schemes that
simultaneously guarantee Byzantine resilience and preserve privacy have been
less studied. In this paper, we develop Brave, a protocol that ensures
Byzantine Resilience And privacy-preserving property for P2P FL in the presence
of both types of adversaries. We show that Brave preserves privacy by
establishing that any honest-but-curious adversary cannot infer other
participants' private data by observing their models. We further prove that
Brave is Byzantine-resilient, which guarantees that all benign participants
converge to an identical model that deviates from a global model trained
without Byzantine adversaries by a bounded distance. We evaluate Brave against
three state-of-the-art adversaries on a P2P FL for image classification tasks
on benchmark datasets CIFAR10 and MNIST. Our results show that the global model
learned with Brave in the presence of adversaries achieves comparable
classification accuracy to a global model trained in the absence of any
adversary.
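Brave combines privacy-preserving masking with Byzantine-resilient aggregation; the sketch below illustrates only the robust-aggregation half, using a coordinate-wise trimmed mean as a stand-in rule (the trimming parameter f is an assumed bound on the number of Byzantine peers, not a detail taken from the paper):

```python
import numpy as np

def trimmed_mean_aggregate(peer_models, f):
    """Byzantine-resilient aggregation by coordinate-wise trimmed mean.

    peer_models: array of shape (n_peers, n_params), one model per peer.
    f: assumed upper bound on the number of Byzantine peers; the f largest and
       f smallest values in every coordinate are discarded before averaging.
    """
    models = np.sort(np.asarray(peer_models), axis=0)     # sort per coordinate
    trimmed = models[f:models.shape[0] - f]                # drop the extremes
    return trimmed.mean(axis=0)

# Toy usage: 8 benign peers around the true model, 2 Byzantine peers sending garbage.
rng = np.random.default_rng(2)
true_model = np.ones(5)
benign = true_model + 0.01 * rng.standard_normal((8, 5))
byzantine = 100.0 * rng.standard_normal((2, 5))
aggregated = trimmed_mean_aggregate(np.vstack([benign, byzantine]), f=2)
print(np.round(aggregated, 3))   # stays close to the all-ones true model
```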
Prochlo: Strong Privacy for Analytics in the Crowd
The large-scale monitoring of computer users' software activities has become
commonplace, e.g., for application telemetry, error reporting, or demographic
profiling. This paper describes a principled systems architecture---Encode,
Shuffle, Analyze (ESA)---for performing such monitoring with high utility while
also protecting user privacy. The ESA design, and its Prochlo implementation,
are informed by our practical experiences with an existing, large deployment of
privacy-preserving software monitoring.
(cont.; see the paper.)
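A toy rendering of the three ESA stages (the record fields and the crowd threshold below are hypothetical; Prochlo's actual pipeline adds trusted hardware, data fragmentation, and thresholding at scale):

```python
import random
from collections import Counter

def encode(report):
    """Encode: keep only coarse, non-identifying fields of a raw report."""
    return (report["app_version"], report["error_code"])      # hypothetical fields

def shuffle(encoded_reports, rng):
    """Shuffle: batch reports and permute them to break linkability with senders."""
    batch = list(encoded_reports)
    rng.shuffle(batch)
    return batch

def analyze(shuffled_batch, threshold=2):
    """Analyze: release only aggregate counts above a crowd threshold."""
    counts = Counter(shuffled_batch)
    return {item: c for item, c in counts.items() if c >= threshold}

# Toy usage with hypothetical telemetry reports.
rng = random.Random(0)
raw = [{"user_id": i, "app_version": "1.2", "error_code": i % 3} for i in range(30)]
print(analyze(shuffle((encode(r) for r in raw), rng)))
```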
Privacy Amplification via Shuffling: Unified, Simplified, and Tightened
In decentralized settings, the shuffle model of differential privacy has
emerged as a promising alternative to the classical local model. Analyzing
privacy amplification via shuffling is a critical component in both
single-message and multi-message shuffle protocols. However, current methods
used in these two areas are distinct and specific, making them less convenient
for protocol designers and practitioners. In this work, we introduce
variation-ratio reduction as a unified framework for privacy amplification
analyses in the shuffle model. This framework utilizes total variation bounds
of local messages and probability ratio bounds of other users' blanket
messages, converting them to indistinguishable levels. Our results indicate
that the framework yields tighter bounds for both single-message and
multi-message encoders (e.g., with local DP, local metric DP, or general
multi-message randomizers). Specifically, for a broad range of local
randomizers having extremal probability design, our amplification bounds are
precisely tight. We also demonstrate that variation-ratio reduction is
well-suited for parallel composition in the shuffle model and results in
stricter privacy accounting for common sampling-based local randomizers. Our
experimental findings show that, compared to existing amplification bounds, our
numerical amplification bounds save a substantial fraction of the privacy
budget for single-message protocols, for multi-message protocols, and for
parallel composition. Additionally, our implementation of the numerical
amplification bounds has low computational complexity and is highly efficient
in practice, taking only minutes even for large numbers of users. The code for
our implementation can be found at
https://github.com/wangsw/PrivacyAmplification.
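For context, a single-message shuffle protocol consists of a local randomizer followed by a uniform shuffle; the sketch below uses k-ary randomized response as the local randomizer with illustrative parameters. Computing the amplified (epsilon, delta) via the variation-ratio bounds is what the linked repository provides and is not reproduced here:

```python
import math
import random

def k_rr(value, k, epsilon, rng):
    """k-ary randomized response: report the true value w.p. e^eps / (e^eps + k - 1)."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    other = rng.randrange(k - 1)
    return other if other < value else other + 1   # uniform over the k-1 other values

def shuffle_model_round(values, k, epsilon, rng):
    """Each user randomizes locally, then the shuffler discards message order."""
    messages = [k_rr(v, k, epsilon, rng) for v in values]
    rng.shuffle(messages)                           # the shuffler's only job
    return messages

# Toy usage: 1000 users, 4 categories, local epsilon = 1.0.
rng = random.Random(3)
values = [rng.randrange(4) for _ in range(1000)]
messages = shuffle_model_round(values, k=4, epsilon=1.0, rng=rng)
print([messages.count(c) for c in range(4)])
```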
Characterizing the Sample Complexity of Private Learners
In 2008, Kasiviswanathan et al. defined private learning as a combination of
PAC learning and differential privacy. Informally, a private learner is applied
to a collection of labeled individual information and outputs a hypothesis
while preserving the privacy of each individual. Kasiviswanathan et al. gave a
generic construction of private learners for (finite) concept classes, with
sample complexity logarithmic in the size of the concept class. This sample
complexity is higher than what is needed for non-private learners, hence
leaving open the possibility that the sample complexity of private learning may
be sometimes significantly higher than that of non-private learning.
We give a combinatorial characterization of the sample size sufficient and
necessary to privately learn a class of concepts. This characterization is
analogous to the well known characterization of the sample complexity of
non-private learning in terms of the VC dimension of the concept class. We
introduce the notion of probabilistic representation of a concept class, and
our new complexity measure RepDim corresponds to the size of the smallest
probabilistic representation of the concept class.
We show that any private learning algorithm for a concept class C with sample
complexity m implies RepDim(C)=O(m), and that there exists a private learning
algorithm with sample complexity m=O(RepDim(C)). We further demonstrate that a
similar characterization holds for the database size needed for privately
computing a large class of optimization problems and also for the well studied
problem of private data release.
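The generic construction of Kasiviswanathan et al. referenced above selects a low-error hypothesis from a finite concept class with the exponential mechanism; a minimal sketch with a toy threshold concept class (the data and parameters are illustrative):

```python
import math
import random

def private_generic_learner(concepts, sample, epsilon, rng):
    """Exponential-mechanism learner over a finite concept class.

    concepts: list of hypotheses, each a function x -> {0, 1}.
    sample:   list of (x, y) labeled examples (the private data).
    Selects a hypothesis with probability proportional to
    exp(-epsilon * empirical_errors / 2); changing one example changes the
    error count of any hypothesis by at most 1 (sensitivity 1).
    """
    errors = [sum(h(x) != y for x, y in sample) for h in concepts]
    weights = [math.exp(-epsilon * e / 2.0) for e in errors]
    return rng.choices(concepts, weights=weights, k=1)[0]

# Toy usage: threshold concepts over {0,...,9}, data labeled by threshold 6.
rng = random.Random(4)
concepts = [(lambda x, t=t: int(x >= t)) for t in range(11)]
sample = [(x, int(x >= 6)) for x in (rng.randrange(10) for _ in range(200))]
h = private_generic_learner(concepts, sample, epsilon=2.0, rng=rng)
print(sum(h(x) != y for x, y in sample), "empirical errors for the chosen hypothesis")
```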
RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Active Data Manipulation
Federated learning (FL) has recently emerged as a privacy-preserving approach
for machine learning in domains that rely on user interactions, particularly
recommender systems (RS) and online learning to rank (OLTR). While there has
been substantial research on the privacy of traditional FL, little attention
has been paid to studying the privacy properties of these interaction-based FL
(IFL) systems. In this work, we show that IFL can introduce unique challenges
concerning user privacy, particularly when the central server has knowledge and
control over the items that users interact with. Specifically, we demonstrate
the threat of reconstructing user interactions by presenting RAIFLE, a general
optimization-based reconstruction attack framework customized for IFL. RAIFLE
employs Active Data Manipulation (ADM), a novel attack technique unique to IFL,
where the server actively manipulates the training features of the items to
induce adversarial behaviors in the local FL updates. We show that RAIFLE is
more impactful than existing FL privacy attacks in the IFL context, and
describe how it can undermine privacy defenses like secure aggregation and
private information retrieval. Based on our findings, we propose and discuss
countermeasure guidelines to mitigate our attack in the context of federated
RS/OLTR specifically and IFL more broadly.
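The gradient-matching idea behind such reconstruction attacks has a closed form in the linear special case; the sketch below assumes a linear scoring model with squared loss and a server that knows the item features and global weights (a simplification for illustration, not RAIFLE's full attack with active data manipulation):

```python
import numpy as np

def reconstruct_interactions(F, w, observed_grad):
    """Recover a user's binary interactions from an observed local gradient.

    Assumes a linear scoring model with squared loss L(w) = ||F w - y||^2 / m,
    whose gradient is g = 2 F^T (F w - y) / m. Knowing the item features F and
    the global weights w (as the server does), solve F^T y = F^T F w - m g / 2.
    """
    m = F.shape[0]
    rhs = F.T @ (F @ w) - m * observed_grad / 2.0
    y_hat, *_ = np.linalg.lstsq(F.T, rhs, rcond=None)
    return (y_hat > 0.5).astype(int)

# Toy usage: 6 items with 10-dimensional features, so the linear system in the
# unknown interactions is overdetermined and reconstruction is exact.
rng = np.random.default_rng(5)
F = rng.standard_normal((6, 10))
w = rng.standard_normal(10)
y_true = rng.integers(0, 2, size=6)
g = 2.0 * F.T @ (F @ w - y_true) / 6        # the "shared" local gradient
print(y_true, reconstruct_interactions(F, w, g))   # the two arrays should match
```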