Exploring Private Federated Learning with Laplacian Smoothing
Federated learning aims to protect data privacy by collaboratively learning a
model without sharing private data among users. However, an adversary may still
be able to infer the private training data by attacking the released model.
Differential privacy (DP) provides a statistical guarantee against such attacks,
at the cost of possibly degrading the accuracy or utility of the trained
models. In this paper, we apply a utility enhancement scheme based on Laplacian
smoothing for differentially-private federated learning (DP-Fed-LS), where the
parameter aggregation with injected Gaussian noise is improved in statistical
precision. We provide tight closed-form privacy bounds for both uniform and
Poisson subsampling and derive corresponding DP guarantees for differentially
private federated learning, with or without Laplacian smoothing. Experiments
on MNIST, SVHN, and Shakespeare datasets show that the proposed method can
improve model accuracy with a DP guarantee under both subsampling mechanisms.
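As a rough illustration of the mechanism described above (the function and parameter names here are our own, not the paper's), the following NumPy sketch applies Laplacian smoothing, i.e., solving (I - sigma*Delta)d = g for the 1D periodic discrete Laplacian via FFT, to a clipped, Gaussian-noised average of client updates:

```python
import numpy as np

def laplacian_smooth(g, sigma=1.0):
    """Solve (I - sigma * Delta) d = g via FFT, where Delta is the 1D
    periodic discrete Laplacian; g is a flattened update vector."""
    kernel = np.zeros(len(g))
    kernel[0] = 1.0 + 2.0 * sigma   # diagonal of I - sigma * Delta
    kernel[1] = -sigma              # circulant off-diagonals
    kernel[-1] = -sigma
    return np.real(np.fft.ifft(np.fft.fft(g) / np.fft.fft(kernel)))

def dp_fed_ls_aggregate(updates, clip=1.0, noise_mult=1.0, sigma=1.0):
    """Clip client updates, average, add Gaussian noise for DP, then apply
    Laplacian smoothing to the noisy aggregate (illustrative calibration)."""
    clipped = [u * min(1.0, clip / (np.linalg.norm(u) + 1e-12)) for u in updates]
    mean = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, noise_mult * clip / len(updates), size=mean.shape)
    return laplacian_smooth(mean + noise, sigma)
```

The smoothing matrix is circulant, so its eigenvalues 1 + 2*sigma*(1 - cos(2*pi*k/n)) are all positive and the FFT-based solve is exact and fast.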
Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning
Federated learning, i.e., a mobile edge computing framework for deep
learning, is a recent advance in privacy-preserving machine learning in which
the model is trained in a decentralized manner by the clients (i.e., the data
curators), preventing the server from directly accessing their private data.
This learning mechanism makes attacks from the server side significantly more
difficult. Although state-of-the-art attack techniques that incorporate
generative adversarial networks (GANs) can construct class representatives of
the global data distribution across all clients, it remains challenging to
attack a specific client distinguishably (i.e., user-level privacy leakage), a
stronger privacy threat that precisely recovers the private data of a specific
client. This paper presents the first attempt to explore user-level privacy
leakage against federated learning through an attack from a malicious server.
We propose a framework
incorporating a GAN with a multi-task discriminator that simultaneously
discriminates the category, reality, and client identity of input samples. The
novel discrimination on client identity enables the generator to recover
user-specified private data. Unlike existing works that tend to interfere with
the training process of federated learning, the proposed method works
"invisibly" on the server side. The experimental results demonstrate the
effectiveness of the proposed attacking approach and its superiority over the
state of the art.
Comment: The 38th Annual IEEE International Conference on Computer
Communications (INFOCOM 2019).
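To make the multi-task discriminator concrete, here is a minimal PyTorch sketch assuming 28x28 single-channel inputs; the layer sizes and names are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class MultiTaskDiscriminator(nn.Module):
    """Shared feature extractor with three heads: class category,
    real-vs-generated, and client identity."""
    def __init__(self, num_classes, num_clients, in_ch=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
        )
        feat_dim = 128 * 7 * 7  # for 28x28 inputs after two stride-2 convs
        self.category = nn.Linear(feat_dim, num_classes)  # which class
        self.reality = nn.Linear(feat_dim, 1)             # real or fake
        self.identity = nn.Linear(feat_dim, num_clients)  # which client

    def forward(self, x):
        h = self.features(x)
        return self.category(h), self.reality(h), self.identity(h)
```

The identity head is what distinguishes this attack from plain GAN-based reconstruction: its signal lets the generator target one client's data distribution rather than the global mixture.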
Contamination Attacks and Mitigation in Multi-Party Machine Learning
Machine learning is data hungry; the more data a model has access to in
training, the more likely it is to perform well at inference time. Distinct
parties may want to combine their local data to gain the benefits of a model
trained on a large corpus of data. We consider such a case: parties get access
to the model trained on their joint data but do not see each other's individual
datasets. We show that one needs to be careful when using this multi-party
model since a potentially malicious party can taint the model by providing
contaminated data. We then show how adversarial training can defend against
such attacks by preventing the model from learning trends specific to
individual parties' data, thereby also guaranteeing party-level membership
privacy.
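One standard way to realize such adversarial training is a gradient reversal layer: an auxiliary head is trained to predict which party a sample came from, while reversed gradients push the shared representation to discard that signal. The PyTorch sketch below is our illustration under that assumption, not necessarily the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign going backward."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

def adversarial_step(encoder, task_head, party_head, x, y_task, y_party, opt):
    """One step: the party head tries to identify the data owner, while
    reversed gradients train the encoder to erase party-specific trends."""
    z = encoder(x)
    task_loss = F.cross_entropy(task_head(z), y_task)
    party_loss = F.cross_entropy(party_head(GradReverse.apply(z)), y_party)
    loss = task_loss + party_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

If the party head cannot beat chance at identifying the source party, the model has learned no party-specific trend, which is also what yields party-level membership privacy.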
Achieving Secure and Differentially Private Computations in Multiparty Settings
Sharing and working on sensitive data in distributed settings from healthcare
to finance is a major challenge due to security and privacy concerns. Secure
multiparty computation (SMC) is a viable remedy, allowing distributed
parties to perform computations while learning nothing about each other's data
beyond the final result. Although SMC is instrumental in such distributed
settings, it does not by itself guarantee that the result leaks no information
about individuals to adversaries. Differential privacy (DP) can be utilized to
address this; however, achieving SMC with DP is not a trivial task, either. In
this paper, we propose a novel Secure Multiparty Distributed Differentially
Private (SM-DDP) protocol to achieve secure and private computations in a
multiparty environment. Specifically, with our protocol, we simultaneously
achieve SMC and DP in distributed settings focusing on linear regression on
horizontally distributed data. That is, parties do not see each other's data
and, furthermore, cannot infer information about individuals from the final
statistical model. Any statistical model function that allows
independent calculation of local statistics can be computed through our
protocol. The protocol implements homomorphic encryption for SMC and the
functional mechanism for DP to achieve the desired security and privacy guarantees. In
this work, we first introduce the theoretical foundation for the SM-DDP
protocol and then evaluate its efficacy and performance on two different
datasets. Our results show that one can achieve individual-level privacy
through the proposed protocol with distributed DP, which is independently
applied by each party in a distributed fashion. Moreover, our results also show
that the SM-DDP protocol incurs minimal computational overhead, is scalable,
and provides security and privacy guarantees.
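As a minimal illustration of the SMC side of such a protocol (the DP side, e.g. a functional mechanism perturbing the regression objective, is omitted), the sketch below uses the python-paillier library (`pip install phe`) to let each party encrypt its local sufficient statistics for linear regression while an aggregator sums ciphertexts without decrypting; the function names are ours:

```python
import numpy as np
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

def local_encrypted_stats(X, y):
    """Party-side: compute and encrypt the local statistics X^T X and X^T y."""
    XtX, Xty = X.T @ X, X.T @ y
    enc = lambda M: [[public_key.encrypt(float(v)) for v in row]
                     for row in np.atleast_2d(M)]
    return enc(XtX), enc(Xty)

def add_ciphertexts(A, B):
    """Aggregator-side: element-wise addition over encrypted matrices;
    Paillier ciphertexts add without any decryption."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
```

Because ordinary least squares depends on the data only through the sums of X^T X and X^T y, the key holder can decrypt the aggregated statistics (via `private_key.decrypt`) and solve for the model without ever seeing any party's raw records.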
Private Deep Learning with Teacher Ensembles
Privacy-preserving deep learning is crucial for deploying deep neural network
based solutions, especially when the model works on data that contains
sensitive information. Most privacy-preserving methods lead to undesirable
performance degradation. Ensemble learning is an effective way to improve model
performance. In this work, we propose a new method for teacher ensembles that
uses more informative network outputs under differentially private stochastic
gradient descent and provides provable privacy guarantees. Our method employs
knowledge distillation and hint learning on intermediate representations to
facilitate the training of the student model. Additionally, we propose a simple
weighted ensemble scheme that works more robustly across different teaching
settings. Experimental results on three common image benchmark datasets (i.e.,
CIFAR-10, MNIST, and SVHN) demonstrate that our approach outperforms previous
state-of-the-art methods in both performance and privacy budget.
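A common way to combine knowledge distillation with hint learning, as the abstract describes, is a weighted sum of a soft-label loss and a feature-matching loss; the PyTorch sketch below is a generic version with illustrative temperature and weights, not the paper's exact objective:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, student_hint, teacher_hint,
                 T=4.0, alpha=0.5, beta=0.1):
    """Distillation on temperature-softened teacher outputs plus hint
    learning on intermediate representations (T, alpha, beta illustrative)."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T  # rescale for temperature
    hint = F.mse_loss(student_hint, teacher_hint)   # match hidden features
    return alpha * soft + beta * hint
```

The "more informative network outputs" the abstract mentions are exactly these soft probabilities and intermediate hints, which carry more teacher signal per query than a single hard label.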
CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning
How can one train a machine learning model while keeping the data private and
secure? We present CodedPrivateML, a fast and scalable approach to this
critical problem. CodedPrivateML keeps both the data and the model
information-theoretically private, while allowing efficient parallelization of
training across distributed workers. We characterize CodedPrivateML's privacy
threshold and prove its convergence for logistic (and linear) regression.
Furthermore, via experiments on Amazon EC2, we demonstrate that
CodedPrivateML can provide an order-of-magnitude speedup over state-of-the-art cryptographic approaches.
Comment: 14 pages, 5 figures.
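CodedPrivateML's guarantee rests on coded (Lagrange-style) secret sharing of quantized data; as a simpler stand-in that conveys the same information-theoretic flavor, the NumPy sketch below shows plain additive secret sharing over a prime field, where any proper subset of the shares is uniformly random and reveals nothing:

```python
import numpy as np

P = 2_147_483_647  # prime modulus for the finite field (illustrative)

def share(x, n):
    """Split an integer array x into n additive shares over GF(P); any
    n-1 shares are jointly uniform, so no coalition short of all n
    workers learns anything about x."""
    shares = [np.random.randint(0, P, size=x.shape, dtype=np.int64)
              for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

secret = np.array([42, 7, 1234], dtype=np.int64)
parts = share(secret, n=4)
assert np.array_equal(reconstruct(parts), secret)
```

The coded scheme additionally arranges the shares so that workers can compute on them in parallel, which is where the speedup over purely cryptographic approaches comes from.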
Federated Learning for Healthcare Informatics
With the rapid development of computer software and hardware technologies,
more and more healthcare data are becoming readily available from clinical
institutions, patients, insurance companies and pharmaceutical industries,
among others. This access provides an unprecedented opportunity for data
science technologies to derive data-driven insights and improve the quality of
care delivery. Healthcare data, however, are usually fragmented and private,
making it difficult to generate robust results across populations. For example,
different hospitals own the electronic health records (EHR) of different
patient populations and these records are difficult to share across hospitals
because of their sensitive nature. This creates a major barrier to developing
effective analytical approaches that are generalizable, which require diverse
"big data". Federated learning, a mechanism of training a shared global model
with a central server while keeping all the sensitive data in local
institutions where the data belong, provides great promise to connect the
fragmented healthcare data sources with privacy preservation. The goal of this
survey is to provide a review of federated learning technologies, particularly
within the biomedical space. In particular, we summarize the general solutions
to the statistical challenges, system challenges and privacy issues in
federated learning, and point out the implications and potentials in
healthcare.
Comment: 18 pages.
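The canonical instantiation of the mechanism this survey describes is federated averaging (FedAvg); the NumPy sketch below shows one server round under assumed helper names (`local_train` stands in for each institution's local optimization):

```python
import numpy as np

def fedavg_round(global_w, clients, local_train, frac=0.5, rng=None):
    """One round of federated averaging: selected institutions train on
    local data and the server averages the returned weights, weighted by
    local dataset size; raw records never leave the institution."""
    rng = rng or np.random.default_rng()
    m = max(1, int(frac * len(clients)))
    chosen = rng.choice(len(clients), size=m, replace=False)
    weights, sizes = [], []
    for k in chosen:
        weights.append(local_train(global_w.copy(), clients[k]))  # local epochs
        sizes.append(len(clients[k]))
    total = float(sum(sizes))
    return sum(w * (n / total) for w, n in zip(weights, sizes))
```

In the healthcare setting, each `client` would be a hospital's EHR store; only model weights cross institutional boundaries, which is what connects the fragmented data sources with privacy preservation.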
SecureGBM: Secure Multi-Party Gradient Boosting
Federated machine learning systems have been widely used to facilitate
joint data analytics across distributed datasets owned by different
parties that do not trust each other. In this paper, we propose a novel
Gradient Boosting Machines (GBM) framework, SecureGBM, built on a
multi-party computation model based on semi-homomorphic encryption, where every
involved party can jointly obtain a shared Gradient Boosting machines model
while protecting their own data from the potential privacy leakage and
inferential identification. More specifically, our work focuses on a two-party
secure learning scenario in which each party owns a unique view (i.e., a subset
of attributes or features) of the same group of samples, while only one party
owns the labels. In this scenario, neither feature nor label data may be shared
with the other party. To achieve the above goal, we first extend LightGBM, a
well-known implementation of tree-based GBM, by covering its key training and
inference operations with SEAL homomorphic encryption schemes. However, the
performance of this re-implementation is significantly bottlenecked by the
explosive inflation of communication payloads, as ciphertext size grows with
the length of the plaintexts.
We therefore propose stochastic approximation techniques to reduce the
communication payloads while accelerating the overall training procedure in a
statistical manner. Our experiments on real-world data show that SecureGBM
secures the communication and computation of LightGBM training and inference
for both parties while losing less than 3% AUC, using the same number of
boosting iterations, on a wide range of benchmark datasets.
Comment: The first two authors contributed equally to the manuscript. The
paper has been accepted for publication in IEEE BigData 2019.
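Our reading of the stochastic-approximation step is that only a random row subsample's (encrypted) gradient statistics need to cross the wire each boosting iteration; the NumPy sketch below illustrates that idea with a simplified split-gain score (no 1/2 factor or complexity penalty), and is not the paper's exact algorithm:

```python
import numpy as np

def approx_split_gains(grad, hess, bin_ids, n_bins, rate=0.1, lam=1.0, rng=None):
    """Estimate per-bin gradient/hessian histograms from a random row
    subsample, shrinking the ciphertext payload per iteration at the cost
    of a noisier split statistic; returns a gain for each split point."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(grad), size=max(1, int(rate * len(grad))), replace=False)
    scale = len(grad) / len(idx)  # unbiased rescaling of the sampled sums
    G = np.bincount(bin_ids[idx], weights=grad[idx], minlength=n_bins) * scale
    H = np.bincount(bin_ids[idx], weights=hess[idx], minlength=n_bins) * scale
    GL, HL = np.cumsum(G)[:-1], np.cumsum(H)[:-1]  # left partitions
    GR, HR = G.sum() - GL, H.sum() - HL            # right partitions
    return GL**2 / (HL + lam) + GR**2 / (HR + lam) - G.sum()**2 / (H.sum() + lam)
```

Because the sampled histograms are unbiased estimates of the full ones, split selection degrades gracefully as `rate` shrinks, which matches the reported sub-3% AUC loss.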
Differentially Private Deep Learning with Smooth Sensitivity
Ensuring the privacy of sensitive data used to train modern machine learning
models is of paramount importance in many areas of practice. One approach to
study these concerns is through the lens of differential privacy. In this
framework, privacy guarantees are generally obtained by perturbing models in
such a way that specifics of data used to train the model are made ambiguous. A
particular instance of this approach is through a "teacher-student" framework,
wherein the teacher, who owns the sensitive data, provides the student with
useful, but noisy, information, hopefully allowing the student model to perform
well on a given task without access to particular features of the sensitive
data. Because stronger privacy guarantees generally involve more significant
perturbation on the part of the teacher, deploying existing frameworks
fundamentally involves a trade-off between the student's performance and the privacy
guarantee. One of the most important techniques used in previous works involves
an ensemble of teacher models, which return information to a student based on a
noisy voting procedure. In this work, we propose a novel voting mechanism with
smooth sensitivity, which we call Immutable Noisy ArgMax, that, under certain
conditions, can tolerate very large random noise from the teacher without
affecting the useful information transferred to the student.
Compared with previous work, our approach improves over state-of-the-art
methods on all measures and scales to larger tasks with both better performance
and stronger privacy. The proposed framework can be applied to any machine
learning model and provides an appealing solution for tasks that require
training on a large amount of data.
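Based on the abstract's description, the mechanism can be caricatured as boosting the winning teacher-vote count by a large constant before noising, so the returned label is rarely flipped even under heavy noise; the constants in this NumPy sketch are illustrative, and the paper's smooth-sensitivity analysis is more involved:

```python
import numpy as np

def immutable_noisy_argmax(votes, c=1000.0, noise_scale=100.0, rng=None):
    """Add a large constant c to the winning vote count before injecting
    Laplace noise; for large enough c the argmax is almost never flipped,
    while the released label remains a noisy, privacy-protected output."""
    rng = rng or np.random.default_rng()
    counts = np.asarray(votes, dtype=float).copy()
    counts[np.argmax(counts)] += c
    noisy = counts + rng.laplace(0.0, noise_scale, size=counts.shape)
    return int(np.argmax(noisy))
```

The point of the construction is that the noise magnitude (and hence the formal privacy guarantee) can be made very large without corrupting the label the student actually receives.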
Stochastic Distributed Optimization for Machine Learning from Decentralized Features
Distributed machine learning has been widely studied in the literature to
scale up machine learning model training in the presence of an ever-increasing
amount of data. We study distributed machine learning from another perspective,
where the information about the same training samples is inherently
decentralized and located on different parties. We propose an asynchronous
stochastic gradient descent (SGD) algorithm for such a feature distributed
machine learning (FDML) problem, to jointly learn from decentralized features,
with theoretical convergence guarantees under bounded asynchrony. Our algorithm
does not require sharing the original feature data or even local model
parameters between parties, thus preserving a high level of data
confidentiality. We implement our algorithm for FDML in a parameter server
architecture. We compare our system with fully centralized training (which
violates data locality requirements) and training only based on local features,
through extensive experiments performed on a large amount of data from a
real-world application, involving 5 million samples and features in
total. Experimental results have demonstrated the effectiveness and efficiency
of the proposed FDML system.
Comment: 9 pages.
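A synchronous, logistic-regression caricature of FDML (the paper's algorithm is asynchronous and more general, and these class and function names are ours) looks as follows: each party keeps its feature block and sub-model local, exchanging only scalar partial logits and a loss gradient with the server:

```python
import numpy as np

class FDMLParty:
    """Holds one party's feature block and local sub-model; only the
    scalar partial logit (never raw features or weights) leaves the party."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def partial_logit(self, x_local):
        return float(self.w @ x_local)

    def sgd_step(self, x_local, dl_dz):
        # dl_dz: gradient of the loss w.r.t. the summed logit, from the server
        self.w -= self.lr * dl_dz * x_local

def fdml_train_step(parties, x_blocks, y):
    """Server sums partial logits, computes the logistic-loss gradient,
    and broadcasts the scalar back; each party then updates locally."""
    z = sum(p.partial_logit(x) for p, x in zip(parties, x_blocks))
    pred = 1.0 / (1.0 + np.exp(-z))
    dl_dz = pred - y  # gradient of the log-loss w.r.t. z
    for p, x in zip(parties, x_blocks):
        p.sgd_step(x, dl_dz)
    return pred
```

Since only the aggregated logit and its gradient are exchanged, neither raw feature values nor local model parameters are revealed, which is the confidentiality property the abstract claims.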