Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning
Federated learning is a distributed framework for training machine learning
models over the data residing at mobile devices, while protecting the privacy
of individual users. A major bottleneck in scaling federated learning to a
large number of users is the overhead of secure model aggregation across many
users. In particular, the overhead of the state-of-the-art protocols for secure
model aggregation grows quadratically with the number of users. In this paper,
we propose the first secure aggregation framework, named Turbo-Aggregate, that
in a network with N users achieves a secure aggregation overhead of
O(N log N), as opposed to O(N^2), while tolerating up to a user dropout
rate of 50%. Turbo-Aggregate employs a multi-group circular strategy for
efficient model aggregation, and leverages additive secret sharing and novel
coding techniques for injecting aggregation redundancy in order to handle user
dropouts while guaranteeing user privacy. We experimentally demonstrate that
Turbo-Aggregate achieves a total running time that grows almost linearly in the
number of users, and provides up to 40x speedup over the
state-of-the-art protocols with up to N=200 users. Our experiments also
demonstrate the impact of model size and bandwidth on the performance of
Turbo-Aggregate.
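To make the additive secret sharing idea concrete, here is a minimal Python sketch of masked aggregation, in which each user splits its model update into random shares that cancel when summed, so the server only ever learns the aggregate. This illustrates the general primitive rather than Turbo-Aggregate's multi-group circular protocol, and all names and sizes are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def additive_shares(update, n_shares):
    # Split a model update into n_shares random vectors that sum back to the update.
    shares = [rng.normal(size=update.shape) for _ in range(n_shares - 1)]
    shares.append(update - sum(shares))
    return shares

# Hypothetical setting: 4 users, each holding a 5-dimensional model update.
updates = [rng.normal(size=5) for _ in range(4)]

# Each user splits its update and sends one share to every user (including itself).
all_shares = [additive_shares(u, n_shares=4) for u in updates]

# Each user sums the shares it receives; the server then adds these partial sums.
partial_sums = [sum(all_shares[u][r] for u in range(4)) for r in range(4)]
aggregate = sum(partial_sums)

# The aggregate equals the sum of the raw updates, while no single share reveals any update.
assert np.allclose(aggregate, sum(updates))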
On Lightweight Privacy-Preserving Collaborative Learning for IoT Objects
The Internet of Things (IoT) will be a main data generation infrastructure
for achieving better system intelligence. This paper considers the design and
implementation of a practical privacy-preserving collaborative learning scheme,
in which a curious learning coordinator trains a better machine learning model
based on the data samples contributed by a number of IoT objects, while the
confidentiality of the raw forms of the training data is protected against the
coordinator. Existing distributed machine learning and data encryption
approaches incur significant computation and communication overhead, rendering
them ill-suited for resource-constrained IoT objects. We study an approach that
applies independent Gaussian random projection at each IoT object to obfuscate
data and trains a deep neural network at the coordinator based on the projected
data from the IoT objects. This approach introduces light computation overhead
to the IoT objects and moves most workload to the coordinator that can have
sufficient computing resources. Although the independent projections performed
by the IoT objects address the potential collusion between the curious
coordinator and some compromised IoT objects, they significantly increase the
complexity of the projected data. In this paper, we leverage the superior
learning capability of deep learning in capturing sophisticated patterns to
maintain good learning performance. Extensive comparative evaluation shows that
this approach outperforms other lightweight approaches that apply additive
noisification for differential privacy and/or support vector machines for
learning, in applications with low data pattern complexity.
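As a rough illustration of the obfuscation step described above, the sketch below applies an independent Gaussian random projection at a single (hypothetical) IoT object before the projected samples are sent to the learning coordinator; the dimensions, seed, and names are illustrative assumptions rather than values from the paper.

import numpy as np

def project(samples, out_dim, seed):
    # Obfuscate raw samples with a Gaussian random projection kept secret by the object.
    rng = np.random.default_rng(seed)
    in_dim = samples.shape[1]
    proj = rng.normal(scale=1.0 / np.sqrt(out_dim), size=(in_dim, out_dim))
    return samples @ proj

rng = np.random.default_rng(42)
raw = rng.normal(size=(100, 64))           # 100 raw samples with 64 features from one object
obfuscated = project(raw, out_dim=32, seed=7)

# Only the projected data (plus labels) leaves the object; the coordinator trains a
# deep neural network on the projections pooled from all objects.
print(obfuscated.shape)                    # (100, 32)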
Survey: Leakage and Privacy at Inference Time
Leakage of data from publicly available Machine Learning (ML) models is an
area of growing significance as commercial and government applications of ML
can draw on multiple sources of data, potentially including users' and clients'
sensitive data. We provide a comprehensive survey of contemporary advances on
several fronts, covering involuntary data leakage which is natural to ML
models, potential malevolent leakage which is caused by privacy attacks, and
currently available defence mechanisms. We focus on inference-time leakage, as
the most likely scenario for publicly available models. We first discuss what
leakage is in the context of different data, tasks, and model architectures. We
then propose a taxonomy across involuntary and malevolent leakage, available
defences, followed by the currently available assessment metrics and
applications. We conclude with outstanding challenges and open questions,
outlining some promising directions for future research.
Federated Learning Attacks and Defenses: A Survey
In terms of artificial intelligence, there are several security and privacy
deficiencies in the traditional centralized training of machine learning models
at a server. To address this limitation, federated learning (FL) has been
proposed and is known for breaking down "data silos" and protecting the privacy
of users. However, FL has not yet gained popularity in industry, mainly due to
its security and privacy issues and its high communication cost. For the purpose
of advancing the research in this field,
building a robust FL system, and realizing the wide application of FL, this
paper sorts out the possible attacks and corresponding defenses of the current
FL system systematically. Firstly, this paper briefly introduces the basic
workflow of FL and related knowledge of attacks and defenses. It reviews a
great deal of research about privacy theft and malicious attacks that have been
studied in recent years. Most importantly, in view of the current three
classification criteria, namely the three stages of machine learning, the three
different roles in federated learning, and the CIA (Confidentiality, Integrity,
and Availability) guidelines on privacy protection, we divide attack approaches
into two categories according to the training stage and the prediction stage in
machine learning. Furthermore, we also identify the CIA property violated for
each attack method and potential attack role. Various defense mechanisms are
then analyzed separately from the level of privacy and security. Finally, we
summarize the possible challenges in the application of FL from the aspect of
attacks and defenses and discuss the future development direction of FL
systems. In this way, the designed FL system has the ability to resist
different attacks and is more secure and stable.
PA-iMFL: Communication-Efficient Privacy Amplification Method against Data Reconstruction Attack in Improved Multi-Layer Federated Learning
Recently, big data has seen explosive growth in the Internet of Things (IoT).
Multi-layer FL (MFL) based on cloud-edge-end architecture can promote model
training efficiency and model accuracy while preserving IoT data privacy. This
paper considers an improved MFL (iMFL), where edge-layer devices own private data
and can join the training process. iMFL can improve edge resource utilization and
also relax the strict requirements on end devices, but suffers from the
issues of Data Reconstruction Attack (DRA) and unacceptable communication
overhead. This paper aims to address these issues with iMFL. We propose a
Privacy Amplification scheme on iMFL (PA-iMFL). Differing from standard MFL, we
design privacy operations in end and edge devices after local training,
including three sequential components: local differential privacy with the
Laplace mechanism, privacy amplification by subsampling, and gradient sign reset.
Benefiting from these privacy operations, PA-iMFL reduces communication overhead
and achieves privacy preservation. Extensive results demonstrate that, against
State-Of-The-Art (SOTA) DRAs, PA-iMFL can effectively mitigate private data
leakage and reach the same level of protection capability as the SOTA defense
model. Moreover, by adopting privacy operations in edge devices, PA-iMFL improves
communication efficiency by up to 2.8 times over the SOTA compression method
without compromising model accuracy.
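The sketch below is a loose reading of the three sequential privacy operations named above (local differential privacy with the Laplace mechanism, privacy amplification by subsampling, and gradient sign reset); it is not PA-iMFL's actual implementation, and both the parameter choices and the interpretation of gradient sign reset as sign quantization are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def laplace_noise(grad, sensitivity, epsilon):
    # Local differential privacy: add Laplace noise calibrated to sensitivity / epsilon.
    return grad + rng.laplace(scale=sensitivity / epsilon, size=grad.shape)

def subsample(grad, keep_ratio):
    # Privacy amplification by subsampling: keep a random fraction of coordinates.
    mask = rng.random(grad.shape) < keep_ratio
    return grad * mask

def sign_reset(grad):
    # Gradient sign reset, interpreted here as transmitting only coordinate signs.
    return np.sign(grad)

grad = rng.normal(size=1000)                       # gradient after local training
noised = laplace_noise(grad, sensitivity=1.0, epsilon=2.0)
sparse = subsample(noised, keep_ratio=0.3)
upload = sign_reset(sparse)                        # compact, privatized update sent upstream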
Privacy-preserving machine learning system at the edge
Data privacy in machine learning has become an urgent problem, driven by machine learning's rapid development and its large, actively explored attack surface.
Pre-trained deep neural networks are increasingly deployed in smartphones and other edge devices for a variety of applications, leading to potential disclosures of private information.
In collaborative learning, participants keep private data locally and communicate deep neural networks updated on their local data, but the private information encoded in the networks' gradients can still be extracted by adversaries.
This dissertation aims to perform dedicated investigations on privacy leakage from neural networks and to propose privacy-preserving machine learning systems for edge devices.
Firstly, the systematization of knowledge is conducted to identify the key challenges and existing/adaptable solutions.
Then a framework is proposed to measure the amount of sensitive information memorized in each layer's weights of a neural network based on the generalization error. Results show that, when considered individually, the last layers encode a larger amount of information from the training data compared to the first layers.
To protect such sensitive information in weights, DarkneTZ is proposed as a framework that uses an edge device's Trusted Execution Environment (TEE) in conjunction with model partitioning to limit the attack surface against neural networks.
The performance of DarkneTZ is evaluated, including CPU execution time, memory usage, and accurate power consumption, using two small and six large image classification models. Due to the limited memory of the edge device's TEE, model layers are partitioned into more sensitive layers (to be executed inside the device TEE) and a set of layers to be executed in the untrusted part of the operating system. Results show that even if only a single layer is hidden, DarkneTZ can provide reliable model privacy and defend against state-of-the-art membership inference attacks, with only a 3% performance overhead.
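Below is a simplified sketch of the partitioning idea behind DarkneTZ, assuming a PyTorch-style sequential model: the last, more sensitive layers are split off so they can be executed inside the TEE while earlier layers run in the untrusted OS. The layer list, split point, and the TEE execution itself are illustrative assumptions, not DarkneTZ's actual code.

import torch
import torch.nn as nn

def partition(model_layers, n_trusted_last):
    # Split a sequential model: early layers run in the normal OS, the last
    # n_trusted_last (more sensitive) layers run inside the device TEE.
    untrusted = nn.Sequential(*model_layers[:-n_trusted_last])
    trusted = nn.Sequential(*model_layers[-n_trusted_last:])
    return untrusted, trusted

# Hypothetical small image classifier for 32x32 RGB inputs.
layers = [nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
          nn.Linear(16 * 30 * 30, 64), nn.ReLU(), nn.Linear(64, 10)]
untrusted_part, trusted_part = partition(layers, n_trusted_last=1)

x = torch.randn(1, 3, 32, 32)
activation = untrusted_part(x)      # computed in the untrusted part of the OS
logits = trusted_part(activation)   # would be computed inside the device TEE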
This thesis further extends the investigation from neural network weights (in on-device machine learning deployment) to gradients (in collaborative learning).
An information-theoretical framework is proposed, by adapting usable information theory and considering the attack outcome as a probability measure, to quantify private information leakage from network gradients. The private original information and latent information are localized in a layer-wise manner.
After that, this work performs sensitivity analysis over the gradients with respect to private information to further explore the underlying cause of information leakage.
Numerical evaluations are conducted on six benchmark datasets and four well-known networks and further measure the impact of training hyper-parameters and defense mechanisms.
Last but not least, to limit the privacy leakages in gradients, I propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems. TEEs are utilized on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries.
This work leverages greedy layer-wise training to train each model's layer inside the trusted area until its convergence.
The performance evaluation of the implementation shows that PPFL significantly improves privacy by defending against data reconstruction, property inference, and membership inference attacks while incurring small communication overhead and client-side system overheads.
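The following is a loose sketch of greedy layer-wise training as described above, assuming PyTorch: each layer is trained together with a temporary head while previously trained layers stay frozen, which mirrors the idea of fitting one layer at a time within the limited memory of a TEE. The make_head helper, data loader, and training loop are illustrative assumptions rather than PPFL's actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

def greedy_layer_wise_train(layers, make_head, data_loader, epochs_per_layer=1):
    # Train one layer at a time (plus a temporary head), keeping all previously
    # trained layers frozen.
    trained = nn.Sequential()
    for layer in layers:
        head = make_head(layer)   # temporary classification head for this stage
        opt = torch.optim.SGD(list(layer.parameters()) + list(head.parameters()), lr=0.01)
        for _ in range(epochs_per_layer):
            for x, y in data_loader:
                with torch.no_grad():
                    feats = trained(x)             # frozen, already-trained layers
                loss = F.cross_entropy(head(layer(feats)), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
        for p in layer.parameters():               # freeze the layer once trained
            p.requires_grad_(False)
        trained = nn.Sequential(*trained, layer)   # grow the frozen prefix
    return trained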
This thesis offers a better understanding of the sources of private information in machine learning and provides frameworks to fully guarantee privacy while achieving ML model utility and system overhead comparable to regular machine learning frameworks.