210 research outputs found
Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
Deep Learning has recently become hugely popular in machine learning,
providing significant improvements in classification accuracy in the presence
of highly-structured and large databases.
Researchers have also considered privacy implications of deep learning.
Models are typically trained in a centralized manner with all the data being
processed by the same training algorithm. If the data is a collection of users'
private data, including habits, personal pictures, geographical positions,
interests, and more, the centralized server will have access to sensitive
information that could potentially be mishandled. To tackle this problem,
collaborative deep learning models have recently been proposed where parties
locally train their deep learning structures and only share a subset of the
parameters in the attempt to keep their respective training sets private.
Parameters can also be obfuscated via differential privacy (DP) to make
information extraction even more challenging, as proposed by Shokri and
Shmatikov at CCS'15.
Unfortunately, we show that any privacy-preserving collaborative deep
learning is susceptible to a powerful attack that we devise in this paper. In
particular, we show that a distributed, federated, or decentralized deep
learning approach is fundamentally broken and does not protect the training
sets of honest participants. The attack we developed exploits the real-time
nature of the learning process that allows the adversary to train a Generative
Adversarial Network (GAN) that generates prototypical samples of the targeted
training set that was meant to be private (the samples generated by the GAN are
intended to come from the same distribution as the training data).
Interestingly, we show that record-level DP applied to the shared parameters of
the model, as suggested in previous work, is ineffective (i.e., record-level DP
is not designed to address our attack).Comment: ACM CCS'17, 16 pages, 18 figure
On Lightweight Privacy-Preserving Collaborative Learning for IoT Objects
The Internet of Things (IoT) will be a main data generation infrastructure
for achieving better system intelligence. This paper considers the design and
implementation of a practical privacy-preserving collaborative learning scheme,
in which a curious learning coordinator trains a better machine learning model
based on the data samples contributed by a number of IoT objects, while the
confidentiality of the raw forms of the training data is protected against the
coordinator. Existing distributed machine learning and data encryption
approaches incur significant computation and communication overhead, rendering
them ill-suited for resource-constrained IoT objects. We study an approach that
applies independent Gaussian random projection at each IoT object to obfuscate
data and trains a deep neural network at the coordinator based on the projected
data from the IoT objects. This approach introduces light computation overhead
to the IoT objects and moves most workload to the coordinator that can have
sufficient computing resources. Although the independent projections performed
by the IoT objects address the potential collusion between the curious
coordinator and some compromised IoT objects, they significantly increase the
complexity of the projected data. In this paper, we leverage the superior
learning capability of deep learning in capturing sophisticated patterns to
maintain good learning performance. Extensive comparative evaluation shows that
this approach outperforms other lightweight approaches that apply additive
noisification for differential privacy and/or support vector machines for
learning in the applications with light data pattern complexities.Comment: 12 pages,IOTDI 201
Accuracy-aware privacy mechanisms for distributed computation
Distributed computing systems involve a network of devices or agents that use locally stored private information to solve a common problem. Distributed algorithms fundamentally require communication between devices leaving the system vulnerable to "privacy attacks" perpetrated by adversarial agents. In this dissertation, we focus on designing privacy-preserving distributed algorithms for -- (a) solving distributed optimization problems, (b) computing equilibrium of network aggregate games, and (c) solving a distributed system of linear equations. Specifically, we propose a privacy definition for distributed computation "non-identifiability", that allow us to simultaneously guarantee privacy and the accuracy of the computed solution. This definition involves showing that information observed by the adversary is compatible with several distributed computing problems and the associated ambiguity provides privacy.
Distributed Optimization: We propose the Function Sharing strategy that involves using correlated random functions to obfuscate private objective functions followed by using a standard distributed optimization algorithm. We characterize a tight graph connectivity condition for proving privacy via non-identifiability of local objective functions. We also prove correctness of our algorithm and show that we can achieve privacy and accuracy simultaneously.
Network Aggregate Games: We design a distributed Nash equilibrium computation algorithm for network aggregate games. Our algorithm uses locally balanced correlated random perturbations to hide information shared with neighbors for aggregate estimation. This step is followed by descent along the negative gradient of the local cost function. We show that if the graph of non-adversarial agents is connected and non-bipartite, then our algorithm keeps private local cost information non-identifiable while asymptotically converging to the accurate Nash equilibrium.
Average Consensus and System of Linear Equations: Finally, we design a finite-time algorithm for solving the average consensus problem over directed graphs with information-theoretic privacy. We use this algorithm to solve a distributed system of linear equations in finite-time while protecting the privacy of local equations. We characterize computation, communication, memory and iteration cost of our algorithm and characterize graph conditions for guaranteeing information-theoretic privacy of local data
Synthetic Dataset Generation for Privacy-Preserving Machine Learning
Machine Learning (ML) has achieved enormous success in solving a variety of
problems in computer vision, speech recognition, object detection, to name a
few. The principal reason for this success is the availability of huge datasets
for training deep neural networks (DNNs). However, datasets cannot be publicly
released if they contain sensitive information such as medical records, and
data privacy becomes a major concern. Encryption methods could be a possible
solution, however their deployment on ML applications seriously impacts
classification accuracy and results in substantial computational overhead.
Alternatively, obfuscation techniques could be used, but maintaining a good
trade-off between visual privacy and accuracy is challenging. In this paper, we
propose a method to generate secure synthetic datasets from the original
private datasets. Given a network with Batch Normalization (BN) layers
pretrained on the original dataset, we first record the class-wise BN layer
statistics. Next, we generate the synthetic dataset by optimizing random noise
such that the synthetic data match the layer-wise statistical distribution of
original images. We evaluate our method on image classification datasets
(CIFAR10, ImageNet) and show that synthetic data can be used in place of the
original CIFAR10/ImageNet data for training networks from scratch, producing
comparable classification performance. Further, to analyze visual privacy
provided by our method, we use Image Quality Metrics and show high degree of
visual dissimilarity between the original and synthetic images. Moreover, we
show that our proposed method preserves data-privacy under various
privacy-leakage attacks including Gradient Matching Attack, Model Memorization
Attack, and GAN-based Attack
- …