Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
Deep learning has recently become hugely popular in machine learning,
providing significant improvements in classification accuracy when large,
highly structured datasets are available.
Researchers have also considered privacy implications of deep learning.
Models are typically trained in a centralized manner with all the data being
processed by the same training algorithm. If the data is a collection of users'
private data, including habits, personal pictures, geographical positions,
interests, and more, the centralized server will have access to sensitive
information that could potentially be mishandled. To tackle this problem,
collaborative deep learning models have recently been proposed where parties
locally train their deep learning structures and only share a subset of the
parameters in the attempt to keep their respective training sets private.
Parameters can also be obfuscated via differential privacy (DP) to make
information extraction even more challenging, as proposed by Shokri and
Shmatikov at CCS'15.
Unfortunately, we show that any privacy-preserving collaborative deep
learning is susceptible to a powerful attack that we devise in this paper. In
particular, we show that a distributed, federated, or decentralized deep
learning approach is fundamentally broken and does not protect the training
sets of honest participants. The attack we developed exploits the real-time
nature of the learning process that allows the adversary to train a Generative
Adversarial Network (GAN) that generates prototypical samples of the targeted
training set that was meant to be private (the samples generated by the GAN are
intended to come from the same distribution as the training data).
Interestingly, we show that record-level DP applied to the shared parameters of
the model, as suggested in previous work, is ineffective (i.e., record-level DP
is not designed to address our attack).
Comment: ACM CCS'17, 16 pages, 18 figures
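The core leak described above can be illustrated with a toy numeric sketch (my own construction, not the paper's attack): instead of a full GAN, a single "generated" point is pushed by gradient ascent against a shared logistic model until the model is confident it belongs to the target class. This captures the essential problem — publishing model parameters lets an adversarial participant synthesize prototypical samples of a victim's private class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Honest participant's private data: class 0 near (-2,-2), class 1 near (+2,+2).
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# "Shared model": logistic regression trained on the private data, then
# published to all participants -- the setting the attack exploits.
w, b = np.zeros(2), 0.0
for _ in range(200):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Adversary's stand-in for a GAN generator: gradient ascent on a point g to
# maximize the shared model's confidence in the target class (class 1),
# yielding a prototypical sample of the victim's class-1 data.
g = rng.normal(0, 0.1, 2)
for _ in range(100):
    p = sigmoid(g @ w + b)
    g += 0.5 * (1 - p) * w   # gradient of log sigmoid(g.w + b) w.r.t. g
```

The generated point lands deep in the class-1 region even though the adversary never saw the class-1 training samples, only the shared parameters.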
Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.
Comment: Improved version of a paper accepted by Entropy Journal, Special
Issue on Information Theory in Machine Learning and Data Science
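The binary data model mentioned above admits a simple numeric sketch (my own simplification; the flip-rate mechanism and distortion budget D are illustrative stand-ins for the learned privatizer): the privatizer releases the private bit flipped with probability q, the Bayes-optimal adversary with a uniform prior guesses the observed bit (accuracy 1 - q for q <= 1/2), and a distortion budget caps q at D. Sweeping q shows the minimax optimum sits at the budget boundary q* = D.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 20000, 0.2          # sample count and distortion budget (flip-rate cap)

x = rng.integers(0, 2, n)  # private bits, uniform prior

def adversary_accuracy(q):
    # Privatizer: flip each private bit independently with probability q.
    flips = rng.random(n) < q
    x_hat = np.where(flips, 1 - x, x)   # sanitized release
    # Bayes-optimal adversary for q <= 1/2 and a uniform prior: guess the
    # observed bit; expected accuracy is 1 - q.
    return np.mean(x_hat == x)

# Minimax over the flip rate: the privatizer pushes q up to hurt the
# adversary, but the distortion constraint caps q at D, so the
# game-theoretic optimum is q* = D with adversary accuracy 1 - D.
qs = np.linspace(0, D, 21)
accs = [adversary_accuracy(q) for q in qs]
q_star = qs[int(np.argmin(accs))]
```

This mirrors the paper's finding in miniature: the mechanism that minimizes the adversary's inference accuracy subject to the distortion constraint is exactly the one a data-driven (generative adversarial) training procedure should converge to.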