Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning
Deep neural networks are susceptible to various inference attacks as they
remember information about their training data. We design white-box inference
attacks to perform a comprehensive privacy analysis of deep learning models. We
measure the privacy leakage through parameters of fully trained models as well
as the parameter updates of models during training. We design inference
algorithms for both centralized and federated learning, with respect to passive
and active inference attackers, and assuming different adversary prior
knowledge.
We evaluate our novel white-box membership inference attacks against deep
learning algorithms to trace their training data records. We show that a
straightforward extension of the known black-box attacks to the white-box
setting (through analyzing the outputs of activation functions) is ineffective.
We therefore design new algorithms tailored to the white-box setting by
exploiting the privacy vulnerabilities of the stochastic gradient descent
algorithm, which is the algorithm used to train deep neural networks. We
investigate the reasons why deep learning models may leak information about
their training data. We then show that even well-generalized models are
significantly susceptible to white-box membership inference attacks, by
analyzing state-of-the-art pre-trained and publicly available models for the
CIFAR dataset. We also show how adversarial participants, in the federated
learning setting, can successfully run active membership inference attacks
against other participants, even when the global model achieves high prediction
accuracies.
Comment: 2019 IEEE Symposium on Security and Privacy (SP).
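To make the gradient-based membership signal concrete, here is a minimal PyTorch sketch of one white-box signal discussed in this line of work: the norm of the per-record parameter gradient, which tends to be smaller for records the model was trained on. The model, record, and threshold below are placeholders, and this is an illustrative sketch rather than the paper's full attack:

# Minimal sketch (not the paper's full attack): score a single record by the
# L2 norm of the loss gradient w.r.t. the model parameters. Members of the
# training set tend to produce smaller gradient norms than non-members.
import torch
import torch.nn as nn

def gradient_norm_signal(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Return the L2 norm of the parameter gradient for one (x, y) record."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    sq = sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None)
    return float(torch.sqrt(sq))

# Toy usage with a placeholder model and a random CIFAR-sized record.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(3, 32, 32), torch.tensor(3)
score = gradient_norm_signal(model, x, y)
is_member_guess = score < 1.0  # threshold would be calibrated, e.g. on shadow models
print(score, is_member_guess)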
Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System
Neural language models are increasingly deployed into APIs and websites that
allow a user to pass in a prompt and receive generated text. Many of these
systems do not reveal generation parameters. In this paper, we present methods
to reverse-engineer the decoding method used to generate text (i.e., top-k or
nucleus sampling). Our ability to discover which decoding strategy was used has
implications for detecting generated text. Additionally, the process of
discovering the decoding strategy can reveal biases caused by selecting
decoding settings which severely truncate a model's predicted distributions. We
perform our attack on several families of open-source language models, as well
as on production systems (e.g., ChatGPT).
Comment: 6 pages, 4 figures, 3 tables, plus a 5-page appendix. Accepted to INLG 2023.
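As a rough illustration of the underlying intuition (not the paper's exact procedure), the ranks of the tokens a black-box API actually samples, measured against a reference model's ranking of the vocabulary for the same prompt, hint at the truncation rule: a hard cap on the observed rank is consistent with top-k, while a cap that shifts with the shape of the distribution points to nucleus (top-p) sampling. The vocabulary and samples below are hypothetical:

# Hedged sketch: infer a lower bound on the truncation cutoff from observed samples.
def observed_rank_cap(sampled_tokens, ranked_vocab):
    """Max rank (1-indexed) among sampled next tokens, given the reference
    model's probability ranking of the vocabulary for this prompt."""
    rank = {tok: i + 1 for i, tok in enumerate(ranked_vocab)}
    return max(rank[t] for t in sampled_tokens)

# Toy example with a hypothetical 6-token vocabulary ranked by probability.
ranked_vocab = ["the", "a", "dog", "cat", "run", "zebra"]
samples = ["the", "a", "dog", "a", "the", "cat"]    # tokens returned by the API
print(observed_rank_cap(samples, ranked_vocab))      # 4 -> consistent with top-k for k >= 4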
Tight Auditing of Differentially Private Machine Learning
Auditing mechanisms for differential privacy use probabilistic means to
empirically estimate the privacy level of an algorithm. For private machine
learning, existing auditing mechanisms are tight: the empirical privacy
estimate (nearly) matches the algorithm's provable privacy guarantee. But these
auditing techniques suffer from two limitations. First, they only give tight
estimates under implausible worst-case assumptions (e.g., a fully adversarial
dataset). Second, they require thousands or millions of training runs to
produce non-trivial statistical estimates of the privacy leakage.
This work addresses both issues. We design an improved auditing scheme that
yields tight privacy estimates for natural (not adversarially crafted) datasets
-- if the adversary can see all model updates during training. Prior auditing
works rely on the same assumption, which is permitted under the standard
differential privacy threat model. This threat model is also applicable, e.g.,
in federated learning settings. Moreover, our auditing scheme requires only two
training runs (instead of thousands) to produce tight privacy estimates, by
adapting recent advances in tight composition theorems for differential
privacy. We demonstrate the utility of our improved auditing schemes by
surfacing implementation bugs in private machine learning code that eluded
prior auditing techniques.
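For background on what an "empirical privacy estimate" means here, a common auditing recipe (sketched below with illustrative numbers, and not this paper's two-run scheme) is to run a distinguishing attack on models trained with and without a canary record, then convert the attack's true and false positive rates into an empirical lower bound on epsilon via the hypothesis-testing characterization of differential privacy, epsilon >= ln((TPR - delta) / FPR):

# Hedged sketch of the generic auditing idea; counts are illustrative only.
import math

def empirical_epsilon(tp: int, fp: int, n_pos: int, n_neg: int, delta: float = 1e-5) -> float:
    """Point estimate of a lower bound on epsilon from attack outcomes.
    A rigorous audit would use confidence intervals (e.g. Clopper-Pearson)."""
    tpr = tp / n_pos
    fpr = max(fp / n_neg, 1e-12)            # avoid division by zero
    return math.log(max(tpr - delta, 1e-12) / fpr)

# Example: the attack separates canary-in from canary-out training runs.
print(empirical_epsilon(tp=90, fp=10, n_pos=100, n_neg=100))  # ~ln(9) ≈ 2.2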
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy
Studying data memorization in neural language models helps us understand the
risks (e.g., to privacy or copyright) associated with models regurgitating
training data and aids in the development of countermeasures. Many prior works
-- and some recently deployed defenses -- focus on "verbatim memorization",
defined as a model generation that exactly matches a substring from the
training set. We argue that verbatim memorization definitions are too
restrictive and fail to capture more subtle forms of memorization.
Specifically, we design and implement an efficient defense that perfectly
prevents all verbatim memorization. And yet, we demonstrate that this "perfect"
filter does not prevent the leakage of training data. Indeed, it is easily
circumvented by plausible and minimally modified "style-transfer" prompts --
and in some cases even the non-modified original prompts -- to extract
memorized information. We conclude by discussing potential alternative
definitions and why defining memorization is a difficult yet crucial open
question for neural language models.
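As a rough sketch of the kind of verbatim filter the abstract critiques (the details are placeholders, not the paper's implementation), an n-gram match against the training corpus blocks exact regurgitation but is trivially bypassed once the output is restyled:

# Hedged sketch: block any generation that shares an n-gram verbatim with the
# training corpus. A "style-transfer" prompt (e.g. "repeat X but in all caps")
# produces text that no longer matches verbatim, so the filter lets the same
# training data leak.
def contains_verbatim_ngram(generation: str, training_docs: list[str], n: int = 8) -> bool:
    train_ngrams = {
        " ".join(doc.split()[i:i + n])
        for doc in training_docs
        for i in range(len(doc.split()) - n + 1)
    }
    gen_tokens = generation.split()
    return any(
        " ".join(gen_tokens[i:i + n]) in train_ngrams
        for i in range(len(gen_tokens) - n + 1)
    )

training_docs = ["my secret api key is 1234 5678 please keep it safe"]
exact = "my secret api key is 1234 5678 please keep it safe"
styled = "MY SECRET API KEY IS 1234 5678 PLEASE KEEP IT SAFE"   # style-transferred output
print(contains_verbatim_ngram(exact, training_docs))    # True  -> blocked by the filter
print(contains_verbatim_ngram(styled, training_docs))   # False -> leaks despite the filter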