Membership Privacy for Machine Learning Models Through Knowledge Transfer
Large capacity machine learning (ML) models are prone to membership inference
attacks (MIAs), which aim to infer whether the target sample is a member of the
target model's training dataset. The serious privacy concerns raised by
membership inference have motivated multiple defenses against MIAs, e.g.,
differential privacy and adversarial regularization. Unfortunately, these
defenses produce ML models with unacceptably low classification performance.
Our work proposes a new defense, called distillation for membership privacy
(DMP), against MIAs that preserves the utility of the resulting models
significantly better than prior defenses. DMP leverages knowledge distillation
to train ML models with membership privacy. We provide a novel criterion to
tune the data used for knowledge transfer in order to amplify the membership
privacy of DMP. Our extensive evaluation shows that DMP provides significantly
better tradeoffs between membership privacy and classification accuracies
compared to state-of-the-art MIA defenses. For instance, DMP achieves ~100%
accuracy improvement over adversarial regularization for DenseNet trained on
CIFAR100, for similar membership privacy (measured using MIA risk): when the
MIA risk is 53.7%, adversarially regularized DenseNet is 33.6% accurate, while
DMP-trained DenseNet is 65.3% accurate.
Comment: To appear in the 35th AAAI Conference on Artificial Intelligence, 2021.
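A minimal sketch of the knowledge-transfer step that a DMP-style defense could use (hypothetical names, not the authors' released code): an unprotected teacher is trained on the private data, and only a student distilled on a separate, non-member transfer set is released.

```python
import torch
import torch.nn.functional as F

def distill_student(teacher, student, transfer_loader, optimizer,
                    temperature=4.0, epochs=10):
    """Illustrative DMP-style distillation: the student never sees the private
    training set; it only fits the teacher's softened predictions on a separate
    (non-member) transfer set, limiting the membership signal that can survive
    into the released student model."""
    teacher.eval()
    student.train()
    for _ in range(epochs):
        for x, _ in transfer_loader:  # labels of the transfer data are unused
            with torch.no_grad():
                soft_targets = F.softmax(teacher(x) / temperature, dim=1)
            loss = F.kl_div(
                F.log_softmax(student(x) / temperature, dim=1),
                soft_targets,
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```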
A Cautionary Tale: On the Role of Reference Data in Empirical Privacy Defenses
Within the realm of privacy-preserving machine learning, empirical privacy
defenses have been proposed as a solution to achieve satisfactory levels of
training data privacy without a significant drop in model utility. Most
existing defenses against membership inference attacks assume access to
reference data, defined as an additional dataset coming from the same (or a
similar) underlying distribution as training data. Despite the common use of
reference data, previous works are notably reticent about defining and
evaluating reference data privacy. As gains in model utility and/or training
data privacy may come at the expense of reference data privacy, it is essential
that all three aspects are duly considered. In this paper, we first examine the
availability of reference data and its privacy treatment in previous works and
demonstrate its necessity for fairly comparing defenses. Second, we propose a
baseline defense that enables the utility-privacy tradeoff with respect to both
training and reference data to be easily understood. Our method is formulated
as an empirical risk minimization with a constraint on the generalization
error, which, in practice, can be evaluated as a weighted empirical risk
minimization (WERM) over the training and reference datasets. Although we
conceived of WERM as a simple baseline, our experiments show that,
surprisingly, it outperforms the most well-studied and current state-of-the-art
empirical privacy defenses using reference data for nearly all relative privacy
levels of reference and training data. Our investigation also reveals that
these existing methods are unable to effectively trade off reference data
privacy for model utility and/or training data privacy. Overall, our work
highlights the need for a proper evaluation of the triad of model utility,
training data privacy, and reference data privacy when comparing privacy defenses.
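A minimal sketch of one weighted empirical risk minimization (WERM) step, assuming a single mixing weight w over the two datasets (the names and the exact weighting scheme are illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def werm_step(model, optimizer, train_batch, reference_batch, w=0.5):
    """Illustrative WERM update: a convex combination of the empirical risks on
    the (private) training batch and the reference batch. A larger w leans on
    the training data (more training-data leakage risk); a smaller w leans on
    the reference data (more reference-data leakage risk)."""
    x_tr, y_tr = train_batch
    x_ref, y_ref = reference_batch
    loss = (w * F.cross_entropy(model(x_tr), y_tr)
            + (1.0 - w) * F.cross_entropy(model(x_ref), y_ref))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```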
RobustBench: a standardized adversarial robustness benchmark
As a research community, we are still lacking a systematic understanding of
the progress on adversarial robustness, which often makes it hard to identify
the most promising ideas in training robust models. A key challenge in
benchmarking robustness is that its evaluation is often error-prone, leading to
robustness overestimation. Our goal is to establish a standardized benchmark of
adversarial robustness, which as accurately as possible reflects the robustness
of the considered models within a reasonable computational budget. To this end,
we start by considering the image classification task and introduce
restrictions (possibly loosened in the future) on the allowed models. We
evaluate adversarial robustness with AutoAttack, an ensemble of white- and
black-box attacks, which was recently shown in a large-scale study to improve
almost all robustness evaluations compared to the original publications. To
prevent overadaptation of new defenses to AutoAttack, we welcome external
evaluations based on adaptive attacks, especially where AutoAttack flags a
potential overestimation of robustness. Our leaderboard, hosted at
https://robustbench.github.io/, contains evaluations of 120+ models and aims at
reflecting the current state of the art in image classification on a set of
well-defined tasks in Linf- and L2-threat models and on common
corruptions, with possible extensions in the future. Additionally, we
open-source the library https://github.com/RobustBench/robustbench that
provides unified access to 80+ robust models to facilitate their downstream
applications. Finally, based on the collected models, we analyze the impact of
robustness on the performance on distribution shifts, calibration,
out-of-distribution detection, fairness, privacy leakage, smoothness, and
transferability.
Comment: The camera-ready version accepted at the NeurIPS'21 Datasets and
Benchmarks Track: 120+ evaluations, 80+ models, 7 leaderboards (Linf, L2,
common corruptions; CIFAR-10, CIFAR-100, ImageNet), significantly expanded
analysis part (calibration, fairness, privacy leakage, smoothness,
transferability).
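A short usage sketch combining the open-sourced library with AutoAttack, assuming the publicly documented entry points (load_model, load_cifar10, AutoAttack) and using one example leaderboard model name:

```python
import torch
from robustbench.utils import load_model
from robustbench.data import load_cifar10
from autoattack import AutoAttack

# Load one leaderboard model (example entry) and a small CIFAR-10 test split.
model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10', threat_model='Linf').eval()
x_test, y_test = load_cifar10(n_examples=100)

# Run the standard AutoAttack ensemble in the Linf threat model (eps = 8/255).
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test)

# Robust accuracy = accuracy on the adversarially perturbed inputs.
with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
print(f'robust accuracy on {len(y_test)} examples: {robust_acc:.1%}')
```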
Survey: Leakage and Privacy at Inference Time
Leakage of data from publicly available Machine Learning (ML) models is an
area of growing significance as commercial and government applications of ML
can draw on multiple sources of data, potentially including users' and clients'
sensitive data. We provide a comprehensive survey of contemporary advances on
several fronts, covering involuntary data leakage that is natural to ML
models, potential malevolent leakage caused by privacy attacks, and
currently available defence mechanisms. We focus on inference-time leakage, as
the most likely scenario for publicly available models. We first discuss what
leakage is in the context of different data, tasks, and model architectures. We
then propose a taxonomy spanning involuntary and malevolent leakage and the
available defences, followed by the currently available assessment metrics and
applications. We conclude with outstanding challenges and open questions,
outlining some promising directions for future research.
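As a concrete illustration of inference-time leakage, a minimal confidence-thresholding membership inference baseline (a standard heuristic, not a method from this survey; the names and threshold are illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def confidence_mia(model, x, threshold=0.9):
    """Baseline membership guess: overfit models tend to be more confident on
    training members than on unseen samples, so predict 'member' when the top
    softmax probability exceeds a threshold (tuned, e.g., on shadow models or
    held-out data)."""
    probs = F.softmax(model(x), dim=1)
    return probs.max(dim=1).values > threshold  # True = predicted member
```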