A Note On Interpreting Canary Exposure
Canary exposure, introduced by Carlini et al., is frequently used to
empirically evaluate, or audit, the privacy of machine learning model training.
The goal of this note is to provide some intuition on how to interpret canary
exposure, including by relating it to membership inference attacks and
differential privacy.
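As a rough illustration (not the note's own code), the rank-based exposure of Carlini et al. can be computed from a canary's loss relative to a reference set of alternative candidate sequences; the function and variable names below are ours:

```python
import math

def canary_exposure(canary_loss, reference_losses):
    # Rank-based exposure in the spirit of Carlini et al.: insert a canary,
    # then rank its loss against a reference set of alternative sequences.
    # exposure = log2(|R|) - log2(rank), so rank 1 (most-preferred) gives the
    # maximum exposure, while an average rank gives roughly zero bits.
    total = 1 + len(reference_losses)                  # |R|, canary included
    rank = 1 + sum(1 for l in reference_losses if l < canary_loss)
    return math.log2(total) - math.log2(rank)

# Toy usage: a memorized canary has unusually low loss, hence a low rank
# and high exposure.
print(canary_exposure(0.7, [2.3, 1.9, 2.8, 1.2, 3.1, 2.0, 1.7]))
```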
SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning
Secure multiparty computation (MPC) has been proposed to allow multiple
mutually distrustful data owners to jointly train machine learning (ML) models
on their combined data. However, by design, MPC protocols faithfully compute
the training functionality, which the adversarial ML community has shown can
leak private information and can be tampered with via poisoning attacks. In this
work, we argue that model ensembles, implemented in our framework called
SafeNet, are a highly MPC-amenable way to avoid many adversarial ML attacks.
The natural partitioning of data amongst owners in MPC training allows this
approach to be highly scalable at training time, to provide provable protection
from poisoning attacks, and to provably defend against a number of privacy
attacks. We demonstrate SafeNet's efficiency, accuracy, and resilience to
poisoning on several machine learning datasets and models trained in end-to-end
and transfer learning scenarios. For instance, SafeNet reduces backdoor attack
success significantly, while achieving faster training and less communication than the four-party MPC framework of Dalskov et al.
Our experiments show that ensembling retains these benefits even in many
non-iid settings. The simplicity, cheap setup, and robustness properties of
ensembling make it a strong first choice for training ML models privately in
MPC.
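In plaintext terms (the actual framework evaluates this inside MPC), the ensemble idea can be sketched as follows; the helper names are illustrative and the secure-computation layer is omitted entirely:

```python
from collections import Counter
from sklearn.linear_model import LogisticRegression

def train_local_models(owner_datasets):
    # One model per data owner, trained only on that owner's own partition.
    # In SafeNet, this local training happens before any secure computation.
    return [LogisticRegression(max_iter=1000).fit(X, y) for X, y in owner_datasets]

def ensemble_predict(local_models, x):
    # Majority vote over the owners' models. With k owners, fewer than k/2
    # poisoned owners cannot flip the voted label on their own, which is the
    # intuition behind the ensemble's poisoning robustness.
    votes = [int(m.predict([x])[0]) for m in local_models]
    return Counter(votes).most_common(1)[0][0]
```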
Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks
Transferability captures the ability of an attack against a machine-learning
model to be effective against a different, potentially unknown, model.
Empirical evidence for transferability has been shown in previous work, but the
underlying reasons why an attack transfers or not are not yet well understood.
In this paper, we present a comprehensive analysis aimed at investigating the
transferability of both test-time evasion and training-time poisoning attacks.
We provide a unifying optimization framework for evasion and poisoning attacks,
and a formal definition of transferability of such attacks. We highlight two
main factors contributing to attack transferability: the intrinsic adversarial
vulnerability of the target model, and the complexity of the surrogate model
used to optimize the attack. Based on these insights, we define three metrics
that impact an attack's transferability. Interestingly, our results derived
from theoretical analysis hold for both evasion and poisoning attacks, and are
confirmed experimentally using a wide range of linear and non-linear
classifiers and datasets.
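For intuition, one quantity in this spirit is the alignment between surrogate and target input gradients, sketched below; this is a simplified illustration, not the paper's exact metric definitions:

```python
import torch
import torch.nn.functional as F

def input_gradient_alignment(surrogate, target, x, y):
    # Cosine similarity between the input gradients of the surrogate and
    # target losses at (x, y). Better-aligned gradients suggest that an
    # evasion attack optimized against the surrogate moves in a direction
    # that also increases the target's loss, i.e. is more likely to transfer.
    grads = []
    for model in (surrogate, target):
        x_in = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)
        (g,) = torch.autograd.grad(loss, x_in)
        grads.append(g.flatten())
    return F.cosine_similarity(grads[0], grads[1], dim=0).item()
```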
SNAP: Efficient Extraction of Private Properties with Poisoning
Property inference attacks allow an adversary to extract global properties of
the training dataset from a machine learning model. Such attacks have privacy
implications for data owners sharing their datasets to train machine learning
models. Several property inference attacks against deep neural networks have
been proposed, but they all rely on the attacker training
a large number of shadow models, which induces a large computational overhead.
In this paper, we consider the setting of property inference attacks in which
the attacker can poison a subset of the training dataset and query the trained
target model. Motivated by our theoretical analysis of model confidences under
poisoning, we design an efficient property inference attack, SNAP, which
obtains higher attack success and requires lower amounts of poisoning than the
state-of-the-art poisoning-based property inference attack by Mahloujifar et
al. For example, on the Census dataset, SNAP achieves a 34% higher success rate
than Mahloujifar et al. while being 56.5x faster. We also extend our attack to
infer whether a certain property was present at all during training and
estimate the exact proportion of a property of interest efficiently. We
evaluate our attack on several properties of varying proportions from four
datasets and demonstrate SNAP's generality and effectiveness. An open-source
implementation of SNAP can be found at https://github.com/johnmath/snap-sp23.
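A highly simplified sketch of the final distinguishing step might look like the following; the calibration values and names are illustrative and do not reflect the released implementation linked above:

```python
import numpy as np

def logit(p, eps=1e-6):
    # Logit transform of model confidences, clipped for numerical stability.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def distinguish_property_worlds(query_confidences, mean_world0, mean_world1):
    # After poisoning the training set with points carrying the target
    # property, the attacker queries the trained model on fresh
    # property-carrying points and decides which candidate proportion
    # ("world") better explains the observed mean logit-confidence. The
    # per-world means would come from the attacker's calibration, which is
    # far cheaper than training many shadow models.
    observed = logit(query_confidences).mean()
    return 0 if abs(observed - mean_world0) <= abs(observed - mean_world1) else 1
```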
Counterfactual Memorization in Neural Language Models
Modern neural language models that are widely used in various NLP tasks risk
memorizing sensitive information from their training data. Understanding this
memorization is important in real-world applications and also from a
learning-theoretical perspective. An open question in previous studies of
language model memorization is how to filter out "common" memorization. In
fact, most memorization criteria strongly correlate with the number of
occurrences in the training set, capturing memorized familiar phrases, public
knowledge, templated texts, or other repeated data. We formulate a notion of
counterfactual memorization which characterizes how a model's predictions
change if a particular document is omitted during training. We identify and
study counterfactually-memorized training examples in standard text datasets.
We estimate the influence of each memorized training example on the validation
set and on generated texts, showing how this can provide direct evidence of the
source of memorization at test time.
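Conceptually, counterfactual memorization of a training example is the gap between how well models score it when it was in their training subset versus when it was not. A minimal estimation sketch, with an array layout and names we assume for illustration, is:

```python
import numpy as np

def counterfactual_memorization(example_idx, subset_masks, per_model_scores):
    # Difference between the example's average score (e.g., per-token accuracy
    # or negative loss on itself) under models whose random training subset
    # contained it and under models whose subset did not. Estimated over many
    # models trained on independent random subsets of the corpus.
    # subset_masks[m][i]: did model m's training subset include example i?
    # per_model_scores[m][i]: model m's score on example i.
    in_scores, out_scores = [], []
    for mask, scores in zip(subset_masks, per_model_scores):
        (in_scores if mask[example_idx] else out_scores).append(scores[example_idx])
    return float(np.mean(in_scores) - np.mean(out_scores))
```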
Fair Inputs and Fair Outputs: The Incompatibility of Fairness in Privacy and Accuracy
Fairness concerns about algorithmic decision-making systems have been mainly
focused on the outputs (e.g., the accuracy of a classifier across individuals
or groups). However, one may additionally be concerned with fairness in the
inputs. In this paper, we propose and formulate two properties regarding the
inputs of (features used by) a classifier. In particular, we claim that fair
privacy (whether individuals are all asked to reveal the same information) and
need-to-know (whether users are only asked for the minimal information required
for the task at hand) are desirable properties of a decision system. We explore
the interaction between these properties and fairness in the outputs (fair
prediction accuracy). We show that for an optimal classifier these three
properties are in general incompatible, and we explain what common properties
of data make them incompatible. Finally, we provide an algorithm to verify
whether the trade-off between the three properties exists in a given dataset,
and use the algorithm to show that this trade-off is common in real data.
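The abstract does not spell out the verification algorithm. Purely as a heuristic illustration of the kind of check involved (exhaustive, suitable only for a handful of features, and not the paper's method), one could compare the accuracy of a classifier restricted to uniform feature subsets against the full-feature accuracy:

```python
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def uniform_subset_tradeoff(X, y, tol=0.01):
    # Heuristic sketch (X is a NumPy feature matrix): flag a potential
    # trade-off if no proper feature subset, asked uniformly of everyone
    # (fair privacy) and as small as possible (need-to-know), matches the
    # accuracy obtained with all features to within `tol`.
    def acc(cols):
        cols = list(cols)
        return cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, cols], y, cv=3).mean()

    n = X.shape[1]
    full_acc = acc(range(n))
    for size in range(1, n):                 # strictly smaller subsets only
        for cols in combinations(range(n), size):
            if acc(cols) >= full_acc - tol:
                return False                 # a smaller uniform subset suffices
    return True
```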