Formalizing and Estimating Distribution Inference Risks
Distribution inference, sometimes called property inference, infers
statistical properties about a training set from access to a model trained on
that data. Distribution inference attacks can pose serious risks when models
are trained on private data, but are difficult to distinguish from the
intrinsic purpose of statistical machine learning -- namely, to produce models
that capture statistical properties about a distribution. Motivated by Yeom et
al.'s membership inference framework, we propose a formal definition of
distribution inference attacks that is general enough to describe a broad class
of attacks distinguishing between possible training distributions. We show how
our definition captures previous ratio-based property inference attacks as well
as new kinds of attack including revealing the average node degree or
clustering coefficient of a training graph. To understand distribution
inference risks, we introduce a metric that quantifies observed leakage by
relating it to the leakage that would occur if samples from the training
distribution were provided directly to the adversary. We report on a series of
experiments across a range of different distributions using both novel
black-box attacks and improved versions of the state-of-the-art white-box
attacks. Our results show that inexpensive attacks are often as effective as
expensive meta-classifier attacks, and that there are surprising asymmetries in
the effectiveness of attacks. Code is available at
https://github.com/iamgroot42/FormEstDistRisks
Comment: Update: Accepted at PETS 2022
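To make the flavor of this definition concrete, here is a minimal sketch of the kind of distinguishing experiment it formalizes; `train_model`, `adversary`, and the two distribution samplers are placeholders, not the paper's actual constructions or attacks:

```python
import random

def distribution_inference_game(train_model, adversary, sample_D0, sample_D1, n=1000):
    """One round of the distinguishing experiment: a model is trained on data
    drawn from one of two candidate distributions, and the adversary, given
    only the trained model, must guess which distribution was used."""
    b = random.randint(0, 1)                      # challenger's secret bit
    sample = sample_D1 if b == 1 else sample_D0   # sampler for distribution D_b
    dataset = [sample() for _ in range(n)]        # training set drawn from D_b
    model = train_model(dataset)                  # victim trains as usual
    guess = adversary(model)                      # adversary sees only the model
    return guess == b

def empirical_success_rate(train_model, adversary, sample_D0, sample_D1, trials=200):
    """Fraction of games the adversary wins; 0.5 corresponds to no leakage."""
    wins = sum(distribution_inference_game(train_model, adversary, sample_D0, sample_D1)
               for _ in range(trials))
    return wins / trials
```

The paper's leakage metric compares this observed success rate against what an adversary could achieve if given samples from the training distribution directly; the sketch only shows the underlying distinguishing game.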
SoK: Memorization in General-Purpose Large Language Models
Large Language Models (LLMs) are advancing at a remarkable pace, with myriad
applications under development. Unlike most earlier machine learning models,
they are no longer built for one specific application but are designed to excel
in a wide range of tasks. A major part of this success is due to their huge
training datasets and the unprecedented number of model parameters, which allow
them to memorize large amounts of information contained in the training data.
This memorization goes beyond mere language, and encompasses information only
present in a few documents. This is often desirable since it is necessary for
performing tasks such as question answering, and therefore an important part of
learning, but also brings a whole array of issues, from privacy and security to
copyright and beyond. LLMs can memorize short secrets in the training data, but
can also memorize concepts like facts or writing styles that can be expressed
in text in many different ways. We propose a taxonomy for memorization in LLMs
that covers verbatim text, facts, ideas and algorithms, writing styles,
distributional properties, and alignment goals. We describe the implications of
each type of memorization - both positive and negative - for model performance,
privacy, security and confidentiality, copyright, and auditing, and ways to
detect and prevent memorization. We further highlight the challenges that arise
from the predominant way of defining memorization with respect to model
behavior instead of model weights, due to LLM-specific phenomena such as
reasoning capabilities or differences between decoding algorithms. Throughout
the paper, we describe potential risks and opportunities arising from
memorization in LLMs that we hope will motivate new research directions.
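As a concrete illustration of the verbatim-text category, and of the behavioral way of defining memorization discussed above, one common style of check prompts a model with a prefix from a suspected training document and tests whether greedy decoding reproduces the known continuation. The sketch below assumes a Hugging Face causal language model; it is illustrative only, not a procedure from the paper, and as the abstract notes, the outcome can depend on the decoding algorithm used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def reproduces_verbatim(model_name, prefix, continuation):
    """Return True if greedy decoding of `prefix` starts with `continuation`,
    a simple behavioral probe for verbatim memorization of training text."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prefix, return_tensors="pt")
    target_len = len(tokenizer(continuation)["input_ids"])

    output = model.generate(
        **inputs,
        max_new_tokens=target_len,
        do_sample=False,          # greedy decoding; sampling may behave differently
    )
    generated = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return generated.strip().startswith(continuation.strip())

# Hypothetical usage with a made-up secret and a small public model:
# reproduces_verbatim("gpt2", "The launch code is", " 0000-1111-2222")
```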
SoK: Pitfalls in Evaluating Black-Box Attacks
Numerous works study black-box attacks on image classifiers. However, these
works make different assumptions on the adversary's knowledge and current
literature lacks a cohesive organization centered around the threat model. To
systematize knowledge in this area, we propose a taxonomy over the threat space
spanning the axes of feedback granularity, the access of interactive queries,
and the quality and quantity of the auxiliary data available to the attacker.
Our new taxonomy provides three key insights. 1) Despite extensive literature,
numerous under-explored threat spaces exist that cannot be addressed simply
by adapting techniques from well-explored settings. We demonstrate this by
establishing a new state-of-the-art in the less-studied setting of access to
top-k confidence scores by adapting techniques from well-explored settings of
accessing the complete confidence vector, but show how it still falls short of
the more restrictive setting that only obtains the prediction label,
highlighting the need for more research. 2) Identifying the threat model of
different attacks uncovers stronger baselines that challenge prior
state-of-the-art claims. We demonstrate this by enhancing an initially weaker
baseline (under interactive query access) via surrogate models, effectively
overturning claims in the respective paper. 3) Our taxonomy reveals
interactions between attacker knowledge that connect well to related areas,
such as model inversion and extraction attacks. We discuss how advances in
other areas can enable potentially stronger black-box attacks. Finally, we
emphasize the need for a more realistic assessment of attack success by
factoring in local attack runtime. This approach reveals that certain attacks
can achieve notably higher success rates, and it underscores the need to
evaluate attacks in diverse and harder settings with better selection criteria.
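To illustrate what an attack restricted to top-k confidence feedback might look like, here is a minimal random-search sketch in the spirit of score-based attacks such as SimBA; `query_topk` is a placeholder for the victim's query interface, and nothing here reproduces a specific attack from this paper or the literature:

```python
import numpy as np

def topk_black_box_attack(query_topk, x, true_label, eps=0.05, max_steps=1000):
    """Random-search sketch using only top-k feedback: perturb one coordinate
    at a time and keep the step if the victim's reported confidence in
    `true_label` drops, or if the label leaves the top-k entirely.
    `query_topk(image)` is assumed to return a dict {label: score} containing
    only the k highest-scoring classes."""
    def true_label_score(image):
        return query_topk(image).get(true_label, 0.0)   # absent from top-k -> 0

    adv = x.copy()
    best = true_label_score(adv)
    rng = np.random.default_rng(0)
    for _ in range(max_steps):
        idx = tuple(rng.integers(0, s) for s in adv.shape)    # random coordinate
        for sign in (+1.0, -1.0):
            cand = adv.copy()
            cand[idx] = np.clip(cand[idx] + sign * eps, 0.0, 1.0)
            score = true_label_score(cand)
            if score < best:                    # keep perturbations that help
                adv, best = cand, score
                break
        if best == 0.0:                         # true label pushed out of top-k
            break
    return adv
```

Each step issues at most two queries, so counting local runtime and query budget together, as the abstract advocates, changes how such attacks compare.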
SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning
Deploying machine learning models in production may allow adversaries to
infer sensitive information about training data. There is a vast literature
analyzing different types of inference risks, ranging from membership inference
to reconstruction attacks. Inspired by the success of games (i.e.,
probabilistic experiments) to study security properties in cryptography, some
authors describe privacy inference risks in machine learning using a similar
game-based style. However, adversary capabilities and goals are often stated in
subtly different ways from one presentation to the other, which makes it hard
to relate and compose results. In this paper, we present a game-based framework
to systematize the body of knowledge on privacy inference risks in machine
learning. We use this framework to (1) provide a unifying structure for
definitions of inference risks, (2) formally establish known relations among
definitions, and (3) uncover hitherto unknown relations that would have been
difficult to spot otherwise.
Comment: 20 pages, to appear in the 2023 IEEE Symposium on Security and Privacy
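For readers unfamiliar with the game-based style, the sketch below spells out the standard membership inference game as a probabilistic experiment; the function names are illustrative, and the exact challenger and adversary interfaces vary across the definitions the paper unifies:

```python
import random

def membership_inference_game(sample_point, train_model, adversary, n=1000):
    """One round of the standard membership inference game: the challenger
    trains a model on a fresh dataset, flips a secret bit b, and gives the
    adversary either a training point (b = 1) or a fresh point (b = 0)."""
    dataset = [sample_point() for _ in range(n)]
    model = train_model(dataset)
    b = random.randint(0, 1)
    challenge = random.choice(dataset) if b == 1 else sample_point()
    guess = adversary(model, challenge)          # adversary sees model + point
    return guess == b

def membership_advantage(sample_point, train_model, adversary, trials=200):
    """Advantage = 2 * Pr[correct guess] - 1; zero means no membership leakage."""
    wins = sum(membership_inference_game(sample_point, train_model, adversary)
               for _ in range(trials))
    return 2 * wins / trials - 1
```

Other inference risks (attribute inference, reconstruction, distribution inference) can be written in the same style by changing what the challenger hides and what the adversary must output, which is what makes relations between definitions possible to state and prove.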
Subject Membership Inference Attacks in Federated Learning
Privacy in Federated Learning (FL) is studied at two different granularities:
item-level, which protects individual data points, and user-level, which
protects each user (participant) in the federation. Nearly all of the private
FL literature is dedicated to studying privacy attacks and defenses at these
two granularities. Recently, subject-level privacy has emerged as an
alternative privacy granularity to protect the privacy of individuals (data
subjects) whose data is spread across multiple (organizational) users in
cross-silo FL settings. An adversary might be interested in recovering private
information about these individuals by attacking the trained model. A
systematic study of this risk requires complete
control over the federation, which is impossible with real-world datasets. We
design a simulator for generating various synthetic federation configurations,
enabling us to study how properties of the data, model design and training, and
the federation itself impact subject privacy risk. We propose three attacks for
subject membership inference and examine the interplay between all
factors within a federation that affect the attacks' efficacy. We also
investigate the effectiveness of Differential Privacy in mitigating this
threat. Our takeaways generalize to real-world datasets like FEMNIST, giving
credence to our findings.
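As a rough illustration of subject-level (rather than item-level) membership inference, the sketch below aggregates a trained model's loss over all records belonging to one data subject, which may be spread across several organizational users, and thresholds the aggregate. This is a generic loss-based heuristic, not one of the paper's three attacks, and the threshold calibration is left as an assumption:

```python
import numpy as np

def subject_score(model_loss, subject_records):
    """Aggregate the trained model's per-record loss over all records belonging
    to one data subject (possibly held by several organizational users); a
    lower average loss is taken as evidence that the subject's data was used."""
    losses = [model_loss(x, y) for (x, y) in subject_records]
    return -float(np.mean(losses))               # higher score = more likely member

def infer_subject_membership(model_loss, candidate_subjects, threshold):
    """Flag each candidate subject as a member if its score exceeds `threshold`.
    `candidate_subjects` maps subject id -> list of (x, y) records; how the
    threshold is calibrated (e.g., on simulated federations) is an assumption
    of this sketch, not the paper's procedure."""
    return {sid: subject_score(model_loss, records) > threshold
            for sid, records in candidate_subjects.items()}
```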