Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
Machine learning algorithms, when applied to sensitive data, pose a distinct
threat to privacy. A growing body of prior work demonstrates that models
produced by these algorithms may leak specific private information in the
training data to an attacker, either through the models' structure or their
observable behavior. However, the underlying cause of this privacy risk is not
well understood beyond a handful of anecdotal accounts that suggest overfitting
and influence might play a role.
This paper examines the effect that overfitting and influence have on the
ability of an attacker to learn information about the training data from
machine learning models, either through training set membership inference or
attribute inference attacks. Using both formal and empirical analyses, we
illustrate a clear relationship between these factors and the privacy risk that
arises in several popular machine learning algorithms. We find that overfitting
is sufficient to allow an attacker to perform membership inference and, when
the target attribute meets certain conditions about its influence, attribute
inference attacks. Interestingly, our formal analysis also shows that
overfitting is not necessary for these attacks and begins to shed light on what
other factors may be in play. Finally, we explore the connection between
membership inference and attribute inference, showing that there are deep
connections between the two that lead to effective new attacks.
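As an illustration of the overfitting claim above, the following minimal sketch scores a simple loss-threshold membership inference attack. It is not taken from the paper: the function names, the choice of the mean training loss as the threshold, and all loss values are assumptions made up for this example. The attacker guesses "member" whenever the model's loss on a point is at or below the threshold, and the advantage is the true-positive rate minus the false-positive rate.

```python
# Hypothetical illustration; names, threshold rule, and loss values are assumptions.

def threshold_attack(loss, threshold):
    """Guess 'member' when the model's loss on a point is at or below the threshold."""
    return loss <= threshold


def membership_advantage(member_losses, nonmember_losses):
    """Return TPR - FPR of the threshold attack.

    The threshold is the mean loss on training points (an assumption for this sketch).
    """
    threshold = sum(member_losses) / len(member_losses)
    tpr = sum(threshold_attack(l, threshold) for l in member_losses) / len(member_losses)
    fpr = sum(threshold_attack(l, threshold) for l in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr


# Made-up losses for an overfit model: very low on training points, higher on unseen points.
overfit_train = [0.01, 0.02, 0.05, 0.03]
overfit_test = [0.90, 1.20, 0.04, 0.70]

# Made-up losses for a model that generalizes well: similar on both sets.
general_train = [0.30, 0.25, 0.40, 0.35]
general_test = [0.32, 0.28, 0.41, 0.33]

adv_overfit = membership_advantage(overfit_train, overfit_test)
adv_general = membership_advantage(general_train, general_test)

print(f"advantage against overfit model:      {adv_overfit:.2f}")
print(f"advantage against generalizing model: {adv_general:.2f}")
```

With these invented numbers the attack gains an advantage only against the overfit model, which matches the abstract's statement that overfitting is sufficient for membership inference. The sketch does not address the abstract's further point that overfitting is not necessary.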
SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning
Deploying machine learning models in production may allow adversaries to
infer sensitive information about training data. There is a vast literature
analyzing different types of inference risks, ranging from membership inference
to reconstruction attacks. Inspired by the success of games (i.e.,
probabilistic experiments) to study security properties in cryptography, some
authors describe privacy inference risks in machine learning using a similar
game-based style. However, adversary capabilities and goals are often stated in
subtly different ways from one presentation to the other, which makes it hard
to relate and compose results. In this paper, we present a game-based framework
to systematize the body of knowledge on privacy inference risks in machine
learning. We use this framework to (1) provide a unifying structure for
definitions of inference risks, (2) formally establish known relations among
definitions, and (3) uncover hitherto unknown relations that would have been
difficult to spot otherwise.
Comment: 20 pages, to appear in 2023 IEEE Symposium on Security and Privacy
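To make the idea of a privacy game as a probabilistic experiment concrete, here is a minimal sketch of a membership inference game. It is not the paper's formalization: the function names, the toy integer data universe, and the "model" that simply memorizes its training set are all assumptions for illustration. A challenger flips a secret bit, hands the adversary either a training point or a fresh point, and the adversary wins the round by guessing the bit.

```python
import random

# Hypothetical toy setup; every name and value here is an assumption.


def membership_game(train_set, sample_fresh, adversary, rng):
    """Play one round: the challenger flips a secret bit b and the adversary guesses it."""
    b = rng.randint(0, 1)
    # b == 1: challenge point comes from the training set; b == 0: a fresh point.
    z = rng.choice(train_set) if b == 1 else sample_fresh(rng)
    return adversary(z) == b


def estimate_success(train_set, sample_fresh, adversary, rounds=10000, seed=0):
    """Estimate the adversary's winning probability by repeating the game."""
    rng = random.Random(seed)
    wins = sum(membership_game(train_set, sample_fresh, adversary, rng) for _ in range(rounds))
    return wins / rounds


# Toy universe: training points are 0..99; fresh points are drawn from 100..999,
# so they never collide with training points (a simplification for this sketch).
train_set = list(range(100))
memorized = set(train_set)  # stand-in for a model that memorizes its training data


def sample_fresh(rng):
    return rng.randrange(100, 1000)


def informed_adversary(z):
    """Guess 'member' (1) exactly when the memorizing model recognizes the point."""
    return 1 if z in memorized else 0


def blind_adversary(z):
    """Ignore the challenge point and always guess 'member'."""
    return 1


informed_rate = estimate_success(train_set, sample_fresh, informed_adversary)
blind_rate = estimate_success(train_set, sample_fresh, blind_adversary)
print(f"informed adversary wins {informed_rate:.3f} of rounds")
print(f"blind adversary wins    {blind_rate:.3f} of rounds")
```

Stating the adversary's inputs and winning condition explicitly in one function is the point of the game style: different risk definitions can then be compared by checking how their challengers and adversaries differ.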