Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning
Machine learning (ML) has progressed rapidly during the past decade, and a major factor driving this development is the availability of unprecedented amounts of data. Because data generation is a continuous process, ML service providers update their models frequently with newly collected data in an online learning scenario. As a consequence, if an ML model is queried with the same set of data samples at two different points in time, it will provide different results. In this paper, we investigate whether the change in the output of a black-box ML model before and after an update can leak information about the dataset used to perform the update. This constitutes a new attack surface against black-box ML models, and such information leakage severely damages the intellectual property and data privacy of the ML model owner/provider. In contrast to membership inference attacks, we use an encoder-decoder formulation that allows inferring diverse information, ranging from detailed characteristics to full reconstruction of the dataset. Our new attacks are facilitated by state-of-the-art deep learning techniques. In particular, we propose a hybrid generative model (BM-GAN) that is based on generative adversarial networks (GANs) but includes a reconstructive loss that allows generating accurate samples. Our experiments show effective prediction of dataset characteristics and even full reconstruction under challenging conditions.
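A minimal sketch of the reconstructive-loss idea (hedged: the architectures, the best-match pairing, and the weight lam are illustrative placeholders, not the paper's BM-GAN): a PyTorch generator is trained with the usual adversarial loss plus a term that pulls each target sample toward its closest generated sample.

import torch
import torch.nn as nn

latent_dim, data_dim = 16, 32
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 10.0                                    # weight of the reconstructive term (hypothetical)

targets = torch.randn(128, data_dim)          # stand-in for the samples to be reconstructed
ones = torch.ones(targets.size(0), 1)
zeros = torch.zeros(targets.size(0), 1)

for step in range(200):
    z = torch.randn(targets.size(0), latent_dim)
    fake = G(z)

    # Discriminator update: distinguish target samples from generated ones
    opt_d.zero_grad()
    d_loss = bce(D(targets), ones) + bce(D(fake.detach()), zeros)
    d_loss.backward()
    opt_d.step()

    # Generator update: adversarial loss plus a best-match reconstruction loss,
    # matching every target sample to its nearest generated sample
    opt_g.zero_grad()
    g_adv = bce(D(fake), ones)
    g_rec = torch.cdist(targets, fake).min(dim=1).values.mean()
    (g_adv + lam * g_rec).backward()
    opt_g.step()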
Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning
Deep neural networks are susceptible to various inference attacks as they
remember information about their training data. We design white-box inference
attacks to perform a comprehensive privacy analysis of deep learning models. We
measure the privacy leakage through parameters of fully trained models as well
as the parameter updates of models during training. We design inference
algorithms for both centralized and federated learning, with respect to passive
and active inference attackers, and assuming different adversary prior
knowledge.
We evaluate our novel white-box membership inference attacks against deep
learning algorithms to trace their training data records. We show that a
straightforward extension of the known black-box attacks to the white-box
setting (through analyzing the outputs of activation functions) is ineffective.
We therefore design new algorithms tailored to the white-box setting by
exploiting the privacy vulnerabilities of the stochastic gradient descent
algorithm, which is the algorithm used to train deep neural networks. We
investigate the reasons why deep learning models may leak information about
their training data. We then show that even well-generalized models are
significantly susceptible to white-box membership inference attacks, by
analyzing state-of-the-art pre-trained and publicly available models for the
CIFAR dataset. We also show how adversarial participants, in the federated
learning setting, can successfully run active membership inference attacks
against other participants, even when the global model achieves high prediction
accuracies.
Comment: 2019 IEEE Symposium on Security and Privacy (SP)
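The gradient-based intuition above can be illustrated with a short sketch (hedged: the scoring function and the tiny model are placeholders, not the paper's full attack, which learns an inference model over layer-wise gradients, activations, and outputs): with white-box access, the adversary computes the loss gradient of an individual sample and uses its norm as a membership signal, since members of the training set typically produce smaller gradients on a well-fitted model.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()

def gradient_norm_score(x, y):
    # L2 norm of the loss gradient w.r.t. the last layer's weights for one sample
    model.zero_grad()
    loss = criterion(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    return model[-1].weight.grad.norm().item()

# Lower scores suggest membership; a full white-box attack would instead feed
# per-layer gradient features into a learned attack model.
x, y = torch.randn(20), torch.tensor(3)
print(gradient_norm_score(x, y))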
GAN-Leaks: A Taxonomy of Membership Inference Attacks against Generative Models
Deep learning has achieved overwhelming success, spanning from discriminative
models to generative models. In particular, deep generative models have
facilitated a new level of performance in a myriad of areas, ranging from media
manipulation to sanitized dataset generation. Despite the great success, the
potential risks of privacy breach caused by generative models have not been
analyzed systematically. In this paper, we focus on membership inference attacks
against deep generative models, which reveal information about the training data
used for victim models. Specifically, we present the first taxonomy of
membership inference attacks, encompassing not only existing attacks but also
our novel ones. In addition, we propose the first generic attack model that can
be instantiated in a large range of settings and is applicable to various kinds
of deep generative models. Moreover, we provide a theoretically grounded attack
calibration technique, which consistently boosts the attack performance in all
cases, across different attack settings, data modalities, and training
configurations. We complement the systematic analysis of attack performance with
a comprehensive experimental study that investigates the effectiveness of
various attacks w.r.t. model type and training configurations, over three
diverse application scenarios (i.e., images, medical data, and location data).
Comment: CCS 2020, 20 pages
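A minimal sketch of the calibration idea (hedged: the sampling-based reconstruction error and the function names are illustrative, not GAN-Leaks' exact formulation): the raw membership score of a query is its reconstruction error under the victim generator, and calibration subtracts the error under a reference generator that never saw the victim's training data, so that samples which are intrinsically easy to generate are not mistaken for members.

import numpy as np

def reconstruction_error(generator, x, n_samples=1000, latent_dim=16):
    # Approximate min_z ||generator(z) - x||^2 by sampling latent codes
    z = np.random.randn(n_samples, latent_dim)
    candidates = generator(z)                        # shape (n_samples, data_dim)
    return np.min(np.sum((candidates - x) ** 2, axis=1))

def calibrated_membership_score(victim_gen, reference_gen, x):
    raw = reconstruction_error(victim_gen, x)
    baseline = reconstruction_error(reference_gen, x)
    return -(raw - baseline)    # higher score -> more likely a training member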
A Survey of Privacy Attacks in Machine Learning
As machine learning becomes more widely used, the need to study its
implications in security and privacy becomes more urgent. Although the body of
work in privacy has been steadily growing over the past few years, research on
the privacy aspects of machine learning has received less focus than the
security aspects. Our contribution in this research is an analysis of more than
40 papers related to privacy attacks against machine learning that have been
published during the past seven years. We propose an attack taxonomy, together
with a threat model that allows the categorization of different attacks based
on the adversarial knowledge and the assets under attack. An initial
exploration of the causes of privacy leaks is presented, as well as a detailed
analysis of the different attacks. Finally, we present an overview of the most
commonly proposed defenses and a discussion of the open problems and future
directions identified during our analysis.
Comment: Under review
Generating Artificial Data for Private Deep Learning
In this paper, we propose generating artificial data that retain statistical
properties of real data as the means of providing privacy with respect to the
original dataset. We use a generative adversarial network to draw
privacy-preserving artificial data samples and derive an empirical method to
assess the risk of information disclosure in a differential-privacy-like way.
Our experiments show that we are able to generate artificial data of high
quality and successfully train and validate machine learning models on this
data while limiting potential privacy loss.
Comment: Privacy-Enhancing Artificial Intelligence and Language Technologies,
AAAI Spring Symposium Series, 2019
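A minimal sketch of the release pipeline (hedged: the stand-in sampler, the classifier, and the data split are placeholders; the paper's GAN and its disclosure-risk assessment are not reproduced here): artificial labeled samples play the role of the generator's output, a downstream model is trained only on them, and accuracy is checked on held-out data, so the original records never have to be shared.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_artificial(n):
    # Placeholder for drawing labeled samples from a trained generative model
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y[:, None], scale=1.0, size=(n, 5))
    return x, y

x_syn, y_syn = sample_artificial(2000)        # artificial training data
x_real, y_real = sample_artificial(500)       # stands in for real held-out data

clf = LogisticRegression().fit(x_syn, y_syn)  # the model never sees real training records
print("held-out accuracy:", clf.score(x_real, y_real))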