6,647 research outputs found
Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Machine Learning (ML) algorithms are used to train computers to perform a
variety of complex tasks and improve with experience. Computers learn how to
recognize patterns, make unintended decisions, or react to a dynamic
environment. Certain trained machines may be more effective than others because
they are based on more suitable ML algorithms or because they were trained
through superior training sets. Although ML algorithms are known and publicly
released, training sets may not be reasonably ascertainable and, indeed, may be
guarded as trade secrets. While much research has been performed about the
privacy of the elements of training sets, in this paper we focus our attention
on ML classifiers and on the statistical information that can be unconsciously
or maliciously revealed from them. We show that it is possible to infer
unexpected but useful information from ML classifiers. In particular, we build
a novel meta-classifier and train it to hack other classifiers, obtaining
meaningful information about their training sets. This kind of information
leakage can be exploited, for example, by a vendor to build more effective
classifiers or to simply acquire trade secrets from a competitor's apparatus,
potentially violating its intellectual property rights
Differentially Private Mixture of Generative Neural Networks
Generative models are used in a wide range of applications building on large
amounts of contextually rich information. Due to possible privacy violations of
the individuals whose data is used to train these models, however, publishing
or sharing generative models is not always viable. In this paper, we present a
novel technique for privately releasing generative models and entire
high-dimensional datasets produced by these models. We model the generator
distribution of the training data with a mixture of generative neural
networks. These are trained together and collectively learn the generator
distribution of a dataset. Data is divided into clusters, using a novel
differentially private kernel -means, then each cluster is given to separate
generative neural networks, such as Restricted Boltzmann Machines or
Variational Autoencoders, which are trained only on their own cluster using
differentially private gradient descent. We evaluate our approach using the
MNIST dataset, as well as call detail records and transit datasets, showing
that it produces realistic synthetic samples, which can also be used to
accurately compute arbitrary number of counting queries.Comment: A shorter version of this paper appeared at the 17th IEEE
International Conference on Data Mining (ICDM 2017). This is the full
version, published in IEEE Transactions on Knowledge and Data Engineering
(TKDE
- …