25,838 research outputs found
Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Machine Learning (ML) algorithms are used to train computers to perform a
variety of complex tasks and improve with experience. Computers learn how to
recognize patterns, make unintended decisions, or react to a dynamic
environment. Certain trained machines may be more effective than others because
they are based on more suitable ML algorithms or because they were trained
through superior training sets. Although ML algorithms are known and publicly
released, training sets may not be reasonably ascertainable and, indeed, may be
guarded as trade secrets. While much research has been performed about the
privacy of the elements of training sets, in this paper we focus our attention
on ML classifiers and on the statistical information that can be unconsciously
or maliciously revealed from them. We show that it is possible to infer
unexpected but useful information from ML classifiers. In particular, we build
a novel meta-classifier and train it to hack other classifiers, obtaining
meaningful information about their training sets. This kind of information
leakage can be exploited, for example, by a vendor to build more effective
classifiers or to simply acquire trade secrets from a competitor's apparatus,
potentially violating its intellectual property rights
Conditionals in Homomorphic Encryption and Machine Learning Applications
Homomorphic encryption aims at allowing computations on encrypted data
without decryption other than that of the final result. This could provide an
elegant solution to the issue of privacy preservation in data-based
applications, such as those using machine learning, but several open issues
hamper this plan. In this work we assess the possibility for homomorphic
encryption to fully implement its program without relying on other techniques,
such as multiparty computation (SMPC), which may be impossible in many use
cases (for instance due to the high level of communication required). We
proceed in two steps: i) on the basis of the structured program theorem
(Bohm-Jacopini theorem) we identify the relevant minimal set of operations
homomorphic encryption must be able to perform to implement any algorithm; and
ii) we analyse the possibility to solve -- and propose an implementation for --
the most fundamentally relevant issue as it emerges from our analysis, that is,
the implementation of conditionals (requiring comparison and selection/jump
operations). We show how this issue clashes with the fundamental requirements
of homomorphic encryption and could represent a drawback for its use as a
complete solution for privacy preservation in data-based applications, in
particular machine learning ones. Our approach for comparisons is novel and
entirely embedded in homomorphic encryption, while previous studies relied on
other techniques, such as SMPC, demanding high level of communication among
parties, and decryption of intermediate results from data-owners. Our protocol
is also provably safe (sharing the same safety as the homomorphic encryption
schemes), differently from other techniques such as
Order-Preserving/Revealing-Encryption (OPE/ORE).Comment: 14 pages, 1 figure, corrected typos, added introductory pedagogical
section on polynomial approximatio
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in software defect predictions. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict for quality. Same kind of pattern (improvement) was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of
Software Engineering (ICSE), 201
- …