25,838 research outputs found

    Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers

    Full text link
    Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights

    Conditionals in Homomorphic Encryption and Machine Learning Applications

    Get PDF
    Homomorphic encryption aims at allowing computations on encrypted data without decryption other than that of the final result. This could provide an elegant solution to the issue of privacy preservation in data-based applications, such as those using machine learning, but several open issues hamper this plan. In this work we assess the possibility for homomorphic encryption to fully implement its program without relying on other techniques, such as multiparty computation (SMPC), which may be impossible in many use cases (for instance due to the high level of communication required). We proceed in two steps: i) on the basis of the structured program theorem (Bohm-Jacopini theorem) we identify the relevant minimal set of operations homomorphic encryption must be able to perform to implement any algorithm; and ii) we analyse the possibility to solve -- and propose an implementation for -- the most fundamentally relevant issue as it emerges from our analysis, that is, the implementation of conditionals (requiring comparison and selection/jump operations). We show how this issue clashes with the fundamental requirements of homomorphic encryption and could represent a drawback for its use as a complete solution for privacy preservation in data-based applications, in particular machine learning ones. Our approach for comparisons is novel and entirely embedded in homomorphic encryption, while previous studies relied on other techniques, such as SMPC, demanding high level of communication among parties, and decryption of intermediate results from data-owners. Our protocol is also provably safe (sharing the same safety as the homomorphic encryption schemes), differently from other techniques such as Order-Preserving/Revealing-Encryption (OPE/ORE).Comment: 14 pages, 1 figure, corrected typos, added introductory pedagogical section on polynomial approximatio

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201
    • …
    corecore