16,656 research outputs found

    Differentially Private Empirical Risk Minimization

    Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that, both theoretically and empirically, objective perturbation is superior to the previous state of the art, output perturbation, in managing the inherent tradeoff between privacy and learning performance. Comment: 40 pages, 7 figures, accepted to the Journal of Machine Learning Research
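    Output perturbation is simple to prototype: solve the regularized ERM problem as usual, then add noise calibrated to the sensitivity of the minimizer. The sketch below is a minimal illustration, assuming unit-norm feature rows, labels in {-1, +1}, and a 1-Lipschitz loss, for which the L2 sensitivity of L2-regularized ERM is 2/(nλ); the helper name and the use of scikit-learn are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def private_logreg_output_perturbation(X, y, lam=0.1, eps=1.0, seed=None):
    """Return an eps-differentially-private weight vector via output perturbation.

    Assumes each row of X has L2 norm <= 1 and y takes values in {-1, +1};
    under these assumptions the L2 sensitivity of the regularized ERM
    minimizer is 2 / (n * lam), giving noise scale 2 / (n * lam * eps).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Non-private ERM solution; C = 1 / (n * lam) matches the objective
    # (1/n) * sum(loss) + (lam / 2) * ||w||^2 used in the analysis.
    clf = LogisticRegression(C=1.0 / (n * lam), fit_intercept=False).fit(X, y)
    w = clf.coef_.ravel()
    # Noise vector with density proportional to exp(-||b|| / beta):
    # uniform direction, norm drawn from Gamma(shape=d, scale=beta).
    beta = 2.0 / (n * lam * eps)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return w + rng.gamma(shape=d, scale=beta) * direction
```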

    Privacy-preserving multi-class support vector machine for outsourcing the data classification in cloud

    Emerging cloud computing infrastructure replaces traditional outsourcing techniques and provides flexible services to clients at different locations via the Internet. This leads to the requirement that data classification be performed by potentially untrusted servers in the cloud. Within this context, a classifier built by the server can be used by clients to classify their own data samples over the cloud. In this paper, we study a privacy-preserving (PP) data classification technique in which the server is unable to learn anything about the clients' input data samples, while the server-side classifier is also kept secret from the clients during the classification process. More specifically, to the best of our knowledge, we propose the first known client-server data classification protocol using support vector machines. The proposed protocol performs PP classification for both two-class and multi-class problems. The protocol exploits properties of Paillier homomorphic encryption and secure two-party computation. At the core of our protocol lies an efficient, novel sub-protocol for securely obtaining the sign of Paillier-encrypted numbers.
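    To make the role of the homomorphic encryption concrete, the sketch below shows how a server holding a plaintext linear SVM could evaluate its decision value on a client's Paillier-encrypted feature vector. It uses the python-paillier (phe) library as a stand-in (the paper does not name a library) and does not reproduce the secure sign-extraction sub-protocol; the client decrypts the score directly here, purely for illustration.

```python
from phe import paillier  # python-paillier: additively homomorphic encryption

# Client: generate a keypair and encrypt the private feature vector.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
x = [0.7, -1.2, 3.4]                                  # client's private sample
enc_x = [public_key.encrypt(v) for v in x]

# Server: holds a plaintext linear SVM (w, b) and never sees x.
# Paillier supports ciphertext + ciphertext and ciphertext * plaintext,
# which is all a linear decision function w . x + b needs.
w, b = [0.5, -0.3, 0.1], 0.2
enc_score = public_key.encrypt(b)
for wi, enc_xi in zip(w, enc_x):
    enc_score = enc_score + enc_xi * wi

# In the real protocol only the *sign* of the score is revealed via a secure
# sub-protocol; decrypting on the client here is purely for illustration.
print(1 if private_key.decrypt(enc_score) >= 0 else -1)
```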

    Efficient and Private Scoring of Decision Trees, Support Vector Machines and Logistic Regression Models based on Pre-Computation

    Many data-driven personalized services require that users' private data be scored against a trained machine learning model. In this paper we propose a novel protocol for privacy-preserving classification with decision trees, a popular machine learning model in these scenarios. Our solutions are composed of building blocks, namely a secure comparison protocol, a protocol for obliviously selecting inputs, and a protocol for evaluating polynomials. By combining some of these building blocks with our decision tree classification protocol, we also improve previously proposed solutions for classification with support vector machines and logistic regression models. Our protocols are information-theoretically secure and, unlike previously proposed solutions, do not require modular exponentiations. We show that our privacy-preserving classification protocols are more efficient in terms of both computational and communication complexity. We present accuracy and runtime results for 7 classification benchmark datasets from the UCI repository.
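    Avoiding modular exponentiations while relying on pre-computation typically points to additive secret sharing with multiplication triples produced in an offline phase. The sketch below is a single-process simulation of that idea, not the paper's exact building blocks: two parties multiply secret-shared values in the online phase using one pre-computed Beaver triple, so the online work is only additions and multiplications modulo a prime.

```python
import secrets

P = 2**61 - 1  # prime modulus; all shares live in the field Z_P

def share(x):
    """Split x into two additive shares; each share alone is uniformly random."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

# Offline (pre-computation) phase: distribute shares of a random triple
# (a, b, c) with c = a * b mod P. This is the only "expensive" step.
a, b = secrets.randbelow(P), secrets.randbelow(P)
a0, a1 = share(a)
b0, b1 = share(b)
c0, c1 = share(a * b % P)

def beaver_multiply(x, y):
    """Online phase: multiply secret-shared x and y with one pre-computed triple."""
    x0, x1 = share(x)
    y0, y1 = share(y)
    # The masked differences d = x - a and e = y - b reveal nothing about x or y.
    d = reconstruct((x0 - a0) % P, (x1 - a1) % P)
    e = reconstruct((y0 - b0) % P, (y1 - b1) % P)
    # Each party computes its share of x*y locally; party 0 adds the public d*e.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return reconstruct(z0, z1)

print(beaver_multiply(12345, 678) == (12345 * 678) % P)  # -> True
```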

    EPIC: Efficient Private Image Classification (or: Learning from the Masters)

    Outsourcing an image classification task raises privacy concerns, both from the image provider's perspective, who wishes to keep their images confidential, and from the classification algorithm provider's perspective, who wishes to protect the intellectual property of their classifier. We propose EPIC, an efficient private image classification system based on support vector machine (SVM) learning, which is secure against malicious adversaries. The novelty of EPIC is that it builds upon transfer learning techniques known from the Machine Learning (ML) literature and minimizes the load on the privacy-preserving part. Our solution is based on Secure Multiparty Computation (MPC); it is 34 times faster than Gazelle (USENIX 2018), the state of the art in private image classification, and it improves the total communication cost by 50 times, while achieving 7% higher accuracy on the CIFAR-10 dataset. When benchmarked for performance while maintaining the same CIFAR-10 accuracy as Gazelle, EPIC is 700 times faster and the communication cost is reduced by 500 times.
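    The core idea, outsourcing the heavy feature extraction to a publicly available pretrained network so that only a small linear model must be evaluated under MPC, can be sketched as follows. The choice of ResNet-18 and scikit-learn's LinearSVC is illustrative and is not the exact pipeline from the paper.

```python
import torch
from torchvision import models, transforms
from sklearn.svm import LinearSVC

# Public, pretrained feature extractor (ResNet-18 with its classifier head removed).
# This part runs in the clear; only the small linear SVM at the end would need to
# be evaluated inside the MPC protocol.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(pil_images):
    """Map a list of PIL images to an (n, 512) feature matrix."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    with torch.no_grad():
        return backbone(batch).numpy()

# Training happens in the clear on the model owner's side (hypothetical data):
# svm = LinearSVC().fit(extract_features(train_images), train_labels)
# At inference, only the score w . f(x) + b needs privacy-preserving evaluation.
```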

    VirtualIdentity: privacy preserving user profiling

    User profiling from user-generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC be fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learning models for user profiling, which are the intellectual property of the company, to the users of the social media site. We present VirtualIdentity, an application that uses secure multi-party cryptographic protocols to detect the age, gender and personality traits of users by classifying their user-generated text and personal pictures with trained support vector machine models in a privacy-preserving manner.
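    In the clear, the profiling step amounts to one trained SVM per attribute applied to features of the user's text or images; a plaintext sketch of that step is given below, with attribute names, TF-IDF features, and training data all hypothetical. VirtualIdentity would evaluate these models under its secure multi-party protocols rather than on exposed data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One linear SVM per profile attribute (attribute names are illustrative).
# In VirtualIdentity the scoring of these models runs under secure multi-party
# protocols, so neither the user's text nor the model weights are exposed;
# the plaintext pipeline below only makes the classification step concrete.
traits = ["age_group", "gender", "extraversion"]
profilers = {t: make_pipeline(TfidfVectorizer(), LinearSVC()) for t in traits}

# Hypothetical training and use:
# for t in traits:
#     profilers[t].fit(train_posts, train_labels[t])
# profile = {t: profilers[t].predict([new_user_text])[0] for t in traits}
```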