313 research outputs found

    Privacy Preserving Optics Clustering

    Get PDF
    OPTICS is a well-known density-based clustering algorithm which uses DBSCAN theme without producing a clustering of a data set openly, but as a substitute, it creates an augmented ordering of that particular database which represents its density-based clustering structure. This resulted cluster-ordering comprises information which is similar to the density based clustering’s conforming to a wide range of parameter settings. The same algorithm can be applied in the field of privacy-preserving data mining, where extracting the useful information from data which is distributed over a network requires preservation of privacy of individuals’ information. The problem of getting the clusters of a distributed database is considered as an example of this algorithm, where two parties want to know their cluster numbers on combined database without revealing one party information to other party. This issue can be seen as a particular example of secure multi-party computation and such sort of issues can be solved with the assistance of proposed protocols in our work along with some standard protocols

    Secure k-Nearest Neighbor Query over Encrypted Data in Outsourced Environments

    Full text link
    For the past decade, query processing on relational data has been studied extensively, and many theoretical and practical solutions to query processing have been proposed under various scenarios. With the recent popularity of cloud computing, users now have the opportunity to outsource their data as well as the data management tasks to the cloud. However, due to the rise of various privacy issues, sensitive data (e.g., medical records) need to be encrypted before outsourcing to the cloud. In addition, query processing tasks should be handled by the cloud; otherwise, there would be no point to outsource the data at the first place. To process queries over encrypted data without the cloud ever decrypting the data is a very challenging task. In this paper, we focus on solving the k-nearest neighbor (kNN) query problem over encrypted database outsourced to a cloud: a user issues an encrypted query record to the cloud, and the cloud returns the k closest records to the user. We first present a basic scheme and demonstrate that such a naive solution is not secure. To provide better security, we propose a secure kNN protocol that protects the confidentiality of the data, user's input query, and data access patterns. Also, we empirically analyze the efficiency of our protocols through various experiments. These results indicate that our secure protocol is very efficient on the user end, and this lightweight scheme allows a user to use any mobile device to perform the kNN query.Comment: 23 pages, 8 figures, and 4 table

    Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners

    Full text link
    The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system. Prior works have all assumed the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios, there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions on prior single data owner systems

    k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data

    Full text link
    Data Mining has wide applications in many areas such as banking, medicine, scientific research and among government agencies. Classification is one of the commonly used tasks in data mining applications. For the past decade, due to the rise of various privacy issues, many theoretical and practical solutions to the classification problem have been proposed under different security models. However, with the recent popularity of cloud computing, users now have the opportunity to outsource their data, in encrypted form, as well as the data mining tasks to the cloud. Since the data on the cloud is in encrypted form, existing privacy preserving classification techniques are not applicable. In this paper, we focus on solving the classification problem over encrypted data. In particular, we propose a secure k-NN classifier over encrypted data in the cloud. The proposed k-NN protocol protects the confidentiality of the data, user's input query, and data access patterns. To the best of our knowledge, our work is the first to develop a secure k-NN classifier over encrypted data under the semi-honest model. Also, we empirically analyze the efficiency of our solution through various experiments.Comment: 29 pages, 2 figures, 3 tables arXiv admin note: substantial text overlap with arXiv:1307.482

    Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations

    Full text link
    In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach enables us to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning in the last few years based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have presented an extensive review of supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used in the framework of federated learning, in addition to an overview of federated learning applications. This paper reviews the literature on the components of federated learning and its applications in the last few years. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of some open problems and future research directions in federated learning is also provided

    Privacy Preserving Distributed Data Mining

    Get PDF
    Privacy preserving distributed data mining aims to design secure protocols which allow multiple parties to conduct collaborative data mining while protecting the data privacy. My research focuses on the design and implementation of privacy preserving two-party protocols based on homomorphic encryption. I present new results in this area, including new secure protocols for basic operations and two fundamental privacy preserving data mining protocols. I propose a number of secure protocols for basic operations in the additive secret-sharing scheme based on homomorphic encryption. I derive a basic relationship between a secret number and its shares, with which we develop efficient secure comparison and secure division with public divisor protocols. I also design a secure inverse square root protocol based on Newton\u27s iterative method and hence propose a solution for the secure square root problem. In addition, we propose a secure exponential protocol based on Taylor series expansions. All these protocols are implemented using secure multiplication and can be used to develop privacy preserving distributed data mining protocols. In particular, I develop efficient privacy preserving protocols for two fundamental data mining tasks: multiple linear regression and EM clustering. Both protocols work for arbitrarily partitioned datasets. The two-party privacy preserving linear regression protocol is provably secure in the semi-honest model, and the EM clustering protocol discloses only the number of iterations. I provide a proof-of-concept implementation of these protocols in C++, based on the Paillier cryptosystem

    Privacy Preserving Data Mining For Horizontally Distributed Medical Data Analysis

    Get PDF
    To build reliable prediction models and identify useful patterns, assembling data sets from databases maintained by different sources such as hospitals becomes increasingly common; however, it might divulge sensitive information about individuals and thus leads to increased concerns about privacy, which in turn prevents different parties from sharing information. Privacy Preserving Distributed Data Mining (PPDDM) provides a means to address this issue without accessing actual data values to avoid the disclosure of information beyond the final result. In recent years, a number of state-of-the-art PPDDM approaches have been developed, most of which are based on Secure Multiparty Computation (SMC). SMC requires expensive communication cost and sophisticated secure computation. Besides, the mining progress is inevitable to slow down due to the increasing volume of the aggregated data. In this work, a new framework named Privacy-Aware Non-linear SVM (PAN-SVM) is proposed to build a PPDDM model from multiple data sources. PAN-SVM employs the Secure Sum Protocol to protect privacy at the bottom layer, and reduces the complex communication and computation via Nystrom matrix approximation and Eigen decomposition methods at the medium layer. The top layer of PAN-SVM speeds up the whole algorithm for large scale datasets. Based on the proposed framework of PAN-SVM, a Privacy Preserving Multi-class Classifier is built, and the experimental results on several benchmark datasets and microarray datasets show its abilities to improve classification accuracy compared with a regular SVM. In addition, two Privacy Preserving Feature Selection methods are also proposed based on PAN-SVM, and tested by using benchmark data and real world data. PAN-SVM does not depend on a trusted third party; all participants collaborate equally. Many experimental results show that PAN-SVM can not only effectively solve the problem of collaborative privacy-preserving data mining by building non-linear classification rules, but also significantly improve the performance of built classifiers
    corecore