12,131 research outputs found

    Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners

    Full text link
    The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system. Prior works have all assumed the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios, there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions on prior single data owner systems

    DeepSecure: Scalable Provably-Secure Deep Learning

    Get PDF
    This paper proposes DeepSecure, a novel framework that enables scalable execution of the state-of-the-art Deep Learning (DL) models in a privacy-preserving setting. DeepSecure targets scenarios in which neither of the involved parties including the cloud servers that hold the DL model parameters or the delegating clients who own the data is willing to reveal their information. Our framework is the first to empower accurate and scalable DL analysis of data generated by distributed clients without sacrificing the security to maintain efficiency. The secure DL computation in DeepSecure is performed using Yao's Garbled Circuit (GC) protocol. We devise GC-optimized realization of various components used in DL. Our optimized implementation achieves more than 58-fold higher throughput per sample compared with the best-known prior solution. In addition to our optimized GC realization, we introduce a set of novel low-overhead pre-processing techniques which further reduce the GC overall runtime in the context of deep learning. Extensive evaluations of various DL applications demonstrate up to two orders-of-magnitude additional runtime improvement achieved as a result of our pre-processing methodology. This paper also provides mechanisms to securely delegate GC computations to a third party in constrained embedded settings

    Privacy-Preserving and Outsourced Multi-User k-Means Clustering

    Get PDF
    Many techniques for privacy-preserving data mining (PPDM) have been investigated over the past decade. Often, the entities involved in the data mining process are end-users or organizations with limited computing and storage resources. As a result, such entities may want to refrain from participating in the PPDM process. To overcome this issue and to take many other benefits of cloud computing, outsourcing PPDM tasks to the cloud environment has recently gained special attention. We consider the scenario where n entities outsource their databases (in encrypted format) to the cloud and ask the cloud to perform the clustering task on their combined data in a privacy-preserving manner. We term such a process as privacy-preserving and outsourced distributed clustering (PPODC). In this paper, we propose a novel and efficient solution to the PPODC problem based on k-means clustering algorithm. The main novelty of our solution lies in avoiding the secure division operations required in computing cluster centers altogether through an efficient transformation technique. Our solution builds the clusters securely in an iterative fashion and returns the final cluster centers to all entities when a pre-determined termination condition holds. The proposed solution protects data confidentiality of all the participating entities under the standard semi-honest model. To the best of our knowledge, ours is the first work to discuss and propose a comprehensive solution to the PPODC problem that incurs negligible cost on the participating entities. We theoretically estimate both the computation and communication costs of the proposed protocol and also demonstrate its practical value through experiments on a real dataset.Comment: 16 pages, 2 figures, 5 table

    Privacy-Preserving Cloud-Assisted Data Analytics

    Get PDF
    Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users\u27 data may contain private information that needs to be protected. Cloud computing has become more and more popular in both academia and industry communities. By pooling infrastructure and servers together, it can offer virtually unlimited resources easily accessible via the Internet. Various services could be provided by cloud platforms including machine learning and data analytics. The goal of this dissertation is to develop privacy-preserving cloud-assisted data analytics solutions to address the aforementioned challenges, leveraging the powerful and easy-to-access cloud. In particular, we propose the following systems. To address the problem of limited computation power at user and the need of privacy protection in data analytics, we consider geometric programming (GP) in data analytics, and design a secure, efficient, and verifiable outsourcing protocol for GP. Our protocol consists of a transform scheme that converts GP to DGP, a transform scheme with computationally indistinguishability, and an efficient scheme to solve the transformed DGP at the cloud side with result verification. Evaluation results show that the proposed secure outsourcing protocol can achieve significant time savings for users. To address the problem of limited data at individual users, we propose two distributed learning systems such that users can collaboratively train machine learning models without losing privacy. The first one is a differentially private framework to train logistic regression models with distributed data sources. We employ the relevance between input data features and the model output to significantly improve the learning accuracy. Moreover, we adopt an evaluation data set at the cloud side to suppress low-quality data sources and propose a differentially private mechanism to protect user\u27s data quality privacy. Experimental results show that the proposed framework can achieve high utility with low quality data, and strong privacy guarantee. The second one is an efficient privacy-preserving federated learning system that enables multiple edge users to collaboratively train their models without revealing dataset. To reduce the communication overhead, we select well-aligned and large-enough magnitude gradients for uploading which leads to quick convergence. To minimize the noise added and improve model utility, each user only adds a small amount of noise to his selected gradients, encrypts the noise gradients before uploading, and the cloud server will only get the aggregate gradients that contain enough noise to achieve differential privacy. Evaluation results show that the proposed system can achieve high accuracy, low communication overhead, and strong privacy guarantee. In future work, we plan to design a privacy-preserving data analytics with fair exchange, which ensures the payment fairness. We will also consider designing distributed learning systems with heterogeneous architectures

    The Potential for Machine Learning Analysis over Encrypted Data in Cloud-based Clinical Decision Support - Background and Review

    Get PDF
    This paper appeared at the 8th Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2015), Sydney, Australia, January 2015. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 164, Anthony Maeder and Jim Warren, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is includedIn an effort to reduce the risk of sensitive data exposure in untrusted networks such as the public cloud, increasing attention has recently been given to encryption schemes that allow specific computations to occur on encrypted data, without the need for decryption. This relies on the fact that some encryption algorithms display the property of homomorphism, which allows them to manipulate data in a meaningful way while still in encrypted form. Such a framework would find particular relevance in Clinical Decision Support (CDS) applications deployed in the public cloud. CDS applications have an important computational and analytical role over confidential healthcare information with the aim of supporting decision-making in clinical practice. This review paper examines the history and current status of homomoprhic encryption and its potential for preserving the privacy of patient data underpinning cloud-based CDS applications

    k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data

    Full text link
    Data Mining has wide applications in many areas such as banking, medicine, scientific research and among government agencies. Classification is one of the commonly used tasks in data mining applications. For the past decade, due to the rise of various privacy issues, many theoretical and practical solutions to the classification problem have been proposed under different security models. However, with the recent popularity of cloud computing, users now have the opportunity to outsource their data, in encrypted form, as well as the data mining tasks to the cloud. Since the data on the cloud is in encrypted form, existing privacy preserving classification techniques are not applicable. In this paper, we focus on solving the classification problem over encrypted data. In particular, we propose a secure k-NN classifier over encrypted data in the cloud. The proposed k-NN protocol protects the confidentiality of the data, user's input query, and data access patterns. To the best of our knowledge, our work is the first to develop a secure k-NN classifier over encrypted data under the semi-honest model. Also, we empirically analyze the efficiency of our solution through various experiments.Comment: 29 pages, 2 figures, 3 tables arXiv admin note: substantial text overlap with arXiv:1307.482

    A Privacy-Preserving Outsourced Data Model in Cloud Environment

    Full text link
    Nowadays, more and more machine learning applications, such as medical diagnosis, online fraud detection, email spam filtering, etc., services are provided by cloud computing. The cloud service provider collects the data from the various owners to train or classify the machine learning system in the cloud environment. However, multiple data owners may not entirely rely on the cloud platform that a third party engages. Therefore, data security and privacy problems are among the critical hindrances to using machine learning tools, particularly with multiple data owners. In addition, unauthorized entities can detect the statistical input data and infer the machine learning model parameters. Therefore, a privacy-preserving model is proposed, which protects the privacy of the data without compromising machine learning efficiency. In order to protect the data of data owners, the epsilon-differential privacy is used, and fog nodes are used to address the problem of the lower bandwidth and latency in this proposed scheme. The noise is produced by the epsilon-differential mechanism, which is then added to the data. Moreover, the noise is injected at the data owner site to protect the owners data. Fog nodes collect the noise-added data from the data owners, then shift it to the cloud platform for storage, computation, and performing the classification tasks purposes
    corecore