16,743 research outputs found

    Practical Privacy-Preserving K-means Clustering

    Get PDF
    Clustering is a common technique for data analysis, which aims to partition data into similar groups. When the data comes from different sources, it is highly desirable to maintain the privacy of each database. In this work, we study a popular clustering algorithm (K-means) and adapt it to the privacy-preserving context. Specifically, to construct our privacy-preserving clustering algorithm, we first propose an efficient batched Euclidean squared distance computation protocol in the adaptive amortizing setting, when one needs to compute the distance from the same point to other points. This protocol can also serve as a key building block in many real-world applications such as Bio-metric Identification. Furthermore, we construct a customized garbled circuit for computing the minimum value among shared values. We implement and evaluate our protocols to demonstrate their practicality and show that they are able to train datasets that are much larger and faster than in the previous work. The numerical results also show that the proposed protocol achieve almost the same accuracy compared to a K-means plain-text clustering algorithm

    Privacy-Preserving and Outsourced Multi-User k-Means Clustering

    Get PDF
    Many techniques for privacy-preserving data mining (PPDM) have been investigated over the past decade. Often, the entities involved in the data mining process are end-users or organizations with limited computing and storage resources. As a result, such entities may want to refrain from participating in the PPDM process. To overcome this issue and to take many other benefits of cloud computing, outsourcing PPDM tasks to the cloud environment has recently gained special attention. We consider the scenario where n entities outsource their databases (in encrypted format) to the cloud and ask the cloud to perform the clustering task on their combined data in a privacy-preserving manner. We term such a process as privacy-preserving and outsourced distributed clustering (PPODC). In this paper, we propose a novel and efficient solution to the PPODC problem based on k-means clustering algorithm. The main novelty of our solution lies in avoiding the secure division operations required in computing cluster centers altogether through an efficient transformation technique. Our solution builds the clusters securely in an iterative fashion and returns the final cluster centers to all entities when a pre-determined termination condition holds. The proposed solution protects data confidentiality of all the participating entities under the standard semi-honest model. To the best of our knowledge, ours is the first work to discuss and propose a comprehensive solution to the PPODC problem that incurs negligible cost on the participating entities. We theoretically estimate both the computation and communication costs of the proposed protocol and also demonstrate its practical value through experiments on a real dataset.Comment: 16 pages, 2 figures, 5 table

    Privacy-Preserving Privacy Profile Proposal Protocol

    Get PDF
    Many web-based and mobile applications and services allow users to indicate their preferences regarding whether and how their personal information can be used or reused by the application itself, by the service provider, and/or by third parties. The number of possible configurations that constitute a user\u27s preference profile can be overwhelming to a typical user. This report describes a practical, privacy-preserving technique for reducing the burden users face when specifying their preferences by offering users data-driven recommendations for fully-specified preference profiles based on their inputs for just a few settings. The feasibility of the approach is demonstrated by a browser-based prototype application that relies on secure multi-party computation and uses the web-compatible JIFF library as the backbone for managing communications between the client application and the recommendation service. The principal algorithms used for generating proposed preference profiles are kk-means clustering (for privacy-preserving analysis of preference profile data across multiple users) and kk-nearest neighbors (for selecting a proposed preference profile to recommend to the user)

    Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data

    Full text link
    The k-means clustering is one of the most popular clustering algorithms in data mining. Recently a lot of research has been concentrated on the algorithm when the dataset is divided into multiple parties or when the dataset is too large to be handled by the data owner. In the latter case, usually some servers are hired to perform the task of clustering. The dataset is divided by the data owner among the servers who together perform the k-means and return the cluster labels to the owner. The major challenge in this method is to prevent the servers from gaining substantial information about the actual data of the owner. Several algorithms have been designed in the past that provide cryptographic solutions to perform privacy preserving k-means. We provide a new method to perform k-means over a large set using multiple servers. Our technique avoids heavy cryptographic computations and instead we use a simple randomization technique to preserve the privacy of the data. The k-means computed has exactly the same efficiency and accuracy as the k-means computed over the original dataset without any randomization. We argue that our algorithm is secure against honest but curious and passive adversary.Comment: 19 pages, 4 tables. International Conference on Information Systems Security. Springer, Cham, 201

    SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search

    Get PDF
    The kk-Nearest Neighbor Search (kk-NNS) is the backbone of several cloud-based services such as recommender systems, face recognition, and database search on text and images. In these services, the client sends the query to the cloud server and receives the response in which case the query and response are revealed to the service provider. Such data disclosures are unacceptable in several scenarios due to the sensitivity of data and/or privacy laws. In this paper, we introduce SANNS, a system for secure kk-NNS that keeps client's query and the search result confidential. SANNS comprises two protocols: an optimized linear scan and a protocol based on a novel sublinear time clustering-based algorithm. We prove the security of both protocols in the standard semi-honest model. The protocols are built upon several state-of-the-art cryptographic primitives such as lattice-based additively homomorphic encryption, distributed oblivious RAM, and garbled circuits. We provide several contributions to each of these primitives which are applicable to other secure computation tasks. Both of our protocols rely on a new circuit for the approximate top-kk selection from nn numbers that is built from O(n+k2)O(n + k^2) comparators. We have implemented our proposed system and performed extensive experimental results on four datasets in two different computation environments, demonstrating more than 1831×18-31\times faster response time compared to optimally implemented protocols from the prior work. Moreover, SANNS is the first work that scales to the database of 10 million entries, pushing the limit by more than two orders of magnitude.Comment: 18 pages, to appear at USENIX Security Symposium 202

    Differentially Private Mixture of Generative Neural Networks

    Get PDF
    Generative models are used in a wide range of applications building on large amounts of contextually rich information. Due to possible privacy violations of the individuals whose data is used to train these models, however, publishing or sharing generative models is not always viable. In this paper, we present a novel technique for privately releasing generative models and entire high-dimensional datasets produced by these models. We model the generator distribution of the training data with a mixture of kk generative neural networks. These are trained together and collectively learn the generator distribution of a dataset. Data is divided into kk clusters, using a novel differentially private kernel kk-means, then each cluster is given to separate generative neural networks, such as Restricted Boltzmann Machines or Variational Autoencoders, which are trained only on their own cluster using differentially private gradient descent. We evaluate our approach using the MNIST dataset, as well as call detail records and transit datasets, showing that it produces realistic synthetic samples, which can also be used to accurately compute arbitrary number of counting queries.Comment: A shorter version of this paper appeared at the 17th IEEE International Conference on Data Mining (ICDM 2017). This is the full version, published in IEEE Transactions on Knowledge and Data Engineering (TKDE
    corecore