
    Privacy Preserving Distributed Data Mining

    Privacy preserving distributed data mining aims to design secure protocols that allow multiple parties to conduct collaborative data mining while protecting data privacy. My research focuses on the design and implementation of privacy preserving two-party protocols based on homomorphic encryption. I present new results in this area, including new secure protocols for basic operations and two fundamental privacy preserving data mining protocols. I propose a number of secure protocols for basic operations in the additive secret-sharing scheme based on homomorphic encryption. I derive a basic relationship between a secret number and its shares, with which I develop efficient protocols for secure comparison and for secure division with a public divisor. I also design a secure inverse square root protocol based on Newton's iterative method and hence propose a solution for the secure square root problem. In addition, I propose a secure exponential protocol based on Taylor series expansions. All these protocols are implemented using secure multiplication and can be used to develop privacy preserving distributed data mining protocols. In particular, I develop efficient privacy preserving protocols for two fundamental data mining tasks: multiple linear regression and EM clustering. Both protocols work for arbitrarily partitioned datasets. The two-party privacy preserving linear regression protocol is provably secure in the semi-honest model, and the EM clustering protocol discloses only the number of iterations. I provide a proof-of-concept implementation of these protocols in C++, based on the Paillier cryptosystem.
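The abstract notes that the inverse square root protocol follows Newton's iterative method and is built from secure multiplication. A minimal plaintext sketch of that iteration is given below to show why it suits such a setting: each step uses only multiplications, subtractions and a constant scaling. This is not the thesis's secure protocol; the function names, the starting guess `y0` and the iteration count are assumptions made for illustration, and all values are ordinary floats rather than secret shares.

```python
def inv_sqrt_newton(x, y0, iterations=30):
    """Approximate 1/sqrt(x) with Newton's iteration y <- y * (3 - x*y*y) / 2.

    Assumption: 0 < y0 < sqrt(3 / x), which keeps the iteration convergent.
    Only multiplication and subtraction are used inside the loop.
    """
    y = y0
    for _ in range(iterations):
        y = y * (3.0 - x * y * y) * 0.5
    return y


def sqrt_via_inv_sqrt(x, y0, iterations=30):
    """Recover sqrt(x) as x * (1/sqrt(x)), again using only a multiplication."""
    return x * inv_sqrt_newton(x, y0, iterations)
```

For example, `sqrt_via_inv_sqrt(4.0, 0.1)` approaches 2 as the iteration count grows. In a secret-shared setting each product would be replaced by a secure multiplication, which is the design point the abstract highlights.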

    Privacy-Preserving and Outsourced Multi-User k-Means Clustering

    Many techniques for privacy-preserving data mining (PPDM) have been investigated over the past decade. Often, the entities involved in the data mining process are end-users or organizations with limited computing and storage resources. As a result, such entities may want to refrain from participating in the PPDM process. To overcome this issue and to exploit the other benefits of cloud computing, outsourcing PPDM tasks to the cloud environment has recently gained special attention. We consider the scenario where n entities outsource their databases (in encrypted format) to the cloud and ask the cloud to perform the clustering task on their combined data in a privacy-preserving manner. We term such a process privacy-preserving and outsourced distributed clustering (PPODC). In this paper, we propose a novel and efficient solution to the PPODC problem based on the k-means clustering algorithm. The main novelty of our solution lies in avoiding altogether, through an efficient transformation technique, the secure division operations required in computing cluster centers. Our solution builds the clusters securely in an iterative fashion and returns the final cluster centers to all entities when a pre-determined termination condition holds. The proposed solution protects the data confidentiality of all the participating entities under the standard semi-honest model. To the best of our knowledge, ours is the first work to discuss and propose a comprehensive solution to the PPODC problem that incurs negligible cost on the participating entities. We theoretically estimate both the computation and communication costs of the proposed protocol and also demonstrate its practical value through experiments on a real dataset. Comment: 16 pages, 2 figures, 5 tables

    SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search

    The k-Nearest Neighbor Search (k-NNS) is the backbone of several cloud-based services such as recommender systems, face recognition, and database search on text and images. In these services, the client sends the query to the cloud server and receives the response, in which case both the query and the response are revealed to the service provider. Such data disclosures are unacceptable in several scenarios due to the sensitivity of data and/or privacy laws. In this paper, we introduce SANNS, a system for secure k-NNS that keeps the client's query and the search result confidential. SANNS comprises two protocols: an optimized linear scan and a protocol based on a novel sublinear time clustering-based algorithm. We prove the security of both protocols in the standard semi-honest model. The protocols are built upon several state-of-the-art cryptographic primitives such as lattice-based additively homomorphic encryption, distributed oblivious RAM, and garbled circuits. We provide several contributions to each of these primitives which are applicable to other secure computation tasks. Both of our protocols rely on a new circuit for the approximate top-k selection from n numbers that is built from O(n + k^2) comparators. We have implemented our proposed system and performed extensive experiments on four datasets in two different computation environments, demonstrating response times more than 18-31× faster than optimally implemented protocols from prior work. Moreover, SANNS is the first work that scales to a database of 10 million entries, pushing the limit by more than two orders of magnitude. Comment: 18 pages, to appear at USENIX Security Symposium 202

    A trust-based architecture for managing certificates in vehicular ad hoc networks

    In this paper, we propose a secure and distributed public key infrastructure for VANETs. It is based on a hybrid trust model which is used to determine the trust metric (Tm) of vehicles. The model consists of a monitoring system that examines two aspects: the cooperation of vehicles and the legitimacy of the broadcast data. We propose a fuzzy-based solution to decide on the honesty of vehicles. Vehicles that are trusted (Tm = 1) and have at least one trusted neighbor can then stand as candidates to serve as certification authorities (CAs) in their clusters. To increase the stability of our distributed architecture, the CA candidate with the lowest relative mobility is elected as the CA. A set of simulations is conducted. In particular, we evaluate the efficiency and the stability of the clustering algorithm as a function of the speed, the average number of vehicles in the platoon and the percentage of trusted vehicles.

    Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing

    In this paper, we propose a privacy preserving distributed clustering protocol for horizontally partitioned data based on a very efficient homomorphic additive secret sharing scheme. The model we use for the protocol is novel in the sense that it utilizes two non-colluding third parties. We provide a brief security analysis of our protocol from an information-theoretic point of view, which is a stronger security model. We present a communication and computation complexity analysis of our protocol alongside that of another protocol previously proposed for the same problem. We also include experimental results for the computation and communication overhead of these two protocols. Our protocol not only outperforms the others in execution time and communication overhead on data holders, but also uses a more efficient model for many data mining applications.
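To make the idea of additive secret sharing with two non-colluding third parties concrete, here is a minimal sketch of a shared sum. It is not the paper's protocol: the modulus, the function names and the single-process simulation of both third parties are assumptions chosen for illustration. Each data holder splits its value into two random-looking shares, each third party adds up only the shares it receives, and the true sum appears only when the two partial totals are combined.

```python
import secrets

# Assumption: a public prime modulus larger than any sum we expect to compute.
MODULUS = 2**61 - 1


def share(value, modulus=MODULUS):
    """Split value into two additive shares; either share alone is uniformly random."""
    first = secrets.randbelow(modulus)
    second = (value - first) % modulus
    return first, second


def reconstruct(first, second, modulus=MODULUS):
    """Combine two additive shares to recover the shared value."""
    return (first + second) % modulus


def shared_sum(values, modulus=MODULUS):
    """Simulate two non-colluding parties that each add up the shares they hold."""
    total_a = 0  # running total held by the first third party
    total_b = 0  # running total held by the second third party
    for value in values:
        a, b = share(value, modulus)
        total_a = (total_a + a) % modulus
        total_b = (total_b + b) % modulus
    # Only the combination of both totals reveals the sum of the inputs.
    return reconstruct(total_a, total_b, modulus)
```

The additive homomorphism is what lets each party work locally: adding shares of several values yields a share of their sum, so no individual input is ever reconstructed.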

    Secure Clustering in DSN with Key Predistribution and WCDS

    This paper proposes an efficient approach to secure clustering in distributed sensor networks. The clusters or groups in the network are formed based on offline rank assignment and predistribution of secret keys. Our approach uses the concept of a weakly connected dominating set (WCDS) to reduce the number of cluster-heads in the network. The formation of clusters in the network is secured because the secret keys are distributed and used in an efficient way to resist the inclusion of any hostile entity in the clusters. Along with the description of our approach, we present an analysis and a comparison of our approach with other schemes. We also mention the limitations of our approach with regard to the practical implementation of sensor networks. Comment: 6 pages

    Distributed and Federated Learning Optimization with Federated Clustering of IID-users

    Federated Learning (FL) is one of the leading learning paradigms for enabling a more significant presence of intelligent applications in networked and Internet of Things (IoT) systems. It consists of individual user devices training machine learning (ML) models locally so that, for privacy reasons, only the trained models, not the raw data, are transferred through the network for aggregation at the edge or cloud data centers [Li et al. 2019]. Due to the pervasive presence of connected devices such as smart phones and IoT devices in people's lives, there is a growing concern about how we can preserve and secure users' information. FL reduces the risk of exposing user information to attackers during transmission over networks or through information leakage at the central data centers. Another advantage of FL is the scalability and maintainability of intelligent applications in networked and IoT systems. Considering the highly distributed environments in which such systems are deployed, collecting and transmitting raw user data for training ML models at central data centers is a challenging task, as it imposes a huge workload on the networks and consumes high bandwidth. Distributing the training of ML models across locations and transmitting only the trained models for aggregation alleviates these challenges. Among others, distributed and federated learning have applications in smart healthcare systems, where very sensitive user data is involved, and in industrial IoT applications, where the amount of data for training may be too large and cumbersome to transport to central data centers. However, FL has the significant shortcoming of requiring user data to be Independent and Identically Distributed (IID) (i.e., users that have similar statistical data distributions and are not mutually dependent) in order to make reliable predictions for a given group of users aggregated into a single model. IID users have similar statistical features, and thus can be aggregated into the same ML models.
Since raw data is not available at the model aggregator, it is necessary to find IID users based solely on their trained machine learning models. We present a neural-network-based Federated Clustering mechanism, called Neural-network SIMilarity estimator (NSIM), capable of clustering IID users with no access to their raw data. This mechanism performs significantly better than competing techniques for neural-network clustering [Pacheco et al. 2021]. We also present an alternative to the FedAvg aggregation algorithm used in traditional FL, which significantly increases the aggregated models' reliability in terms of Mean Square Error by creating several training models over IID users in a real-world mobility prediction dataset. We observe improvements of up to 97.52% in terms of Pearson correlation between the similarity estimation by NSIM and the ground truth based on the LCSS (Longest Common Sub-Sequence) similarity metric, in comparison with other state-of-the-art approaches. Federated Clustering of IID data in different geographical locations can improve the performance of early warning applications such as flood prediction [Samikwa et al. 2020], where the data for some locations may have more statistical similarities. We further present a technique for accelerating ML inference in resource-constrained devices through distributed computation of ML models over IoT networks, while preserving privacy. This has the potential to improve the performance of time-sensitive ML applications.
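For readers unfamiliar with the FedAvg baseline mentioned above, the following is a minimal sketch of its aggregation step as it is commonly described: the server averages client model parameters, weighting each client by its number of training samples. It does not show NSIM or the alternative aggregation the abstract proposes. The function name `fed_avg` and the flat-list representation of model parameters are assumptions made for illustration.

```python
def fed_avg(client_weights, client_sizes):
    """Return the sample-size-weighted average of client parameter vectors.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   list of positive sample counts, one per client.
    """
    total_samples = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total_samples
        for i in range(n_params)
    ]
```

Because every client is folded into one averaged model, clients whose data distributions differ can pull the average in conflicting directions, which is the motivation the abstract gives for first clustering IID users.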