2,967 research outputs found

    Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing

    In this paper, we propose a privacy-preserving distributed clustering protocol for horizontally partitioned data, based on a very efficient homomorphic additive secret sharing scheme. The model we use for the protocol is novel in that it relies on two non-colluding third parties. We provide a brief security analysis of our protocol from an information-theoretic point of view, which is a stronger security model. We present a communication and computation complexity analysis of our protocol alongside another protocol previously proposed for the same problem, and we include experimental results for the computation and communication overhead of the two protocols. Our protocol not only outperforms the others in execution time and in communication overhead on the data holders, but also uses a model that is more efficient for many data mining applications.
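The additive secret sharing idea behind this abstract can be illustrated with a minimal sketch. It is not the paper's protocol: the modulus, the function names, and the sample values are assumptions made for illustration. Each data holder splits a private value into two random-looking shares, one per third party; because the sharing is additively homomorphic, each third party can sum the shares it receives, and only the combined totals reveal the aggregate.

```python
import secrets

# Assumed public modulus; any modulus larger than the true sum works here.
MODULUS = 2**61 - 1


def share(value, modulus=MODULUS):
    """Split a value into two additive shares that sum to it modulo `modulus`."""
    first = secrets.randbelow(modulus)   # uniformly random mask
    second = (value - first) % modulus   # what remains after masking
    return first, second


def aggregate(shares, modulus=MODULUS):
    """Sum the shares held by one third party (the additive homomorphism)."""
    return sum(shares) % modulus


def reconstruct(share_a, share_b, modulus=MODULUS):
    """Combine the two parties' results to recover the shared value."""
    return (share_a + share_b) % modulus


# Hypothetical private inputs of three data holders.
private_values = [12, 7, 30]
all_shares = [share(v) for v in private_values]

# Third party A sees only first shares, third party B only second shares.
total_a = aggregate(s[0] for s in all_shares)
total_b = aggregate(s[1] for s in all_shares)

total = reconstruct(total_a, total_b)
print(total)
```

Neither third party alone learns anything about an individual input, since each share on its own is uniformly distributed; this holds only as long as the two parties do not collude.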

    Detecting Inconsistencies in Distributed Data

    Privacy Preserving Optics Clustering

    OPTICS is a well-known density-based clustering algorithm that builds on the ideas of DBSCAN. Rather than producing an explicit clustering of a data set, it creates an augmented ordering of the database that represents its density-based clustering structure. The resulting cluster ordering contains information equivalent to the density-based clusterings that correspond to a wide range of parameter settings. The same algorithm can be applied in privacy-preserving data mining, where extracting useful information from data distributed over a network requires preserving the privacy of individuals’ information. As an example, we consider the problem of clustering a distributed database, where two parties want to learn their cluster numbers on the combined database without revealing either party's information to the other. This problem is a particular instance of secure multi-party computation, and it can be solved with the protocols proposed in our work together with some standard protocols.

    Survey on Secure Mining of Association Rules in Vertically Distributed Databases

    A distributed database system is a collection of sites connected by a common high-bandwidth network. Logically, the data belongs to a single system, but physically it is spread over the sites of the network, and this distribution is invisible to the user. The advantages of distribution are availability, performance, modularity and reliability. In this paper, we survey papers on mining association rules over distributed databases. Based on this survey, we propose a solution to the problem of secure mining of association rules where transactions are spread across vertically distributed databases. Each site holds some attributes of each transaction, and the sites wish to participate in identifying globally valid association rules. However, the sites should not reveal individual transaction data. The protocol is based on the Apriori Algorithm [2] and the MultiParty Algorithm [3] for efficiently discovering frequent item sets with minimum support levels, without either site communicating individual transaction values. DOI: 10.17762/ijritcc2321-8169.15035
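To make the vertically partitioned setting concrete, the sketch below shows the quantity the sites need to agree on: the support of an item set whose items live at different sites. It is computed here in the clear, with no privacy protection, purely to show what a secure protocol would have to evaluate without exchanging the columns. The function name and the sample columns are assumptions, not taken from the surveyed papers.

```python
def itemset_support(site_columns):
    """Support of an item set split across sites.

    `site_columns` holds one 0/1 list per site; position i in every list
    refers to the same transaction i. A transaction supports the item set
    only if every site reports 1 for it.
    """
    n_transactions = len(site_columns[0])
    count = sum(1 for row in zip(*site_columns) if all(row))
    return count, count / n_transactions


# Hypothetical data: site A knows whether each transaction contains "bread",
# site B knows whether the same transaction contains "milk".
site_a = [1, 1, 0, 1, 1]
site_b = [1, 0, 0, 1, 1]

count, support = itemset_support([site_a, site_b])
print(count, support)
```

For two sites this count is the scalar product of the two indicator columns, which is why the secure version cannot simply exchange the columns themselves.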

    Interestingness measure on privacy preserved data with horizontal partitioning

    Association rule mining is the process of finding frequent item sets based on an interestingness measure. The major challenge arises when association must be performed on data whose privacy has to be preserved. The actual transaction data provides the evidence needed to calculate the parameters that define the association rules. In this paper, a solution is proposed for computing one such parameter, the support count of item sets, on non-transparent data, that is, without disclosing the transaction data. Privacy is preserved by transferring x-anonymous records for every transaction record. The anonymous set of each actual transaction record yields highly generalized values. The clients process the anonymous set of every transaction record to arrive at highly abstract values, and these generalized values are used for the support calculation. The more anonymous records there are, the stronger the privacy of the data. Experimental results show that privacy is ensured with a larger number of formatted transactions.
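The support count that this abstract targets can be sketched for horizontally partitioned data as follows. This illustrates only the plain computation: the anonymization step described above is not modeled, and the function names and sample transactions are assumptions. Each client counts matching transactions locally, and the global support is the sum of the local counts divided by the total number of transactions.

```python
def local_support_count(transactions, itemset):
    """Number of local transactions that contain every item in `itemset`."""
    target = set(itemset)
    return sum(1 for t in transactions if target <= set(t))


def global_support(partitions, itemset):
    """Combine per-client counts into a global count and support ratio."""
    count = sum(local_support_count(p, itemset) for p in partitions)
    total = sum(len(p) for p in partitions)
    return count, count / total


# Hypothetical transactions held by two clients (horizontal partitioning:
# each client holds complete transactions for a subset of the customers).
client_1 = [{"bread", "milk"}, {"bread"}, {"milk", "eggs"}]
client_2 = [{"bread", "milk", "eggs"}, {"eggs"}]

count, support = global_support([client_1, client_2], {"bread", "milk"})
print(count, support)
```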

    Distributed Correlation-Based Feature Selection in Spark

    CFS (Correlation-Based Feature Selection) is a feature selection (FS) algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS), a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data applications. Two versions of the algorithm were implemented and compared using the Apache Spark cluster computing model, which is currently gaining popularity due to its much faster processing times than Hadoop's MapReduce model. We tested our algorithms on four publicly available datasets, each consisting of a large number of instances and two also consisting of a large number of features. The results show that our algorithms were superior in terms of both time-efficiency and scalability. In leveraging a computer cluster, they were able to handle larger datasets than the non-distributed WEKA version while maintaining the quality of the results, i.e., exactly the same features were returned by our algorithms as by the original algorithm available in WEKA. Comment: 25 pages, 5 figures
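For readers unfamiliar with CFS, the heuristic it optimizes can be sketched as below. This is the standard CFS merit score for a candidate feature subset, not the distributed DiCFS implementation; the function name and sample correlations are assumptions, and the correlations are assumed to be non-negative (as with the symmetrical uncertainty measure commonly used in CFS).

```python
import math


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    `feature_class_corrs` has one correlation per feature in the subset
    (feature vs. class); `feature_feature_corrs` has one per feature pair.
    """
    k = len(feature_class_corrs)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corrs) / k
    r_ff = (sum(feature_feature_corrs) / len(feature_feature_corrs)
            if feature_feature_corrs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Two features, equally predictive of the class.
uncorrelated = cfs_merit([0.5, 0.5], [0.0])  # features carry distinct information
redundant = cfs_merit([0.5, 0.5], [1.0])     # features duplicate each other
print(uncorrelated, redundant)
```

The score rewards subsets whose features correlate with the class and penalizes redundancy among the features, so the uncorrelated pair scores higher than the fully redundant one.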