430 research outputs found

    Privacy Preserving K-means Clustering with Chaotic Distortion

    Get PDF
    Randomized data distortion is a popular method used to mask the data for preserving the privacy. But the appropriateness of this method was questioned because of its possibility of disclosing original data. In this paper, the chaos system, with its unique characteristics of sensitivity on initial condition and unpredictability, is advocated to distort the original data with sensitive information for privacy preserving k-means clustering. The chaotic distortion procedure is proposed and three performance metrics specifically for k-means clustering are developed. We use a large scale experiment (with 4 real world data sets and corresponding reproduced 40 data sets) to evaluate its performance. Our study shows that the proposed approach is effective; it not only can protect individual privacy but also maintain original information of cluster cente

    Secure Algorithm for File Sharing Using Clustering Technique of K-Means Clustering

    Get PDF
    In the current scenario The Security is most or of at most importance when we are talking about file transferring in networks. In the thesis, the work has design a new innovative algorithm to securely transfer the data over network. The k ā€“means clustering algorithm, introduced by MacQueen in 1967 is a broadly utilized plan to solve the clustering problem. It classifies a given arrangement of n-information focuses in m-dimensional space into k-clusters whose focuses are gotten by the centroids. The issue with the privacy consideration has been examined, and that is the data is distributed among various gatherings and the disseminated information is to be safeguarded. In this thesis, created chucks or parts of file using the K-Means Clustering Algorithm and the individual part is encrypted using the key which is shared between sender and receiver. Further, the bunched records have been encoded by utilizing AES encryption algorithm with the introduction of private key concept covertly shared between the involved parties which gives a superior security state

    A New Approach of Detecting Network Anomalies using Improved ID3 with Horizontal Partioning Based Decision Tree

    Get PDF
    In this paper we are proposing a new approach of Detecting Network Anomalies using improved ID3 with horizontal portioning based decision tree. Here we first apply different clustering algorithms and after that we apply horizontal partioning decision tree and then check the network anomalies from the decision tree. Here in this paper we find the comparative analysis of different clustering algorithms and existing id3 based decision tree

    Mining Privacy-Preserving Association Rules based on Parallel Processing in Cloud Computing

    Full text link
    With the onset of the Information Era and the rapid growth of information technology, ample space for processing and extracting data has opened up. However, privacy concerns may stifle expansion throughout this area. The challenge of reliable mining techniques when transactions disperse across sources is addressed in this study. This work looks at the prospect of creating a new set of three algorithms that can obtain maximum privacy, data utility, and time savings while doing so. This paper proposes a unique double encryption and Transaction Splitter approach to alter the database to optimize the data utility and confidentiality tradeoff in the preparation phase. This paper presents a customized apriori approach for the mining process, which does not examine the entire database to estimate the support for each attribute. Existing distributed data solutions have a high encryption complexity and an insufficient specification of many participants' properties. Proposed solutions provide increased privacy protection against a variety of attack models. Furthermore, in terms of communication cycles and processing complexity, it is much simpler and quicker. Proposed work tests on top of a realworld transaction database demonstrate that the aim of the proposed method is realistic

    Semantic preserving text tepresentation and its applications in text clustering

    Get PDF
    Text mining using the vector space representation has proven to be an valuable tool for classification, prediction, information retrieval and extraction. The nature of text data presents several issues to these tasks, including large dimension and the existence of special polysemous and synonymous words. A variety of techniques have been devised to overcome these shortcomings, including feature selection and word sense disambiguation. Privacy preserving data mining is also an area of emerging interest. Existing techniques for privacy preserving data mining require the use of secure computation protocols, which often incur a greatly increased computational cost. In this paper, a generalization-based method is presented for creating a semantic-preserving vector space which reduces dimension as well as addresses problems with special word types. The SPVSM also allows private text data to be safely represented without degrading cluster accuracy or performance. Further, the result produced is also usable in combination with theoretic based techniques such as latent semantic indexing. The performance of text clustering using the semantic preserving generalization method is evaluated and compared to existing feature selection techniques, and shown to have significant merit from a clustering perspective

    Spectral anonymization of data

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 87-96).Data anonymization is the process of conditioning a dataset such that no sensitive information can be learned about any specific individual, but valid scientific analysis can nevertheless be performed on it. It is not sufficient to simply remove identifying information because the remaining data may be enough to infer the individual source of the record (a reidentification disclosure) or to otherwise learn sensitive information about a person (a predictive disclosure). The only known way to prevent these disclosures is to remove additional information from the dataset. Dozens of anonymization methods have been proposed over the past few decades; most work by perturbing or suppressing variable values. None have been successful at simultaneously providing perfect privacy protection and allowing perfectly accurate scientific analysis. This dissertation makes the new observation that the anonymizing operations do not need to be made in the original basis of the dataset. Operating in a different, judiciously chosen basis can improve privacy protection, analytic utility, and computational efficiency. I use the term 'spectral anonymization' to refer to anonymizing in a spectral basis, such as the basis provided by the data's eigenvectors. Additionally, I propose new measures of reidentification and prediction risk that are more generally applicable and more informative than existing measures. I also propose a measure of analytic utility that assesses the preservation of the multivariate probability distribution. Finally, I propose the demanding reference standard of nonparticipation in the study to define adequate privacy protection. I give three examples of spectral anonymization in practice. The first example improves basic cell swapping from a weak algorithm to one competitive with state of-the-art methods merely by a change of basis.(cont) The second example demonstrates avoiding the curse of dimensionality in microaggregation. The third describes a powerful algorithm that reduces computational disclosure risk to the same level as that of nonparticipants and preserves at least 4th order interactions in the multivariate distribution. No previously reported algorithm has achieved this combination of results.by Thomas Anton Lasko.Ph.D

    Heterogeneous differential privacy for vertically partitioned databases

    Full text link
    Ā© 2019 John Wiley & Sons, Ltd. Existing privacy-preserving approaches are generally designed to provide privacy guarantee for individual data in a database, which reduces the utility of the database for data analysis. In this paper, we propose a novel differential privacy mechanism to preserve the heterogeneous privacy of a vertically partitioned database based on attributes. We first present the concept of privacy label, which characterizes the privacy information of the database and is instantiated by the classification. Then, we use an information-based method to systematically explore the dependencies between all attributes and the privacy label. We finally assign privacy weights to every attribute and design a heterogeneous mechanism according to the basic Laplace mechanism. Evaluations using real datasets demonstrate that the proposed mechanism achieves a balanced privacy and utility
    • ā€¦
    corecore