421 research outputs found

    On Analysis of Mixed Data Classification with Privacy Preservation

    Get PDF
    Privacy-preserving data classification is a pervasive task in privacy-preserving data mining (PPDM). The main goal is to secure the identification of individuals from the released data to prevent privacy breach. However, the goal of classification involves accurate data classification. Thus, the problem is, how to accurately mine large amount of data for extracting relevant knowledge while protecting at the same time sensitive information existing in the database. One of the ways is to anonymize the data set that contains the sensitive information of individuals before getting it released for data analysis. In this paper, we have mainly analyzed the proposed method Microaggregation based Classification Tree (MiCT) which use the properties of decision tree for privacy-preserving classification of mixed data. The evaluations are done based on various privacy models developed keeping in mind the various situations which may arise during data analysis.Keywords:Microaggregation, decision tree, mixed data, data perturbation, classification accuracy, anonymous data

    RANDOMIZATION BASED PRIVACY PRESERVING CATEGORICAL DATA ANALYSIS

    Get PDF
    The success of data mining relies on the availability of high quality data. To ensure quality data mining, effective information sharing between organizations becomes a vital requirement in today’s society. Since data mining often involves sensitive infor- mation of individuals, the public has expressed a deep concern about their privacy. Privacy-preserving data mining is a study of eliminating privacy threats while, at the same time, preserving useful information in the released data for data mining. This dissertation investigates data utility and privacy of randomization-based mod- els in privacy preserving data mining for categorical data. For the analysis of data utility in randomization model, we first investigate the accuracy analysis for associ- ation rule mining in market basket data. Then we propose a general framework to conduct theoretical analysis on how the randomization process affects the accuracy of various measures adopted in categorical data analysis. We also examine data utility when randomization mechanisms are not provided to data miners to achieve better privacy. We investigate how various objective associ- ation measures between two variables may be affected by randomization. We then extend it to multiple variables by examining the feasibility of hierarchical loglinear modeling. Our results provide a reference to data miners about what they can do and what they can not do with certainty upon randomized data directly without the knowledge about the original distribution of data and distortion information. Data privacy and data utility are commonly considered as a pair of conflicting re- quirements in privacy preserving data mining applications. In this dissertation, we investigate privacy issues in randomization models. In particular, we focus on the attribute disclosure under linking attack in data publishing. We propose efficient so- lutions to determine optimal distortion parameters such that we can maximize utility preservation while still satisfying privacy requirements. We compare our randomiza- tion approach with l-diversity and anatomy in terms of utility preservation (under the same privacy requirements) from three aspects (reconstructed distributions, accuracy of answering queries, and preservation of correlations). Our empirical results show that randomization incurs significantly smaller utility loss

    Balancing between data utility and privacy preservation in data mining

    Get PDF
    Data Mining plays a vital role in today‟s information world where it has been widely applied in various organizations. The current trend needs to share data for mutual benefit. However, there has been a lot of concern over privacy in the recent years .It has also raised a potential threat of revealing sensitive data of an individual when the data is released publically. Various methods have been proposed to tackle the privacy preservation problem like anonymization and perturbation. But the natural consequence of privacy preservation is information loss. The loss of specific information about certain individuals may affect the data quality and in extreme case the data may become completely useless. There are methods like cryptography which completely anonymize the dataset and which renders the dataset useless. So the utility of the data is completely lost. We need to protect the private information and preserve the data utility as much as possible. So the objective of the thesis is to find an optimum balance between privacy and utility while publishing dataset of any organization. Privacy preservation is hard requirement that must be satisfied and utility is the measure to be optimized. One of the methods for preserving privacy is K-anonymization which also preserves privacy to a good extent. K-anonymity demands that every tuple in the dataset released be indistinguishably related to no fewer than k respondents. We used K-means algorithm for clustering the dataset and followed by k-anonymization. Decision stump classification is used to determine utility and privacy is determined by firing random queries on the anonymized dataset. The balancing point is where the utility and privacy curves intersect or they tend to converge. The balancing point will vary from dataset to dataset and the choice of Quasi-identifier and sensitive attribute. For our experiment the balancing point is found to be around 50-60 percent which is the intersecting point of privacy and utility curves

    A Hybrid Multi-user Cloud Access Control based Block Chain Framework for Privacy Preserving Distributed Databases

    Get PDF
    Most of the traditional medical applications are insecure and difficult to compute the data integrity with variable hash size. Traditional medical data security systems are insecure and it depend on static parameters for data security. Also, distributed based cloud storage systems are independent of integrity computational and data security due to unstructured data and computational memory. As the size of the data and its dimensions are increasing in the public and private cloud servers, it is difficult to provide the machine learning based privacy preserving in cloud computing environment. Block-chain technology plays a vital role for large cloud databases. Most of the conventional block-chain frameworks are based on the existing integrity and confidentiality models. Also, these models are based on the data size and file format. In this model, a novel integrity verification and encryption framework is designed and implemented in cloud environment.  In order to overcome these problems in the cloud computing environment, a hybrid integrity and security-based block-chain framework is designed and implemented on the large distributed databases. In this framework,a novel decision tree classifier is used along with non-linear mathematical hash algorithm and advanced attribute-based encryption models are used to improve the privacy of multiple users on the large cloud datasets. Experimental results proved that the proposed advanced privacy preserving based block-chain technology has better efficiency than the traditional block-chain based privacy preserving systems on large distributed databases

    Performance Evaluation of K-Anonymized Data

    Get PDF
    Data mining provides tools to convert a large amount of knowledge data which is user relevant. But this process could return individual2019;s sensitive information compromising their privacy rights. So, based on different approaches, many privacy protection mechanism incorporated data mining techniques were developed. A widely used micro data protection concept is k-anonymity, proposed to capture the protection of a micro data table regarding re-identification of respondents which the data refers to. In this paper, the effect of the anonymization due to k-anonymity on the data mining classifiers is investigated. NaEF;ve Bayes classifier is used for evaluating the anonymized and non-anonymized data

    SMC PROTOCOL FOR DISTRIBUTED K- ANONYMITY

    Get PDF
    Secure multiparty protocols have been proposed to enable non colluding parties to cooperate without a trusted server. Even though such protocols put off information exposé other than the objective function, they are quite costly in computation and communication. The high overhead motivates parties to estimate the utility that can be achieved as a result of the protocol beforehand. To avoid this issue we propose a look-ahead approach, specifically for secure multiparty protocols to achieve distributed k-anonymity, which helps parties to decide if the utility benefit from the protocol is within an acceptable range before initiating the protocol. The look-aheadoperation is highly localized and its accuracy depends on the amount of information the parties are willing toshare. Experimental results show the effectiveness of the proposed methods
    corecore