9 research outputs found

    Anonymization of Sensitive Quasi-Identifiers for l-diversity and t-closeness

    Get PDF
    A number of studies on privacy-preserving data mining have been proposed. Most of them assume that they can separate quasi-identifiers (QIDs) from sensitive attributes. For instance, they assume that address, job, and age are QIDs but are not sensitive attributes and that a disease name is a sensitive attribute but is not a QID. However, all of these attributes can have features that are both sensitive attributes and QIDs in practice. In this paper, we refer to these attributes as sensitive QIDs and we propose novel privacy models, namely, (l1, ..., lq)-diversity and (t1, ..., tq)-closeness, and a method that can treat sensitive QIDs. Our method is composed of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, which is conducted by data holders, is simple but effective, whereas the reconstruction algorithm, which is conducted by data analyzers, can be conducted according to each data analyzer’s objective. Our proposed method was experimentally evaluated using real data sets

    Evolutionary tree-based quasi identifier and federated gradient privacy preservations over big healthcare data

    Get PDF
    Big data has remodeled the way organizations supervise, examine and leverage data in any industry. To safeguard sensitive data from public contraventions, several countries investigated this issue and carried out privacy protection mechanism. With the aid of quasi-identifiers privacy is not said to be preserved to a greater extent. This paper proposes a method called evolutionary tree-based quasi-identifier and federated gradient (ETQI-FD) for privacy preservations over big healthcare data. The first step involved in the ETQI-FD is learning quasi-identifiers. Learning quasi-identifiers by employing information loss function separately for categorical and numerical attributes accomplishes both the largest dissimilarities and partition without a comprehensive exploration between tuples of features or attributes. Next with the learnt quasi-identifiers, privacy preservation of data item is made by applying federated gradient arbitrary privacy preservation learning model. This model attains optimal balance between privacy and accuracy. In the federated gradient privacy preservation learning model, we evaluate the determinant of each attribute to the outputs. Then injecting Adaptive Lorentz noise to data attributes our ETQI-FD significantly minimizes the influence of noise on the final results and therefore contributing to privacy and accuracy. An experimental evaluation of ETQI-FD method achieves better accuracy and privacy than the existing methods

    Improved k-Anonymize and l-Diverse Approach for Privacy Preserving Big Data Publishing Using MPSEC Dataset

    Get PDF
    Data exposure and privacy violations may happen when data is exchanged between organizations. Data anonymization gives promising results for limiting such dangers. In order to maintain privacy, different methods of k-anonymization and l-diversity have been widely used. But for larger datasets, the results are not very promising. The main problem with existing anonymization algorithms is high information loss and high running time. To overcome this problem, this paper proposes new models, namely Improved k-Anonymization (IKA) and Improved l-Diversity (ILD). IKA model takes large k-value using a symmetric as well as an asymmetric anonymizing algorithm. Then IKA is further categorized into Improved Symmetric k-Anonymization (ISKA) and Improved Asymmetric k-Anonymization (IAKA). After anonymizing data using IKA, ILD model is used to increase privacy. ILD will make the data more diverse and thereby increasing privacy. This paper presents the implementation of the proposed IKA and ILD model using real-time big candidate election dataset, which is acquired from the Madhya Pradesh State Election Commission, India (MPSEC) along with Apache Storm. This paper also compares the proposed model with existing algorithms, i.e. Fast clustering-based Anonymization for Data Streams (FADS), Fast Anonymization for Data Stream (FAST), Map Reduce Anonymization (MRA) and Scalable k-Anonymization (SKA). The experimental results show that the proposed models IKA and ILD have remarkable improvement of information loss and significantly enhanced the performance in terms of running time over the existing approaches along with maintaining the privacy-utility trade-off

    Quasi-Identifiers Recognition Algorithm for Privacy Preservation of Cloud Data Based on Risk Re-Identification

    Get PDF
    Cloud computing plays an essential role as a source for outsourcing data to perform mining operations or other data processing, especially for data owners who do not have sufficient resources or experience to execute data mining techniques. However, the privacy of outsourced data is a serious concern. Most data owners are using anonymization-based techniques to prevent identity and attribute disclosures to avoid privacy leakage before outsourced data for mining over the cloud. In addition, data collection and dissemination in a resource-limited network such as sensor cloud require efficient methods to reduce privacy leakage. The main issue that caused identity disclosure is Quasi-Identifiers (QIDs) linking. But most researchers of anonymization methods ignore the identification of proper QIDs. This reduces the validity of the used anonymization methods and may thus lead to a failure of the anonymity process. This paper introduces a new quasi-identifier recognition algorithm that reduces identity disclosure resulted from QIDs linking. The proposed algorithm is comprised of two main stages: (1) Attributes Classification (or QIDs Recognition), and (2) QID's-Dimension Identification. The algorithm works based on the re-identification of risk rate for all attributes and the dimension of QIDs where it determines the proper QIDs and their suitable dimensions. The proposed algorithm was tested on a real dataset. The results demonstrated that the proposed algorithm significantly reduces privacy leakage and maintaining the data utility compared to recent related algorithms

    Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption

    Get PDF
    The evolution of information and communication technologies have encourage numerous organizations to outsource their business and data to cloud computing to perform data mining and other data processing operations. Despite the great benefits of the cloud, it has a real problem in the security and privacy of data. Many studies explained that attackers often reveal the information from third-party services or third-party clouds. When a data owners outsource their data to the cloud, especially the SaaS cloud model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to accomplish data mining operations while protecting the owner's data from violation. The current models of PPDM have some limitations. That is, they suffer from data disclosure caused by identity and attributes disclosure where some private information is revealed which causes the success of different types of attacks. Besides, existing solutions have poor data utility and high computational performance overhead. Therefore, this research aims to design and develop Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model to improve the privacy-preserving level by reducing data disclosure before outsourcing data for mining over the cloud while maintaining data utility. The proposed HAC-PPDM model is further aimed reducing the computational performance overhead to improve efficiency. The Quasi-Identifiers Recognition algorithm (QIR) is defined and designed depending on attributes classification and Quasi-Identifiers dimension determine to overcome the identity disclosure caused by Quasi-Identifiers linking to reduce privacy leakage. An Enhanced Homomorphic Scheme is designed based on hybridizing Cloud-RSA encryption scheme, Extended Euclidean algorithm (EE), Fast Modular Exponentiation algorithm (FME), and Chinese Remainder Theorem (CRT) to minimize the computational time complexity while reducing the attribute disclosure. The proposed QIR, Enhanced Homomorphic Scheme and k-anonymity privacy model have been hybridized to obtain optimal data privacy-preservation before outsourced it on the cloud while maintaining the utility of data that meets the needs of mining with good efficiency. Real-world datasets have been used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the data privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient comparing to the related works in terms of time complexity as represented by Big O notation. Moreover, it reduced the computational time of the encryption, decryption, and key generation time. Finally, the proposed HAC-PPDM model successfully reduced the data disclosures and improved the privacy-preserving level while preserved the data utility as it reduced the information loss. In short, it achieved improvement of privacy preserving and data mining (classification) accuracy by 7.59 % and 0.11 % respectively

    Adaptable Privacy-preserving Model

    Get PDF
    Current data privacy-preservation models lack the ability to aid data decision makers in processing datasets for publication. The proposed algorithm allows data processors to simply provide a dataset and state their criteria to recommend an xk-anonymity approach. Additionally, the algorithm can be tailored to a preference and gives the precision range and maximum data loss associated with the recommended approach. This dissertation report outlined the research’s goal, what barriers were overcome, and the limitations of the work’s scope. It highlighted the results from each experiment conducted and how it influenced the creation of the end adaptable algorithm. The xk-anonymity model built upon two foundational privacy models, the k-anonymity and l-diversity models. Overall, this study had many takeaways on data and its power in a dataset

    Anonymization of Sensitive Quasi-Identifiers for l-Diversity and t-Closeness

    No full text
    corecore