6 research outputs found

    Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption

    The evolution of information and communication technologies has encouraged numerous organizations to outsource their business and data to the cloud for data mining and other data processing operations. Despite the cloud's great benefits, it poses real problems for data security and privacy. Many studies report that attackers often obtain information from third-party services or third-party clouds. When data owners outsource their data to the cloud, especially under the SaaS model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to accomplish data mining operations while protecting the owner's data from violation. Current PPDM models have some limitations: they suffer from data disclosure caused by identity and attribute disclosure, in which private information is revealed and different types of attacks can succeed. Besides, existing solutions have poor data utility and high computational overhead. Therefore, this research aims to design and develop a Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model that improves the privacy-preserving level by reducing data disclosure before data is outsourced for mining over the cloud, while maintaining data utility. The proposed HAC-PPDM model further aims to reduce computational overhead to improve efficiency. The Quasi-Identifiers Recognition (QIR) algorithm is defined and designed based on attribute classification and Quasi-Identifier dimension determination, to overcome the identity disclosure caused by Quasi-Identifier linking and so reduce privacy leakage. An Enhanced Homomorphic Scheme is designed by hybridizing the Cloud-RSA encryption scheme, the Extended Euclidean algorithm (EE), the Fast Modular Exponentiation algorithm (FME), and the Chinese Remainder Theorem (CRT) to minimize computational time complexity while reducing attribute disclosure.
The proposed QIR algorithm, Enhanced Homomorphic Scheme, and k-anonymity privacy model have been hybridized to obtain optimal data privacy preservation before the data is outsourced to the cloud, while maintaining data utility that meets the needs of mining with good efficiency. Real-world datasets were used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the data privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient than related works in terms of time complexity as represented by Big O notation. Moreover, it reduced the computational time of encryption, decryption, and key generation. Finally, the proposed HAC-PPDM model successfully reduced data disclosure and improved the privacy-preserving level while preserving data utility, as it reduced information loss. In short, it improved privacy preservation and data mining (classification) accuracy by 7.59% and 0.11%, respectively.
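    The building blocks named in this abstract (Extended Euclidean algorithm, fast modular exponentiation, CRT) can be illustrated with textbook RSA, which is multiplicatively homomorphic. The following is a minimal sketch only: it is not the authors' Cloud-RSA or Enhanced Homomorphic Scheme, the function names are hypothetical, and the tiny primes and missing padding make it unsuitable for real use.

```python
def extended_gcd(a, b):
    """Extended Euclid: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, m):
    """Modular inverse of a mod m, derived from the extended Euclid result."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a and m are not coprime")
    return x % m


def fast_pow(base, exp, mod):
    """Square-and-multiply modular exponentiation."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result


def keygen(p, q, e):
    """Toy key generation from two given primes (assumed, not generated)."""
    n = p * q
    d = mod_inverse(e, (p - 1) * (q - 1))
    return (n, e), (p, q, d)


def encrypt(m, pub):
    n, e = pub
    return fast_pow(m, e, n)


def decrypt_crt(c, priv):
    """Decrypt with two half-size exponentiations recombined via CRT (Garner)."""
    p, q, d = priv
    m1 = fast_pow(c, d % (p - 1), p)
    m2 = fast_pow(c, d % (q - 1), q)
    h = mod_inverse(q, p) * (m1 - m2) % p
    return m2 + h * q


pub, priv = keygen(61, 53, 17)  # toy primes, illustration only
# Multiplying ciphertexts multiplies the underlying plaintexts (mod n).
product_cipher = encrypt(7, pub) * encrypt(6, pub) % pub[0]
print(decrypt_crt(product_cipher, priv))
```

    The CRT step replaces one exponentiation modulo n with two modulo p and q, which is the usual reason CRT is paired with RSA decryption.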

    Protecting big data mining association rules using fuzzy system

    Recently, big data has come to be regarded as the key to unlocking the next large waves of productivity growth. Along with this growth, it faces some challenges. One of the significant problems is data security. While people use data mining methods to extract valuable information from massive databases, they also need to protect certain knowledge so that it cannot be mined, such as sensitive frequent itemsets, patterns, taxonomy trees, and the like. Association rule mining can pose a potential threat to the privacy of information, so association rule hiding methods are applied to avoid the risk of sensitive information misuse. Various studies have already been carried out on association rule hiding. However, most of them focus on methods of limited scope for static databases (with only existing information), while researchers now face the problem of continuous data. Moreover, in the era of big data, it is essential to optimize current systems so that they suit big data. This paper proposes a framework that achieves data anonymization using fuzzy logic in support of big data mining. The fuzzy logic groups the sensitivity of the association rules with a suitable association level. Moreover, parallelization methods incorporated into the framework support a fast data mining process.

    Articulation Point Based Quasi Identifier Detection for Privacy Preserving in Distributed Environment

    These days, huge data volumes require high-end resources to be stored on IT organizations' premises, so organizations depend on the cloud for additional resources. Since the cloud is a third party, high security for the information cannot be guaranteed, as it might be misused. This creates a need to protect data privacy before sharing data with the cloud. Numerous researchers have proposed methods that attempt to discover explicit identifiers and sensitive data before publishing it. However, quasi-identifiers are attributes that can leak explicit identifiers when combined with background knowledge. Researchers have proposed techniques to find quasi-identifiers so that these attributes can also be considered when implementing privacy. These techniques suffer from drawbacks such as high time consumption and extracting more quasi-identifiers than necessary, which decreases data utility. The proposed work overcomes these drawbacks by extracting the minimum required quasi-identifiers with minimum time complexity.

    Quasi-Identifiers Recognition Algorithm for Privacy Preservation of Cloud Data Based on Risk Re-Identification

    Cloud computing plays an essential role as a platform for outsourcing data for mining operations or other data processing, especially for data owners who do not have sufficient resources or experience to execute data mining techniques. However, the privacy of outsourced data is a serious concern. Most data owners use anonymization-based techniques to prevent identity and attribute disclosure and so avoid privacy leakage before outsourcing data for mining over the cloud. In addition, data collection and dissemination in a resource-limited network such as a sensor cloud require efficient methods to reduce privacy leakage. The main cause of identity disclosure is Quasi-Identifier (QID) linking, but most researchers of anonymization methods ignore the identification of proper QIDs. This reduces the validity of the anonymization methods used and may thus lead to a failure of the anonymization process. This paper introduces a new quasi-identifier recognition algorithm that reduces identity disclosure resulting from QID linking. The proposed algorithm comprises two main stages: (1) Attribute Classification (or QID Recognition), and (2) QID Dimension Identification. The algorithm works from the re-identification risk rate of all attributes and the dimension of the QIDs, determining the proper QIDs and their suitable dimensions. The proposed algorithm was tested on a real dataset. The results demonstrated that the proposed algorithm significantly reduces privacy leakage while maintaining data utility compared to recent related algorithms.
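    To make the idea of a per-attribute re-identification risk concrete, here is a minimal sketch. It assumes one common proxy for risk, the fraction of records that are unique on a given attribute combination; the abstract does not define the paper's own risk rate, so this is not the authors' algorithm. The function names, the `max_dim` parameter, and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def reidentification_risk(records, attrs):
    """Fraction of records whose values on `attrs` occur exactly once."""
    keys = [tuple(r[a] for a in attrs) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


def rank_candidate_qids(records, attributes, max_dim=2):
    """Score every attribute combination up to `max_dim` attributes."""
    risks = {}
    for dim in range(1, max_dim + 1):
        for combo in combinations(attributes, dim):
            risks[combo] = reidentification_risk(records, combo)
    return risks


# Hypothetical toy table, not taken from any dataset in the paper.
records = [
    {"age": 34, "zip": "13053", "sex": "F"},
    {"age": 34, "zip": "13068", "sex": "M"},
    {"age": 47, "zip": "13053", "sex": "F"},
    {"age": 47, "zip": "13068", "sex": "F"},
]

risks = rank_candidate_qids(records, ["age", "zip", "sex"])
for combo, risk in sorted(risks.items(), key=lambda kv: -kv[1]):
    print(combo, risk)
```

    In this toy table neither `age` nor `zip` singles anyone out alone, but the pair does for every record, which is why the dimension of a QID set matters as well as the choice of attributes.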

    Preserving Privacy in Data Access Through the K-Anonymity Model (PRESERVAÇÃO DA PRIVACIDADE NO ACESSO A DADOS POR MEIO DO MODELO K-ANONIMATO)

    The great challenge for organizations is to guarantee privacy preservation when making sensitive data available, since there is a risk that private data may be correlated with public databases, which can lead to a breach of confidentiality. The aim of this article is to demonstrate that there are ways to minimize problems related to the disclosure of sensitive data. Using the data structure provided by the TISS standard (Troca de Informação em Saúde Suplementar, Exchange of Information in Supplementary Health), a database was simulated and subjected to generalization and suppression, operations of the k-anonymity model. Attacks were then carried out to identify possible vulnerabilities in the database and so validate the anonymization process. Removing identifiers is not enough to achieve anonymity, because combining attributes from a private database with public ones can reveal confidential information; the attacker can also use prior knowledge and correlate it with the available data, especially when the number of quasi-identifiers in the data table is large. With the growth in data collection and sharing, together with the need for access, the study and analysis of the aspects involved in making data available and preserving privacy become relevant.
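    Generalization and suppression, the two k-anonymity operations this abstract mentions, can be sketched as follows. This is an illustration under stated assumptions, not the article's procedure: the field names, the 10-year age bands, the 3-digit postal prefix, and the sample records are all hypothetical and do not come from the TISS structure.

```python
from collections import Counter

QIDS = ("age", "zip")  # assumed quasi-identifiers for this toy example


def generalize(record):
    """Coarsen quasi-identifiers: 10-year age band, 3-digit postal prefix."""
    low = record["age"] // 10 * 10
    return {
        "age": f"{low}-{low + 9}",
        "zip": record["zip"][:3] + "**",
        "diagnosis": record["diagnosis"],
    }


def group_sizes(records):
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in QIDS) for r in records)


def k_anonymize(records, k):
    """Generalize every record, then suppress groups smaller than k."""
    generalized = [generalize(r) for r in records]
    sizes = group_sizes(generalized)
    return [r for r in generalized if sizes[tuple(r[q] for q in QIDS)] >= k]


def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(size >= k for size in group_sizes(records).values())


# Hypothetical records for illustration only.
raw = [
    {"age": 31, "zip": "13053", "diagnosis": "flu"},
    {"age": 35, "zip": "13068", "diagnosis": "asthma"},
    {"age": 38, "zip": "13012", "diagnosis": "flu"},
    {"age": 52, "zip": "14850", "diagnosis": "diabetes"},
]

released = k_anonymize(raw, k=2)
print(len(released), is_k_anonymous(released, 2))
```

    The fourth record falls in a group of its own after generalization and is suppressed, which shows the trade-off the abstracts in this listing describe: stronger anonymity costs some information.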

    A Survey of Privacy Preserving Data Publishing using Generalization and Suppression
