19,645 research outputs found

Study of Association Rule Mining and Different Hiding Techniques

Author: Bhowmik Debkumar
Saikia Bikramjit
Publication date: 12/05/2009

Data mining is the process of extracting hidden patterns from data. As more data is gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming this data into information. In this paper, we first focused on the Apriori algorithm, a popular data mining technique, and compared the performance of a linked-list-based implementation, used as a baseline, with that of a trie-based implementation for mining frequent item sequences in a transactional database. We examined the data structures, implementation and algorithmic features, focusing mainly on those that also arise in frequent itemset mining. This algorithm has given us new capabilities to identify associations in large data sets. A key problem that is still not sufficiently investigated, however, is the need to balance the confidentiality of the disclosed data with the legitimate needs of the data users. A rule is characterized as sensitive if its disclosure risk is above a certain privacy threshold. Sometimes sensitive rules should not be disclosed to the public since, among other things, they may be used to infer sensitive data or may give business competitors an advantage. We therefore next worked with several association rule hiding algorithms and examined their performance in order to analyze their time complexity and their impact on the original database. We considered two side effects: the number of new rules generated during the hiding process and the number of non-sensitive rules lost during the process.
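
As a rough illustration of the Apriori algorithm mentioned in this abstract, the level-wise search for frequent itemsets can be sketched as follows. This is a minimal sketch, not the thesis's implementation: it uses plain Python dictionaries and sets rather than the linked-list or trie structures the authors compare, and the function name `apriori`, the `min_support` parameter (an absolute count) and the sample `baskets` data are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    `min_support` is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data, used only to exercise the sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

With this sample data, every single item and every pair appears in at least two baskets, while the three-item set appears in only one and is therefore not reported. A trie-based variant would change how candidates are stored and counted, not which itemsets are found.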

ethesis@nitr

Introducing an algorithm for use to hide sensitive association rules through perturb technique

Author: M. Naderi Dehkordi
M. Sakenian Dehkordi
Publication venue: 'International Digital Organization for Scientific Information (IDOSI)'
Publication date: 01/07/2016

Due to the rapid growth of data mining technology, obtaining private data on users through this technology has become easier. Association rule mining is one of the data mining techniques used to extract useful patterns in the form of association rules. One of the main problems in applying this technique to databases is the disclosure of sensitive data, which endangers security and privacy. Hiding association rules is one method of preserving privacy, and it is a major subject in the field of data mining and database security, for which several algorithms with different approaches have been presented so far. This article presents a heuristic algorithm for hiding sensitive association rules. It applies the Perturb technique, which reduces the confidence or support of a rule: weights are allocated to the items and transactions, and the considered item is removed from the transaction with the highest weight. Efficiency is measured by the hiding failure criterion, the number of lost rules and ghost rules, and execution time. The results of this study are assessed and compared with two known algorithms, FHSAR and RRLR, on two real databases (one dense and one sparse). The results indicate that the number of lost rules in all experiments is reduced by 47% in comparison with RRLR and by 23% in comparison with FHSAR. Moreover, the other undesirable side effects of the proposed algorithm are, in the worst case, equal to those of the base algorithms.
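
The general idea of hiding a rule by lowering its confidence can be sketched as follows. This is a minimal sketch under stated assumptions and not the authors' algorithm: the abstract does not give the weighting scheme, so the sketch substitutes a placeholder heuristic (modify the shortest supporting transaction first). The names `support`, `confidence`, `hide_rule` and `victim`, and the sample data, are hypothetical.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (0.0 if never applicable)."""
    sup_x = support(transactions, antecedent)
    if sup_x == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / sup_x


def hide_rule(transactions, antecedent, consequent, min_conf, victim):
    """Return a sanitized copy in which the rule's confidence is below `min_conf`.

    `victim` is the consequent item deleted from supporting transactions.
    """
    if victim not in consequent or victim in antecedent:
        raise ValueError("victim must belong to the consequent only")
    data = [set(t) for t in transactions]  # work on a copy of the database
    both = antecedent | consequent
    while True:
        sup_xy = support(data, both)
        if sup_xy == 0 or confidence(data, antecedent, consequent) < min_conf:
            return data
        supporting = [t for t in data if both <= t]
        # Placeholder for the paper's weighting (assumption): shortest first.
        target = min(supporting, key=len)
        target.remove(victim)  # lowers support(X u Y), leaves support(X) unchanged


# Hypothetical data: hide bread -> milk at a confidence threshold of 0.6.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
sanitized = hide_rule(baskets, {"bread"}, {"milk"}, min_conf=0.6, victim="milk")
```

Each deletion can also weaken non-sensitive rules that involve the removed item, which is the kind of side effect (lost rules) the abstract measures.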

Directory of Open Access Journals

Preservation of confidential information privacy and association rule hiding for data mining: a bibliometric review

Author: Cubillos Jenny
Fernández Claudia
Romero Ligia
Silva Jesus
Solano Darwin
Vargas Villa Jesus
Publication venue: Procedia Computer Science
Publication date: 01/01/2019

In this era of technology, the data held by business organizations is growing at an accelerating pace. Mining hidden patterns from these huge databases would benefit many industries by improving their decision-making processes. Along with non-sensitive information, these databases also contain some sensitive information about customers. During the mining process, sensitive information about a person can be leaked, resulting in misuse of the data and causing loss to the individual. Privacy-preserving data mining can offer a solution to this problem, providing the benefits of mined data while maintaining the privacy of the sensitive information. Hence, there is growing interest in the scientific community in developing new approaches to hide mined sensitive information. In this research, a bibliometric review covering the period 2010 to 2018 is carried out to analyze the growth of studies on preserving the privacy of confidential information through approaches aimed at hiding association rules.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Digital CUC

Privacy Preserving Utility Mining: A Survey

Author: Chao Han-Chieh
Gan Wensheng
Lin Jerry Chun-Wei
Wang Shyue-Liang
Yu Philip S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/11/2018

In the big data era, collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of data containing sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies, in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.

Comment: 2018 IEEE International Conference on Big Data, 10 pages

arXiv.org e-Print Archive

Crossref

Impacts of frequent itemset hiding algorithms on privacy preserving data mining

Author: Yıldız Barış
Publication venue: Izmir Institute of Technology
Publication date: 01/01/2010

Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2010. Includes bibliographical references (leaves: 54-58). Text in English; abstract in Turkish and English. x, 69 leaves.

The unstoppable growth of computer capabilities and the collection of large amounts of data in recent years have made data mining a popular analysis tool. Association rules (frequent itemsets), classification and clustering are the main methods used in data mining research. The first part of this thesis is the implementation and comparison of two frequent itemset mining algorithms that work without candidate itemset generation: Matrix Apriori and FP-Growth. Comparison of these algorithms revealed that Matrix Apriori has higher performance thanks to its faster data structure. One of the great challenges of data mining is finding hidden patterns without violating data owners' privacy. Privacy-preserving data mining came into prominence as a solution. In the second study of the thesis, the Matrix Apriori algorithm is modified and a frequent itemset hiding framework is developed. Four frequent itemset hiding algorithms are proposed such that: i) all versions work without pre-mining, so a privacy breach caused by the knowledge obtained by finding frequent itemsets is prevented in advance; ii) efficiency is increased since no pre-mining is required; iii) supports are found during the hiding process, and at the end the sanitized dataset and the frequent itemsets of this dataset are given as outputs, so no post-mining is required; iv) the heuristics use pattern lengths rather than transaction lengths, eliminating the possibility of distorting more valuable data.