61,725 research outputs found

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analyzing data that contains sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages

    Data Mining Based on Association Rule Privacy Preserving

    The security of a large database that contains crucial information becomes a serious issue when the data is shared over a network, where it must be protected against unauthorized access. Privacy preserving data mining is a new research trend in data mining and statistical databases. Association analysis is a powerful tool for discovering relationships hidden in large databases. Association rule hiding algorithms offer strong and efficient performance for protecting confidential and crucial data. Data modification and rule hiding are among the most important approaches for securing data. The objective of the proposed association rule hiding algorithm for privacy preserving data mining is to hide certain information so that it cannot be discovered through association rule mining algorithms. The main approach of association rule hiding algorithms is to hide some of the generated association rules by increasing or decreasing the support or the confidence of the rules, so that the rule items, whether on the Left Hand Side (LHS) or the Right Hand Side (RHS) of the generated rule, cannot be deduced through association rule mining algorithms. The Increase Support of Left Hand Side (ISL) algorithm decreases the confidence of a rule by increasing the support of its LHS. It does not work on both sides of the rule; it modifies only the LHS. In the Decrease Support of Right Hand Side (DSR) algorithm, the confidence of the rule is decreased by decreasing the support of its RHS; it modifies only the RHS. We propose a new algorithm that addresses the limitations of both. It can increase and decrease the support of the LHS and RHS items of the rule correspondingly, so that more rules are hidden with fewer modifications. The efficiency of the proposed algorithm is compared with the ISL and DSR algorithms on real databases, on the basis of the number of rules hidden, CPU time, and the number of modified entries, and it achieved better results
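
    The ISL and DSR ideas described in this abstract can be illustrated with a minimal Python sketch. It relies on the standard definition confidence(LHS -> RHS) = support(LHS ∪ RHS) / support(LHS). The function names, the toy database, and the choice of which transaction to modify are assumptions made for illustration; they are not taken from the paper.

```python
from typing import FrozenSet, List, Set

def support(db: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(db: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """confidence(LHS -> RHS) = support(LHS u RHS) / support(LHS)."""
    return support(db, lhs | rhs) / support(db, lhs)

def isl_step(db: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> bool:
    """ISL-style step (illustrative): add the LHS items to one transaction that
    contains neither the full LHS nor the full RHS. For disjoint LHS and RHS,
    support(LHS) rises while support(LHS u RHS) stays the same."""
    for t in db:
        if not (lhs <= t) and not (rhs <= t):
            t |= lhs  # in-place update of the transaction
            return True
    return False

def dsr_step(db: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> bool:
    """DSR-style step (illustrative): remove one RHS item from a transaction
    that supports the whole rule, lowering support(LHS u RHS) only."""
    for t in db:
        if (lhs | rhs) <= t:
            t.discard(min(rhs))
            return True
    return False

def demo_db() -> List[Set[str]]:
    """Hypothetical toy database of five transactions."""
    return [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}, {"b", "c"}]

LHS, RHS = frozenset({"a"}), frozenset({"b"})

if __name__ == "__main__":
    db = demo_db()
    print("before:", confidence(db, LHS, RHS))
    isl_step(db, LHS, RHS)
    print("after ISL step:", confidence(db, LHS, RHS))
    db = demo_db()
    dsr_step(db, LHS, RHS)
    print("after DSR step:", confidence(db, LHS, RHS))
```

    In both cases the rule counts as hidden once its confidence drops below the chosen minimum confidence threshold; repeating either step lowers the confidence further at the cost of more modified entries.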

    Study of Association Rule Mining and Different Hiding Techniques

    Data mining is the process of extracting hidden patterns from data. As more data is gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming this data into information. In this paper, we first focus on the APRIORI algorithm, a popular data mining technique, and compare the performance of a linked-list-based implementation, used as a baseline, with a trie-based implementation for mining frequent item sequences in a transactional database. We examine the data structures, implementation, and algorithmic features, focusing mainly on those that also arise in frequent itemset mining. This algorithm has given us new capabilities to identify associations in large data sets. A key problem, still not sufficiently investigated, is the need to balance the confidentiality of the disclosed data with the legitimate needs of the data users. A rule is characterized as sensitive if its disclosure risk is above a certain privacy threshold. Sometimes sensitive rules should not be disclosed to the public since, among other things, they may be used for inferring sensitive data, or they may provide business competitors with an advantage. We therefore next worked with several association rule hiding algorithms and examined their performance in order to analyze their time complexity and their impact on the original database. We considered two different side effects: the number of new rules generated during the hiding process, and the number of non-sensitive rules lost during the process
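
    As a rough illustration of the level-wise Apriori search mentioned in this abstract, the following Python sketch mines frequent itemsets from a small transactional database. It uses plain sets and dictionaries rather than the linked-list or trie structures the paper compares, and the function name and toy data are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(db: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return {itemset: count} for every itemset contained in at least
    `min_count` transactions, found level by level (size 1, 2, 3, ...)."""
    items = sorted({i for t in db for i in t})
    current = [frozenset([i]) for i in items]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while current:
        # Count each candidate of size k with one pass over the database.
        counts = {c: sum(1 for t in db if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(apriori(db, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

    The counting pass is the part that a trie-based candidate store is meant to speed up; the join and prune logic stays the same regardless of the data structure.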

    An Efficient Rule-Hiding Method for Privacy Preserving in Transactional Databases

    One of the obstacles to using data mining techniques such as association rules is the risk of leaking sensitive data after the data is released to the public. Therefore, the trade-off between data privacy and data mining is of great importance and must be managed carefully. In this study, an efficient algorithm is introduced for preserving the privacy of association rules using a distortion-based method, in which the sensitive association rules are hidden through deletion and reinsertion of items in the database. In this algorithm, in order to reduce the side effects on non-sensitive rules, the item correlation between sensitive and non-sensitive rules is calculated, and the item with the minimum influence on non-sensitive rules is selected as the victim item. To reduce the degree of distortion of the data and preserve data quality, the transactions with the highest number of sensitive items are selected for modification. The results show that the proposed algorithm performs better on the non-dense real database, with fewer side effects and less data loss, than on the dense real database. Furthermore, the results are far better on synthetic databases than on real databases
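
    The two selection heuristics described in this abstract (victim item and transaction choice) might be sketched as follows. The abstract does not give the correlation measure, so this sketch assumes that an item's influence is simply the number of non-sensitive rules that contain it; the function names and the toy data are likewise hypothetical.

```python
from typing import FrozenSet, List, Set

def pick_victim(rule_items: FrozenSet[str],
                non_sensitive_rules: List[FrozenSet[str]]) -> str:
    """Choose the item of a sensitive rule that appears in the fewest
    non-sensitive rules (assumed influence measure, ties broken alphabetically)."""
    def influence(item: str) -> int:
        return sum(1 for rule in non_sensitive_rules if item in rule)
    return min(sorted(rule_items), key=influence)

def rank_transactions(db: List[Set[str]],
                      rule_items: FrozenSet[str],
                      sensitive_items: Set[str]) -> List[int]:
    """Indices of transactions supporting the sensitive rule, ordered so that
    those containing the most sensitive items come first."""
    supporting = [i for i, t in enumerate(db) if rule_items <= t]
    return sorted(supporting,
                  key=lambda i: len(db[i] & sensitive_items),
                  reverse=True)

if __name__ == "__main__":
    rule = frozenset({"a", "b"})                      # items of a sensitive rule
    others = [frozenset({"a", "c"}), frozenset({"a", "d"}), frozenset({"b", "c"})]
    db = [{"a", "b"}, {"a", "b", "e"}, {"a", "c"}, {"a", "b", "e", "f"}]
    print(pick_victim(rule, others))
    print(rank_transactions(db, rule, {"a", "b", "e", "f"}))
```

    Removing the victim item from the top-ranked transactions first is one way to lower the support of the sensitive rule while touching as few transactions and non-sensitive rules as possible.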

    Exploring the Existing and Unknown Side Effects of Privacy Preserving Data Mining Algorithms

    The data mining sanitization process involves converting the data by masking the sensitive data and then releasing it to the public domain. During the sanitization process, side effects such as hiding failure, missing cost, and artificial cost were observed. Privacy Preserving Data Mining (PPDM) algorithms were developed for the sanitization process to overcome information loss while maintaining data integrity. While these PPDM algorithms provided benefits for privacy preservation, they also aimed to address the side effects that occurred during the sanitization process. Many PPDM algorithms were developed to reduce these side effects, based on different PPDM techniques. However, previous studies have not explored or justified why non-traditional side effects were not given much importance. This study reported the findings on the side effects of PPDM algorithms in a newly created web repository. The research methodology adopted for this study was Design Science Research (DSR). The research was conducted in four phases. The first phase addressed the characteristics, similarities, differences, and relationships of existing side effects. The next phase identified the characteristics of non-traditional side effects. The third phase used the Privacy Preservation and Security Framework (PPSF) tool to test whether non-traditional side effects occur in PPDM algorithms. This phase also attempted to find additional unknown side effects that had not been found in prior studies. The PPDM algorithms considered were Greedy, POS2DT, SIF_IDF, cpGA2DT, pGA2DT, and sGA2DT. The associated PPDM techniques were anonymization, perturbation, randomization, condensation, heuristic, reconstruction, and cryptography. The final phase involved creating a new online web repository to report all the side effects found for the PPDM algorithms. The web repository was created using full-stack web development; the AngularJS, Spring, Spring Boot, and Hibernate frameworks were used to build the web application. The results of the study covered various PPDM algorithms and their side effects. Additionally, the relationship between hiding failure, missing cost, and artificial cost, and their impact on each other, was established. The relationship between the side effects and the type of data (sensitive, non-sensitive, or new) was also observed. The web repository acts as a quick reference for PPDM algorithms. Developing, improving, inventing, and reporting PPDM algorithms is necessary. This study will influence researchers or organizations to report, use, reuse, or develop better PPDM algorithms
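
    The three traditional side effects named in this abstract are commonly defined in terms of the patterns mined before and after sanitization. The Python sketch below follows those common definitions (hiding failure: sensitive patterns still discoverable; missing cost: non-sensitive patterns lost; artificial cost: patterns that appear only after sanitization). The function name and the toy pattern sets are assumptions made for illustration, not part of the study or of the PPSF tool.

```python
from typing import FrozenSet, Set, Tuple

Pattern = FrozenSet[str]

def side_effects(original: Set[Pattern],
                 sanitized: Set[Pattern],
                 sensitive: Set[Pattern]
                 ) -> Tuple[Set[Pattern], Set[Pattern], Set[Pattern]]:
    """Compare patterns mined from the original and the sanitized database.

    Returns (hiding_failure, missing, artificial):
      hiding_failure: sensitive patterns that can still be mined,
      missing:        non-sensitive patterns that can no longer be mined,
      artificial:     patterns that were not minable before sanitization.
    """
    hiding_failure = sensitive & sanitized
    missing = (original - sensitive) - sanitized
    artificial = sanitized - original
    return hiding_failure, missing, artificial

if __name__ == "__main__":
    fs = frozenset
    original = {fs("a"), fs("b"), fs("c"), fs("ab"), fs("ac")}
    sensitive = {fs("ab"), fs("ac")}
    sanitized = {fs("a"), fs("c"), fs("ac"), fs("bc")}
    hf, mc, ac = side_effects(original, sanitized, sensitive)
    print(len(hf), len(mc), len(ac))
```

    Dividing each set size by the size of its reference set (the sensitive patterns, the non-sensitive original patterns, and the sanitized patterns, respectively) gives the ratio form in which these side effects are often reported.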

    Reducing Side Effects of Hiding Sensitive Itemsets in Privacy Preserving Data Mining

    Data mining is traditionally adopted to retrieve and analyze knowledge from large amounts of data. Private or confidential data may be sanitized or suppressed before it is shared or published. Privacy preserving data mining (PPDM) has thus become an important issue in recent years. The most common approach in PPDM is to sanitize the database to hide the sensitive information. In this paper, a novel hiding-missing-artificial utility (HMAU) algorithm is proposed to hide sensitive itemsets through transaction deletion. The transaction with the maximal ratio of sensitive to nonsensitive itemsets is selected to be deleted entirely. Three side effects, namely hiding failures, missing itemsets, and artificial itemsets, are considered when evaluating whether a transaction needs to be deleted to hide sensitive itemsets. Three weights are assigned to express the importance of these three factors and can be set according to the requirements of users. Experiments are then conducted to show the performance of the proposed algorithm in terms of execution time, number of deleted transactions, and number of side effects

    Impacts of frequent itemset hiding algorithms on privacy preserving data mining

    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2010. Includes bibliographical references (leaves: 54-58). Text in English; Abstract: Turkish and English. x, 69 leaves. The relentless growth of computer capabilities and the collection of large amounts of data in recent years have made data mining a popular analysis tool. Association rules (frequent itemsets), classification, and clustering are the main methods used in data mining research. The first part of this thesis is the implementation and comparison of two frequent itemset mining algorithms that work without candidate itemset generation: Matrix Apriori and FP-Growth. Comparison of these algorithms revealed that Matrix Apriori has higher performance thanks to its faster data structure. One of the great challenges of data mining is finding hidden patterns without violating data owners' privacy. Privacy preserving data mining came into prominence as a solution. In the second study of the thesis, the Matrix Apriori algorithm is modified and a frequent itemset hiding framework is developed. Four frequent itemset hiding algorithms are proposed such that: i) all versions work without pre-mining, so a privacy breach caused by the knowledge obtained by finding frequent itemsets is prevented in advance, ii) efficiency is increased since no pre-mining is required, iii) supports are found during the hiding process, and at the end the sanitized dataset and the frequent itemsets of this dataset are given as outputs, so no post-mining is required, iv) the heuristics use pattern lengths rather than transaction lengths, eliminating the possibility of distorting more valuable data

    An Efficient Approach to Privacy Preserving Association Rule Mining

    The vulnerabilities associated with large databases are increasing over time, and sharing data over a network has become a critical issue for every organization. Data mining approaches have seen tremendous success. On the other side of the coin, however, they have put databases and their sensitive information at risk of being modified or altered by unwanted sources. The major problem remains unresolved: we need to strike a balance between the data mining results and the time required to hide the data. The main focus should be on how to keep sensitive data private, so that sensitive information cannot easily be revealed through data mining techniques. In this thesis, we focus on hiding sensitive data at a much faster pace than the hiding counter algorithm

    Protecting big data mining association rules using fuzzy system

    Recently, big data has come to be regarded as the key to unlocking the next great waves of growth in productivity. Along with this growth, it faces a number of challenges. One significant problem is data security. While people use data mining methods to extract valuable information from massive databases, they also need to keep certain knowledge from being discovered, such as sensitive frequent itemsets, rules, taxonomy trees, and the like. Association rule mining can pose a potential threat to the secrecy of information. Association rule hiding methods are therefore applied to avoid the risk of sensitive information being misused. Various studies have already been carried out on association rule hiding. However, most of them concentrate on methods with a limited view, designed for static databases (with only existing information), while researchers now face the problem of continuously arriving data. Moreover, in the era of big data, it is essential to optimize existing systems so that they suit big data. This paper proposes a framework that achieves data anonymization using fuzzy logic while supporting big data mining. The fuzzy logic groups the sensitivity of the association rules with a suitable association level. Moreover, the parallelization methods incorporated into the framework support a fast data mining process