    Data sanitization in association rule mining based on impact factor

    Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates individuals' and organizations' concerns about the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns, thereby preserving data confidentiality against association rule mining. The process relies heavily on minimizing the impact of sanitization on data utility, that is, on minimizing the number of lost patterns: non-sensitive patterns that can no longer be mined from the sanitized database. This study proposes a data sanitization algorithm that hides sensitive patterns, in the form of frequent itemsets, from the database while controlling the impact of sanitization on data utility by estimating the impact factor of each modification on non-sensitive itemsets. The proposed algorithm is compared with the Sliding Window size Algorithm (SWA) and Max-Min1 in terms of execution time, data utility and data accuracy, where data accuracy is defined as the ratio of deleted items to the total support values of sensitive itemsets in the source dataset. Experimental results demonstrate that the proposed algorithm outperforms SWA and Max-Min1 in maximizing data utility and data accuracy, and that it achieves better execution time than both as the numbers of sensitive itemsets and transactions scale up.
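    As a rough illustration of itemset hiding, the following Python sketch greedily deletes items from transactions that support a sensitive itemset until its absolute support falls below `min_sup`. The abstract does not give the authors' impact-factor formula, so the "impact" score used here (the number of non-sensitive itemsets that a deletion would break in that transaction) is a hypothetical stand-in, as are all function names; only `data_accuracy` follows the definition stated above.

```python
# Hypothetical sketch of greedy frequent-itemset hiding; the "impact" score
# below is an assumed stand-in for the paper's impact factor, not its formula.

def support(db, itemset):
    """Absolute support: number of transactions containing every item of itemset."""
    s = set(itemset)
    return sum(1 for t in db if s <= t)


def hide_itemsets(db, sensitive, non_sensitive, min_sup):
    """Delete items until every sensitive itemset has support < min_sup (min_sup >= 1).

    Returns (sanitized_db, number_of_deleted_items); the input db is not modified.
    """
    db = [set(t) for t in db]  # work on a copy of the transactions
    deleted = 0
    for s in map(frozenset, sensitive):
        while support(db, s) >= min_sup:
            best = None  # (impact, transaction index, victim item)
            for idx, t in enumerate(db):
                if not s <= t:
                    continue  # only transactions supporting s are candidates
                for item in s:
                    # Assumed impact: non-sensitive itemsets in t that would lose this item.
                    impact = sum(1 for n in non_sensitive if item in n and set(n) <= t)
                    if best is None or impact < best[0]:
                        best = (impact, idx, item)
            _, idx, item = best
            db[idx].discard(item)
            deleted += 1
    return db, deleted


def data_accuracy(deleted, source_db, sensitive):
    """Ratio of deleted items to the total support of sensitive itemsets in the source db."""
    return deleted / sum(support(source_db, s) for s in sensitive)


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "c"}]
sanitized, deleted = hide_itemsets(db, [{"a", "b"}], [{"a", "c"}, {"b", "c"}], min_sup=2)
print(deleted, data_accuracy(deleted, db, [{"a", "b"}]))  # prints: 2 0.6666666666666666
```

    In this toy run the greedy choice deletes from the transactions {a, b} and {a, b, d}, which hides {a, b} while leaving the supports of the non-sensitive itemsets {a, c} and {b, c} unchanged.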

    A survey of evolutionary computation for association rule mining

    © 2020
    Association Rule Mining (ARM) is a significant data mining task for discovering frequent patterns. It has achieved great success in a plethora of applications such as market basket analysis, computer networks, recommendation systems, and healthcare. In the past few years, evolutionary computation-based ARM has emerged as one of the most popular research areas for addressing the high computation time of traditional ARM. Although numerous papers have been published, there is no comprehensive analysis of existing evolutionary ARM methodologies. In this paper, we review emerging research on evolutionary computation for ARM. We discuss applications of evolutionary computation to different types of ARM, including numerical rules, fuzzy rules, high-utility itemsets, class association rules, and rare association rules. Evolutionary ARM algorithms are classified into four main groups by evolutionary approach: evolution-based, swarm intelligence-based, physics-inspired, and hybrid approaches. Finally, we discuss the remaining challenges of evolutionary ARM, its applications, and future research topics.

    Evolutionary Machine Learning: A Survey

    Privacy-preserving in association rule mining using an improved discrete binary artificial bee colony

    © 2019
    Association Rule Hiding (ARH) is the process of protecting sensitive knowledge through data transformation. Although some evolutionary ARH algorithms exist, they mostly focus on itemset hiding rather than rule hiding. Moreover, their unstable convergence to the global optimum and the long solutions they design make them ill-suited to reducing side effects. They use the basic versions of evolutionary approaches, which perform poorly in the ARH domain, where the search space is large and algorithms easily get trapped in local optima. To address these problems, we propose a new rule hiding algorithm based on a binary Artificial Bee Colony (ABC) approach, which has good exploration. We further improve the poor exploitation of the binary ABC algorithm by designing a new neighborhood generation mechanism that balances exploration and exploitation. We call this algorithm Improved Binary ABC (IBABC). IBABC is coupled with our proposed rule hiding algorithm, called ABC4ARH, to select sensitive transactions for modification; to choose victim items, ABC4ARH uses a heuristic. Extensive experiments on five real datasets demonstrate the performance of ABC4ARH with respect to side effects. Furthermore, the effectiveness of IBABC is verified on the uncapacitated facility location problem and the 0–1 knapsack problem.
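    The "side effects" of rule hiding are commonly measured as hiding failures (sensitive rules that are still minable), lost rules (non-sensitive rules that are no longer minable), and ghost rules (new rules that were not minable before sanitization). The Python sketch below computes these three counts by brute-force rule mining on a toy database. It does not implement IBABC or ABC4ARH; the function names, thresholds, and example data are illustrative assumptions.

```python
from itertools import combinations


def support(db, itemset):
    """Absolute support of an itemset (a frozenset) in a list of transaction sets."""
    return sum(1 for t in db if itemset <= t)


def mine_rules(db, min_sup, min_conf):
    """Brute-force mining of all rules A -> C with support >= min_sup and confidence >= min_conf.

    Exponential in the number of items; intended for toy examples only.
    """
    items = sorted(set().union(*db))
    rules = set()
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            full = frozenset(combo)
            sup = support(db, full)
            if sup == 0 or sup < min_sup:
                continue
            for r in range(1, k):
                for ante in combinations(combo, r):
                    a = frozenset(ante)
                    if sup / support(db, a) >= min_conf:  # confidence of a -> full - a
                        rules.add((a, full - a))
    return rules


def side_effects(original, sanitized, sensitive, min_sup, min_conf):
    """Count hiding failures, lost rules, and ghost rules after sanitization."""
    before = mine_rules(original, min_sup, min_conf)
    after = mine_rules(sanitized, min_sup, min_conf)
    sensitive = set(sensitive)
    return {
        "hiding_failure": len(sensitive & after),         # sensitive rules still minable
        "lost_rules": len((before - sensitive) - after),  # non-sensitive rules lost
        "ghost_rules": len(after - before),               # rules absent before sanitization
    }


original = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
sanitized = [{"a", "b", "c"}, {"a"}, {"a", "c"}, {"b", "c"}]  # "b" removed from one transaction
hide = [(frozenset({"a"}), frozenset({"b"}))]  # hypothetical sensitive rule a -> b
print(side_effects(original, sanitized, hide, min_sup=2, min_conf=0.6))
# prints: {'hiding_failure': 0, 'lost_rules': 1, 'ghost_rules': 0}
```

    Removing "b" from the second transaction hides a -> b, but it also drops the non-sensitive rule b -> a below the support threshold, which is counted as one lost rule. Reducing this kind of collateral loss is what the victim-item heuristic and transaction selection are meant to address.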

    A cost-sensitive deep learning based approach for network traffic classification

    Network traffic classification (NTC) plays an important role in cyber security and network performance, for example in intrusion detection and in providing a higher quality of service. However, because traffic datasets are imbalanced, NTC can be extremely challenging, and poor handling of this imbalance can degrade classification performance. Existing NTC methods seek to rebalance the data distribution through resampling strategies, but such approaches are known to suffer from information loss, overfitting, and increased model complexity. To address these challenges, we propose a new cost-sensitive deep learning approach that increases the robustness of deep learning classifiers against the class imbalance problem in NTC. First, the dataset is divided into partitions, and a cost matrix is created for each partition by considering its data distribution. The costs are then applied in the cost function layer to penalize classification errors. In our approach, the costs differ for each type of misclassification because a cost matrix is generated specifically for each partition. To evaluate its utility, we implement the proposed cost-sensitive learning method in two deep learning classifiers: a stacked autoencoder and a convolutional neural network. Our experiments on the ISCX VPN-nonVPN dataset show that the proposed approach achieves higher classification performance on low-frequency classes than three other NTC methods.
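    A minimal Python sketch of per-partition cost-sensitive loss, assuming a simple inverse-frequency rule for each cost matrix: misclassifying a sample of class i costs n_max / n_i, and correct predictions cost nothing. The abstract does not specify how its cost matrices are derived, how the data is partitioned, or how the cost function layer applies the costs, so the contiguous partitioning, the inverse-frequency rule, the expected-cost loss, and all function names here are hypothetical.

```python
import math


def cost_matrix(class_counts):
    """Assumed inverse-frequency costs: C[i][j] = n_max / n_i for i != j, 0 on the diagonal."""
    n_max = max(class_counts)
    k = len(class_counts)
    return [[0.0 if i == j else n_max / class_counts[i] for j in range(k)] for i in range(k)]


def partition_cost_matrices(labels, num_classes, num_partitions):
    """Split labels into contiguous partitions and build one cost matrix per partition."""
    size = max(1, math.ceil(len(labels) / num_partitions))
    matrices = []
    for start in range(0, len(labels), size):
        counts = [1] * num_classes  # add-one smoothing avoids division by zero
        for y in labels[start:start + size]:
            counts[y] += 1
        matrices.append(cost_matrix(counts))
    return matrices


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost_loss(logits, label, cost):
    """Expected misclassification cost: sum_j p(j | x) * C[label][j]."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, cost[label]))


C = cost_matrix([90, 10])
print(C)                                     # [[0.0, 1.0], [9.0, 0.0]]
print(expected_cost_loss([0.0, 0.0], 1, C))  # 4.5
print(expected_cost_loss([0.0, 0.0], 0, C))  # 0.5
```

    For a partition with 90 samples of class 0 and 10 of class 1, the same uncertain prediction is penalized nine times more heavily on a minority-class sample than on a majority-class one. A class-weighted cross-entropy is another common way to inject such costs into a deep learning loss.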

    Statement in Support of: "Virology under the Microscope-a Call for Rational Discourse"
