2,485 research outputs found
Evolving temporal association rules with genetic algorithms
A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have been applied to association rule mining, we show the efficacy of extending this to another variant - temporal association rule mining. Our framework is an enhancement to existing temporal association rule mining methods as it employs a genetic algorithm to simultaneously search the rule space and temporal space. A methodology for validating the ability of the proposed framework isolates target temporal itemsets in synthetic datasets. The Iterative Rule Learning method successfully discovers these targets in datasets with varying levels of difficulty
Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster
Designing fast and scalable algorithm for mining frequent itemsets is always
being a most eminent and promising problem of data mining. Apriori is one of
the most broadly used and popular algorithm of frequent itemset mining.
Designing efficient algorithms on MapReduce framework to process and analyze
big datasets is contemporary research nowadays. In this paper, we have focused
on the performance of MapReduce based Apriori on homogeneous as well as on
heterogeneous Hadoop cluster. We have investigated a number of factors that
significantly affects the execution time of MapReduce based Apriori running on
homogeneous and heterogeneous Hadoop Cluster. Factors are specific to both
algorithmic and non-algorithmic improvements. Considered factors specific to
algorithmic improvements are filtered transactions and data structures.
Experimental results show that how an appropriate data structure and filtered
transactions technique drastically reduce the execution time. The
non-algorithmic factors include speculative execution, nodes with poor
performance, data locality & distribution of data blocks, and parallelism
control with input split size. We have applied strategies against these factors
and fine tuned the relevant parameters in our particular application.
Experimental results show that if cluster specific parameters are taken care of
then there is a significant reduction in execution time. Also we have discussed
the issues regarding MapReduce implementation of Apriori which may
significantly influence the performance.Comment: 8 pages, 8 figures, International Conference on Computing,
Communication and Automation (ICCCA2016
Privacy Preserving Utility Mining: A Survey
In big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve better trade-off between utility maximizing and privacy preserving,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page
DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets
Mining frequent itemsets is an essential problem in data mining and plays an
important role in many data mining applications. In recent years, some itemset
representations based on node sets have been proposed, which have shown to be
very efficient for mining frequent itemsets. In this paper, we propose
DiffNodeset, a novel and more efficient itemset representation, for mining
frequent itemsets. Based on the DiffNodeset structure, we present an efficient
algorithm, named dFIN, to mining frequent itemsets. To achieve high efficiency,
dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search
strategy and directly enumerates frequent itemsets without candidate generation
under some case. For evaluating the performance of dFIN, we have conduct
extensive experiments to compare it against with existing leading algorithms on
a variety of real and synthetic datasets. The experimental results show that
dFIN is significantly faster than these leading algorithms.Comment: 22 pages, 13 figure
- …