Efficient Analysis of Pattern and Association Rule Mining Approaches
The process of data mining produces various patterns from a given data
source. The most widely recognized data mining tasks are the discovery of
frequent itemsets, frequent sequential patterns, frequent sequential rules, and
frequent association rules. Numerous efficient algorithms have been proposed
for these tasks. Frequent pattern mining has been a focal topic in data mining
research, with a substantial body of literature, and significant progress has
been made as a result, ranging from efficient algorithms for frequent itemset
mining in transaction databases to more complex algorithms for sequential
pattern mining, structured pattern mining, and correlation mining. Association
rule mining (ARM) is one of the most current data mining techniques; it is
designed to group objects together from large databases, with the aim of
extracting interesting correlations and relations among large amounts of
data. In this article, we provide a brief review and analysis of the current
status of frequent pattern mining and discuss some promising research
directions. Additionally, this paper includes a comparative study between the
performance of the described approaches.
Comment: 14 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1312.4800; and with arXiv:1109.2427 by other authors
An efficient closed frequent itemset miner for the MOA stream mining system
Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.
Postprint (published version)
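To make the notion of a frequent closed itemset concrete: an itemset is closed when no proper superset has the same support. The sketch below is a brute-force batch illustration of that definition only. It is not IncMine, it does not process streams, and it does not use the MOA API; the function names and the toy data are assumptions.

```python
from itertools import combinations


def support_counts(transactions, min_count):
    """Count every itemset occurring in some transaction (exponential; toy data only)."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}


def closed_only(freq):
    """Keep itemsets with no proper superset of equal support.

    Checking frequent supersets is enough: a superset with equal support
    meets the same threshold, so it is itself in `freq`.
    """
    return {
        s: c
        for s, c in freq.items()
        if not any(s < other and c == c2 for other, c2 in freq.items())
    }


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = support_counts(transactions, min_count=2)
closed = closed_only(freq)
```

Here {butter} is frequent but not closed, because {bread, butter} occurs in exactly the same two transactions; the closed sets therefore carry the same support information in fewer entries.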
An efficient parallel method for mining frequent closed sequential patterns
Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention because it is an important task in sequence mining. Recently, many studies have focused on mining frequent closed sequential patterns because such patterns have proved to be more efficient and compact than frequent sequential patterns, while the same information can still be fully extracted from them. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), which uses a multi-core processor architecture to mine FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce the execution time of mining. This approach overcomes common problems of parallel mining such as communication overhead, synchronization, and data replication. It also addresses load balancing between processors with a dynamic mechanism that redistributes work when some processes run out of work, minimizing idle CPU time.
Web of Science5174021739