Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems
Frequent itemset mining leads to the discovery of associations and
correlations among items in large transactional databases. Apriori is a
classical frequent itemset mining algorithm that makes iterative passes over
the database, generating candidate itemsets from the frequent itemsets found
in the previous iteration and pruning clearly infrequent itemsets. The
Dynamic Itemset Counting (DIC) algorithm is a variation of
Apriori, which tries to reduce the number of passes made over a transactional
database while keeping the number of itemsets counted in a pass relatively low.
In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi
many-core system for the case when the transactional database fits in main
memory. Intel Xeon Phi provides a large number of small compute cores with
vector processing units. The paper presents a parallel implementation of DIC
based on OpenMP technology and thread-level parallelism. We exploit a
bit-based internal layout for transactions and itemsets. This technique reduces
the memory needed to store the transactional database, simplifies support
counting via bitwise logical operations, and allows that step to be
vectorized. Experimental evaluation on the Intel Xeon CPU and the
Intel Xeon Phi coprocessor with large synthetic and real databases showed good
performance and scalability of the proposed algorithm.
Comment: Accepted for publication in Journal of Computing and Information
Technology (http://cit.fer.hr
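The bit-based layout mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' OpenMP implementation: the helper names `encode` and `support`, the item-to-bit mapping `index`, and the toy transactions are all hypothetical, chosen only to show how a support count reduces to a bitwise AND and a comparison.

```python
def encode(items, index):
    """Pack a set of items into an integer bitmask.

    `index` is a hypothetical mapping from item to bit position.
    """
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def support(itemset_mask, transaction_masks):
    """Count the transactions that contain every item of the itemset."""
    return sum(
        1 for t in transaction_masks if (t & itemset_mask) == itemset_mask
    )


# Toy database (illustrative data, not from the paper).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c", "d"}]

# Assign one bit per distinct item, in sorted order.
index = {item: bit for bit, item in enumerate(sorted(set().union(*transactions)))}
masks = [encode(t, index) for t in transactions]

count_ac = support(encode({"a", "c"}, index), masks)
count_bd = support(encode({"b", "d"}, index), masks)
```

Because each containment test is a single AND plus an equality check on fixed-width words, the same loop is a natural candidate for vectorization in a compiled implementation.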
Mining Target-Oriented Sequential Patterns with Time-Intervals
A target-oriented sequential pattern is a sequential pattern whose last
itemset is a concerned (target) itemset. A time-interval sequential pattern is a
sequential pattern with time-intervals between every pair of successive
itemsets. In this paper we present an algorithm to discover target-oriented
sequential patterns with time-intervals. To this end, the original sequences are
reversed so that the last itemsets can be arranged in front of the sequences.
The reversed sequences are then compared against the concerned itemset to
exclude the irrelevant sequences. Clustering analysis is used with a
typical sequential pattern mining algorithm to extract the sequential patterns
with time-intervals between successive itemsets. Finally, the discovered
time-interval sequential patterns are reversed back to the original order to
search for the target patterns.
Comment: 11 pages, 9 tables
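The reversal and filtering step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a sequence is modeled as a list of (timestamp, itemset) pairs, and a sequence is treated as relevant when its last itemset contains the target itemset. The abstract does not spell out the exact comparison, and the clustering and pattern mining stages are omitted. The names `reverse_and_filter` and `intervals` are hypothetical.

```python
def reverse_and_filter(sequences, target):
    """Reverse each sequence and keep those that now start with the target.

    Assumption: a sequence is relevant if its last itemset (the first one
    after reversal) is a superset of `target`.
    """
    kept = []
    for seq in sequences:
        rev = list(reversed(seq))
        if rev and target <= rev[0][1]:
            kept.append(rev)
    return kept


def intervals(seq):
    """Time gaps between successive itemsets of a (possibly reversed) sequence."""
    return [abs(b[0] - a[0]) for a, b in zip(seq, seq[1:])]


# Toy sequences of (timestamp, itemset) pairs (illustrative data only).
sequences = [
    [(0, {"a"}), (3, {"b"}), (7, {"x"})],
    [(1, {"b"}), (2, {"c"})],
    [(0, {"c"}), (5, {"x", "y"})],
]
target = {"x"}

reversed_kept = reverse_and_filter(sequences, target)
gaps = [intervals(s) for s in reversed_kept]
```

Placing the target itemset at the front lets irrelevant sequences be discarded by inspecting a single element; the time gaps computed here are the values a later clustering step would group into intervals.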