Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems
Frequent itemset mining leads to the discovery of associations and
correlations among items in large transactional databases. Apriori is a
classical frequent itemset mining algorithm that makes iterative passes
over the database, generating candidate itemsets from the frequent
itemsets found in the previous iteration and pruning clearly infrequent
itemsets. The Dynamic Itemset Counting (DIC) algorithm is a variation of
Apriori, which tries to reduce the number of passes made over a transactional
database while keeping the number of itemsets counted in a pass relatively low.
In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi
many-core system for the case when the transactional database fits in main
memory. Intel Xeon Phi provides a large number of small compute cores with
vector processing units. The paper presents a parallel implementation of DIC
based on OpenMP technology and thread-level parallelism. We exploit a
bit-based internal layout for transactions and itemsets. This technique reduces
the memory needed to store the transactional database, simplifies support
counting via bitwise logical operations, and allows that step to be
vectorized. Experimental evaluation on the Intel Xeon CPU and the
Intel Xeon Phi coprocessor with large synthetic and real databases showed good
performance and scalability of the proposed algorithm.
Comment: Accepted for publication in Journal of Computing and Information
Technology (http://cit.fer.hr)
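The bit-based layout mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's OpenMP implementation: it is a single-threaded Python illustration in which each transaction and each candidate itemset is packed into an integer bitmask, so that a containment test reduces to one bitwise AND and a comparison. The names `encode`, `support`, and the sample items are hypothetical and chosen only for this example.

```python
def encode(items, index):
    """Pack a collection of items into an integer bitmask.

    `index` is a hypothetical item -> bit position mapping.
    """
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def support(candidate_mask, transaction_masks):
    """Count the transactions that contain every item of the candidate."""
    return sum(
        1 for t in transaction_masks
        if (t & candidate_mask) == candidate_mask
    )


# Illustrative data only; not taken from the paper's experiments.
items = ["bread", "milk", "eggs", "beer"]
index = {item: bit for bit, item in enumerate(items)}
transactions = [
    {"bread", "milk"},
    {"bread", "eggs", "beer"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
transaction_masks = [encode(t, index) for t in transactions]

bread_milk_support = support(encode({"bread", "milk"}, index), transaction_masks)
eggs_support = support(encode({"eggs"}, index), transaction_masks)
```

Because every transaction is tested with the same AND-and-compare operation, the counting loop is the kind of step that lends itself to vectorization on fixed-width machine words.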
Image mining: trends and developments
Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. In this paper, we examine the research issues in image mining and current developments in the field, particularly image mining frameworks and state-of-the-art techniques and systems. We also identify some future research directions for image mining.
Discovering unbounded episodes in sequential data
One basic goal in the analysis of time-series data is
to find frequent interesting episodes, i.e., collections
of events occurring frequently together in the input sequence.
Most widely known approaches decide the interestingness of an episode from a
fixed, user-specified window width or interval that bounds the
subsequent sequential association rules.
In this paper, we present a more intuitive definition that
allows, in turn, interesting episodes to grow during the mining without any
user-specified help. A convenient algorithm to
efficiently discover the proposed unbounded episodes is also implemented.
Experimental results confirm that our approach is useful
and advantageous.
Postprint (published version)
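For context, the fixed-window approach that the abstract contrasts itself with can be sketched as follows. This is a simplified illustration of window-based episode frequency, not the unbounded-episode algorithm proposed in the paper, whose definition the abstract does not give. It assumes integer timestamps, treats an episode as an unordered set of event types, and slides a window of a user-chosen `width` over the sequence; the function name and the sample events are hypothetical.

```python
def window_frequency(events, episode, width):
    """Fraction of sliding windows [s, s + width) that contain every
    event type in `episode`.

    `events` is a list of (integer_time, event_type) pairs. Windows start
    at every integer s such that the window overlaps the observed time span.
    """
    times = [t for t, _ in events]
    first_start = min(times) - width + 1
    last_start = max(times)
    hits = 0
    total = 0
    for s in range(first_start, last_start + 1):
        types_in_window = {e for t, e in events if s <= t < s + width}
        total += 1
        if episode <= types_in_window:  # subset test: all types present
            hits += 1
    return hits / total


# Illustrative sequence only.
events = [(1, "A"), (2, "B"), (5, "A"), (9, "B")]
narrow = window_frequency(events, {"A", "B"}, 3)
wide = window_frequency(events, {"A", "B"}, 5)
```

The same episode gets a different frequency for each choice of `width`, which is the dependence on a user-specified parameter that the abstract sets out to avoid.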