99,657 research outputs found
Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems
Frequent itemset mining leads to the discovery of associations and
correlations among items in large transactional databases. Apriori is a
classical frequent itemset mining algorithm, which employs iterative passes
over database combining with generation of candidate itemsets based on frequent
itemsets found at the previous iteration, and pruning of clearly infrequent
itemsets. The Dynamic Itemset Counting (DIC) algorithm is a variation of
Apriori, which tries to reduce the number of passes made over a transactional
database while keeping the number of itemsets counted in a pass relatively low.
In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi
many-core system for the case when the transactional database fits in main
memory. Intel Xeon Phi provides a large number of small compute cores with
vector processing units. The paper presents a parallel implementation of DIC
based on OpenMP technology and thread-level parallelism. We exploit the
bit-based internal layout for transactions and itemsets. This technique reduces
the memory space for storing the transactional database, simplifies the support
count via logical bitwise operation, and allows for vectorization of such a
step. Experimental evaluation on the platforms of the Intel Xeon CPU and the
Intel Xeon Phi coprocessor with large synthetic and real databases showed good
performance and scalability of the proposed algorithm.Comment: Accepted for publication in Journal of Computing and Information
Technology (http://cit.fer.hr
Recommended from our members
A rule dynamics approach to event detection in Twitter with its application to sports and politics
The increasing popularity of Twitter as social network tool for opinion expression as well as informa- tion retrieval has resulted in the need to derive computational means to detect and track relevant top- ics/events in the network. The application of topic detection and tracking methods to tweets enable users to extract newsworthy content from the vast and somehow chaotic Twitter stream. In this paper, we ap- ply our technique named Transaction-based Rule Change Mining to extract newsworthy hashtag keywords present in tweets from two different domains namely; sports (The English FA Cup 2012) and politics (US Presidential Elections 2012 and Super Tuesday 2012). Noting the peculiar nature of event dynamics in these two domains, we apply different time-windows and update rates to each of the datasets in order to study their impact on performance. The performance effectiveness results reveal that our approach is able to accurately detect and track newsworthy content. In addition, the results show that the adaptation of the time-window exhibits better performance especially on the sports dataset, which can be attributed to the usually shorter duration of football events
- ā¦