Parallel Buddy Prima – A Hybrid Parallel Frequent itemset mining algorithm for very large databases

Abstract

Frequent itemset mining is essential for discovering association rules, strong rules, episodes, and minimal keys. This paper describes a parallel approach to association mining based on the Buddy Prima algorithm, which combines the bottom-up and top-down approaches. Apriori, a widely used association mining technique, performs a breadth-first, bottom-up search and works well only when the frequent itemsets are short; top-down algorithms are better suited to long frequent itemsets. The PRIMA representation consumes less memory because each transaction is replaced by the product of the prime numbers assigned to its items, and it reduces the time taken to determine the support count of an itemset. A candidate distribution technique is adopted to handle large datasets with large itemsets. The performance of the algorithm is compared with that of existing algorithms and the results are tabulated. The proposed algorithm reduces time and data complexity. Experimental results on the Microsoft Anonymous Data show that the parallel approach outperforms the existing algorithms by approximately a factor of two.

Keywords: Parallel data mining, Association mining, Top-down approach, Candidate distribution
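
The prime-product representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (assign_primes, encode, support_count), the toy transactions, and the order in which primes are assigned to items are all assumptions made for illustration. The sketch only shows the underlying idea that, once each transaction is stored as the product of its items' primes, an itemset is contained in a transaction exactly when the itemset's product divides the transaction's product.

```python
from math import prod


def assign_primes(items):
    """Map each distinct item to a distinct prime (2, 3, 5, ...).

    Assumption: items are numbered in sorted order; the paper may use
    a different assignment.
    """
    primes = {}
    candidate = 2
    for item in sorted(items):
        # Trial division against the primes handed out so far.
        while any(candidate % p == 0 for p in primes.values()):
            candidate += 1
        primes[item] = candidate
        candidate += 1
    return primes


def encode(itemset, primes):
    """Replace a set of items with the product of their primes."""
    return prod(primes[item] for item in set(itemset))


def support_count(itemset, encoded_db, primes):
    """Count transactions whose product is divisible by the itemset's product."""
    code = encode(itemset, primes)
    return sum(1 for t in encoded_db if t % code == 0)


# Hypothetical toy database of four transactions.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b", "c", "d"}]
primes = assign_primes(set().union(*transactions))
encoded_db = [encode(t, primes) for t in transactions]

print(primes)                                       # {'a': 2, 'b': 3, 'c': 5, 'd': 7}
print(encoded_db)                                   # [30, 10, 21, 210]
print(support_count({"a", "c"}, encoded_db, primes))  # 3
```

Each support check here is a single modulo operation per transaction rather than a subset test over item lists. Python integers have arbitrary precision, so the products cannot overflow in this sketch; a fixed-width implementation would need to bound the product size, which this example does not address.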
