896 research outputs found

    DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets

    Full text link
    Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed, which have shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation, for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, to mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and directly enumerates frequent itemsets without candidate generation under some case. For evaluating the performance of dFIN, we have conduct extensive experiments to compare it against with existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.Comment: 22 pages, 13 figure

    A Novel Nodesets-Based Frequent Itemset Mining Algorithm for Big Data using MapReduce

    Get PDF
    Due to the rapid growth of data from different sources in organizations, the traditional tools and techniques that cannot handle such huge data are known as big data which is in a scalable fashion. Similarly, many existing frequent itemset mining algorithms have good performance but scalability problems as they cannot exploit parallel processing power available locally or in cloud infrastructure. Since big data and cloud ecosystem overcomes the barriers or limitations in computing resources, it is a natural choice to use distributed programming paradigms such as Map Reduce. In this paper, we propose a novel algorithm known as A Nodesets-based Fast and Scalable Frequent Itemset Mining (FSFIM) to extract frequent itemsets from Big Data. Here, Pre-Order Coding (POC) tree is used to represent data and improve speed in processing. Nodeset is the underlying data structure that is efficient in discovering frequent itemsets. FSFIM is found to be faster and more scalable in mining frequent itemsets. When compared with its predecessors such as Node-lists and N-lists, the Nodesets save half of the memory as they need only either pre-order or post-order coding. Cloudera\u27s Distribution of Hadoop (CDH), a MapReduce framework, is used for empirical study. A prototype application is built to evaluate the performance of the FSFIM. Experimental results revealed that FSFIM outperforms existing algorithms such as Mahout PFP, Mlib PFP, and Big FIM. FSFIM is more scalable and found to be an ideal candidate for real-time applications that mine frequent itemsets from Big Data

    A Novel Approach to Extract High Utility Itemsets from Distributed Databases

    Get PDF
    Traditional approaches in data mining focus on support and confidence measures which are just statistics based. Support and confidence measures which are based on the frequency count of the items enable us to derive the frequent itemsets. The frequency of the items as a single factor does not represent the interestingness of the items. To enhance the process of data mining tasks based on the value of the product, several researches were conducted. It resulted in utility mining which is an emerging field of research in data mining. In the recent years various data mining approaches have been implemented in order to find the high utility itemsets. The main objective of utility mining is to identify the itemsets with highest utilities, by considering the subjectively defined utility values, as set by the user. Existing methods based on utility mining concept focus on centralized systems where the data and associated processing is pertained to a particular location. As a further step ahead we try to implement the utility mining concept in a distributed environment. In this approach we use a sophisticated way of mining high utility itemsets using a Fast Utility Mining (FUM) algorithm

    A multithreaded hybrid framework for mining frequent itemsets

    Get PDF
    Mining frequent itemsets is an area of data mining that has beguiled several researchers in recent years. Varied data structures such as Nodesets, DiffNodesets, NegNodesets, N-lists, and Diffsets are among a few that were employed to extract frequent items. However, most of these approaches fell short either in respect of run time or memory. Hybrid frameworks were formulated to repress these issues that encompass the deployment of two or more data structures to facilitate effective mining of frequent itemsets. Such an approach aims to exploit the advantages of either of the data structures while mitigating the problems of relying on either of them alone. However, limited efforts have been made to reinforce the efficiency of such frameworks. To address these issues this paper proposes a novel multithreaded hybrid framework comprising of NegNodesets and N-list structure that uses the multicore feature of today’s processors. While NegNodesets offer a concise representation of itemsets, N-lists rely on List intersection thereby speeding up the mining process. To optimize the extraction of frequent items a hash-based algorithm has been designed here to extract the resultant set of frequent items which further enhances the novelty of the framework
    corecore