16 research outputs found

    DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets

    Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, several itemset representations based on node sets have been proposed and have been shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and, in some cases, directly enumerates frequent itemsets without candidate generation. To evaluate the performance of dFIN, we have conducted extensive experiments comparing it against existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms. Comment: 22 pages, 13 figures

    A multithreaded hybrid framework for mining frequent itemsets

    Mining frequent itemsets is an area of data mining that has attracted many researchers in recent years. Varied data structures such as Nodesets, DiffNodesets, NegNodesets, N-lists, and Diffsets are among those that have been employed to extract frequent items. However, most of these approaches fall short in either run time or memory. Hybrid frameworks, which deploy two or more data structures together, were formulated to address these issues and make frequent itemset mining more effective. Such an approach aims to exploit the advantages of each data structure while mitigating the problems of relying on any one of them alone. However, limited efforts have been made to improve the efficiency of such frameworks. To address these issues, this paper proposes a novel multithreaded hybrid framework comprising NegNodesets and the N-list structure that uses the multicore feature of today's processors. While NegNodesets offer a concise representation of itemsets, N-lists rely on list intersection, thereby speeding up the mining process. To optimize the extraction of frequent items, a hash-based algorithm has been designed to extract the resultant set of frequent items, which further enhances the novelty of the framework.
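Of the structures named in this abstract, Diffsets are the simplest to illustrate. The sketch below is not the framework proposed in the paper; it only shows the general diffset idea, where an itemset stores the transaction ids it loses relative to its prefix instead of the ids it keeps. The function names and the toy transactions are hypothetical.

```python
def tidsets(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    out = {}
    for tid, items in enumerate(transactions):
        for item in items:
            out.setdefault(item, set()).add(tid)
    return out


def diffset(prefix_tids, extended_tids):
    """Diffset of PX relative to prefix P: tids in t(P) that are not in t(PX)."""
    return prefix_tids - extended_tids


def extend(sup_px, d_px, d_py):
    """Combine PX and PY (same prefix P) into PXY using diffsets only.

    d(PXY) = d(PY) - d(PX) and sup(PXY) = sup(PX) - |d(PXY)|.
    """
    d_pxy = d_py - d_px
    return sup_px - len(d_pxy), d_pxy


# Toy data (hypothetical): four transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
t = tidsets(transactions)

d_ab = diffset(t["a"], t["a"] & t["b"])   # tids with a but not b
d_ac = diffset(t["a"], t["a"] & t["c"])   # tids with a but not c
sup_ab = len(t["a"]) - len(d_ab)
sup_abc, d_abc = extend(sup_ab, d_ab, d_ac)
```

On dense data the diffsets tend to be much smaller than the tid sets themselves, which is the usual motivation for difference-based structures.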

    IMPLEMENTATION OF DYNAMIC AND FAST MINING ALGORITHMS ON INCREMENTAL DATASETS TO DISCOVER QUALITATIVE RULES

    Association rule mining is an important field in knowledge mining that yields the association rules needed for decision making. Mining frequent itemsets is difficult on huge datasets: as the dataset grows, uncovering the rules takes more time and effort. In this paper, techniques for reducing time and overhead using an IPOC (Incremental Pre-Ordered Code) tree structure were examined. To mine frequent items from a database, these techniques require highly efficient data structures. FIN (Frequent Itemset-Nodeset) employs a Nodeset, a novel data structure, to extract frequent items, and an IPOC tree to store frequent data incrementally. Different methods have been adapted to analyze and assess time and memory use on different datasets. The proposed and implemented strategies show improved performance in rule generation with respect to time and efficiency.

    A genetic algorithm coupled with tree-based pruning for mining closed association rules

    Due to the large number of itemsets that are generated, the association rules extracted from these itemsets contain redundancy, and designing an effective approach to address this issue is of paramount importance. Although multiple algorithms have been proposed in recent years for mining closed association rules, most of them underperform in terms of run time or memory. Another issue that remains challenging is the nature of the dataset: some existing algorithms perform well on dense datasets, while others perform well on sparse datasets. This paper aims to handle these drawbacks by using a genetic algorithm for mining closed association rules. Recent studies have shown that genetic algorithms perform better than conventional algorithms due to their bitwise operations of crossover and mutation. Bitwise operations are generally faster than conventional approaches, and bits consume less memory, thereby improving the overall performance of the algorithm. To address the redundancy in the mined association rules, a tree-based pruning algorithm has been designed. It works on the principle of minimal antecedent and maximal consequent. Experiments have shown that the proposed approach works well on both dense and sparse datasets while surpassing existing techniques with regard to run time and memory.
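The bitwise crossover and mutation mentioned in this abstract can be sketched as follows. The encoding is an assumption, since the abstract does not specify one: each itemset is an integer bitmask in which bit i is set when item i is present. The function names are hypothetical, and the crossover point and mutated bit are passed in explicitly so the example is deterministic.

```python
def crossover(a, b, point, n_bits):
    """Single-point crossover: the low `point` bits come from a, the rest from b."""
    low = (1 << point) - 1
    full = (1 << n_bits) - 1
    return (a & low) | (b & ~low & full)


def mutate(x, bit):
    """Flip one bit, adding or removing the corresponding item."""
    return x ^ (1 << bit)


def support(mask, transactions):
    """Count the transactions (also bitmasks) that contain every item in mask."""
    return sum(1 for t in transactions if t & mask == mask)


child = crossover(0b1010, 0b0101, point=2, n_bits=4)
mutated = mutate(child, bit=0)
count = support(0b0011, [0b0111, 0b0011, 0b0101])
```

With this encoding a containment test is a single AND and comparison, which is one way to read the abstract's claim that bit operations are fast and compact.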

    Exploring Pattern Mining Algorithms for Hashtag Retrieval Problem

    The hashtag is an iconic feature for retrieving hot topics of discussion on Twitter and other social networks. This paper incorporates pattern mining approaches to improve the accuracy of retrieving relevant information and to speed up search. A novel algorithm called PM-HR (Pattern Mining for Hashtag Retrieval) is designed to first transform the set of tweets into a transactional database by considering two different strategies (trivial and temporal). After that, the set of relevant patterns is discovered and then used as a knowledge-based system for finding the relevant tweets for users' queries through a similarity search process. Extensive experiments are carried out on large and varied tweet collections, and the proposed PM-HR outperforms the baseline hashtag retrieval approaches in terms of runtime and is very competitive in terms of accuracy.
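The first step described here, turning tweets into a transactional database, might look like the sketch below. It is not PM-HR itself: it assumes that a transaction is simply the set of lowercased terms and hashtags in one tweet, and it mines only frequent pairs as a stand-in for the pattern discovery step. The function names and sample tweets are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tweets_to_transactions(tweets):
    """One transaction per tweet: the set of its lowercased terms and hashtags."""
    return [frozenset(re.findall(r"#?\w+", t.lower())) for t in tweets]


def frequent_pairs(transactions, min_support):
    """Count co-occurring term pairs and keep those seen at least min_support times."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


tweets = ["#AI is fun", "fun with #ai", "#ml rocks"]
transactions = tweets_to_transactions(tweets)
pairs = frequent_pairs(transactions, min_support=2)
```

The resulting patterns could then be matched against the terms of a user query, which is the role the abstract assigns to the discovered patterns.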

    An Efficient Itemset Representation for Mining Frequent Patterns in Transactional Databases

    In this paper we propose a very efficient itemset representation for frequent itemset mining from transactional databases. The combinatorial number system is used to uniquely represent a frequent k-itemset with just one integer value, for any k ≥ 2. Experiments show that memory requirements can be reduced up to 300 %, especially for very low minimal support thresholds. Further, we exploit the combinatorial number scheme to represent candidate itemsets during the iterative join-based approach. The novel algorithm maintains a one-dimensional array rank, starting from the second iteration (k = 2). At index r of the array, the proposed algorithm stores the unique integer representation of the r-th candidate in lexicographic order. The rank array makes joining two candidate k-itemsets an O(1) instead of an O(k) operation. Additionally, the rank array allows faster determination of which candidates are contained in a given transaction during the support count and test phase. Finally, we believe that itemset ranking by the combinatorial number system can be effectively integrated into pattern-growth algorithms, which are state-of-the-art in frequent itemset mining, and further improve their performance.
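The idea of representing a k-itemset by a single integer can be illustrated with the standard combinatorial number system. This is a minimal sketch, not the paper's algorithm: it assumes items are distinct non-negative integer ids and uses the textbook ranking, which orders itemsets colexicographically (the abstract's rank array is described as lexicographic). The function names are hypothetical.

```python
from math import comb


def rank_itemset(items):
    """Map a k-itemset to one integer: sum of C(c_i, i) over the sorted items."""
    return sum(comb(c, i) for i, c in enumerate(sorted(items), start=1))


def unrank_itemset(rank, k):
    """Inverse of rank_itemset: recover the sorted k-itemset from its rank."""
    items = []
    for i in range(k, 0, -1):
        # Find the largest c with C(c, i) <= rank.
        c = i - 1
        while comb(c + 1, i) <= rank:
            c += 1
        items.append(c)
        rank -= comb(c, i)
    return sorted(items)


r = rank_itemset([1, 2, 3])
back = unrank_itemset(r, 3)
```

Because the mapping is a bijection between k-itemsets and non-negative integers, two itemsets can be compared or used as array indices in constant time once ranked.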