6,008 research outputs found

    DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets

    Full text link
    Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed, which have shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation, for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, to mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and directly enumerates frequent itemsets without candidate generation under some case. For evaluating the performance of dFIN, we have conduct extensive experiments to compare it against with existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.Comment: 22 pages, 13 figure

    Frequent itemset mining: technique to improve eclat based algorithm

    Get PDF
    In frequent itemset mining, the main challenge is to discover relationships between data in a transactional database or relational database. Various algorithms have been introduced to process frequent itemset. Eclat based algorithms are one of the prominent algorithm used for frequent itemset mining. Various researches have been conducted based on Eclat based algorithm such as Tidset, dEclat, Sortdiffset and Postdiffset. The algorithm has been improvised along the time. However, the utilization of physical memory and processing time become the main problem in this process. This paper reviews and presents a comparison of various Eclat based algorithms for frequent itemset mining and propose an enhancement technique of Eclat based algorithm to reduce processing time and memory usage. The experimental result shows some improvement in processing time and memory utilization in frequent itemset mining

    Survey On Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

    Get PDF
    In data mining and knowledge discovery technique domain, frequent pattern mining plays an important role but it does not consider different weight value of the items. Association Rule Mining is to find the correlation between data. The frequent itemsets are patterns or items like itemsets, substructures, or subsequences that come out in a data set frequently or continuously. In this paper we are presenting survey of various frequent pattern mining and weighted itemset mining. Different articles related to frequent and weighted infrequent itemset mining were proposed. This paper focus on survey of various Existing Algorithms related to frequent and infrequent itemset mining which creates a path for future researches in the field of Association Rule Mining

    Memory-Efficient Frequent-Itemset Mining

    Get PDF
    Efficient discovery of frequent itemsets in large datasets is a key component of many data mining tasks. In-core algorithms---which operate entirely in main memory and avoid expensive disk accesses---and in particular the prefix tree-based algorithm FP-growth are generally among the most efficient of the available algorithms. Unfortunately, their excessive memory requirements render them inapplicable for large datasets with many distinct items and/or itemsets of high cardinality. To overcome this limitation, we propose two novel data structures---the CFP-tree and the CFP-array---, which reduce memory consumption by about an order of magnitude. This allows us to process significantly larger datasets in main memory than previously possible. Our data structures are based on structural modifications of the prefix tree that increase compressability, an optimized physical representation, lightweight compression techniques, and intelligent node ordering and indexing. Experiments with both real-world and synthetic datasets show the effectiveness of our approach

    A New Data Layout For Set Intersection on GPUs

    Full text link
    Set intersection is the core in a variety of problems, e.g. frequent itemset mining and sparse boolean matrix multiplication. It is well-known that large speed gains can, for some computational problems, be obtained by using a graphics processing unit (GPU) as a massively parallel computing device. However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, "BatMap", that is particularly well suited for parallel processing, and is compact even for sparse data. Frequent itemset mining is one of the most important applications of set intersection. As a case-study on the potential of BatMaps we focus on frequent pair mining, which is a core special case of frequent itemset mining. The main finding is that our method is able to achieve speedups over both Apriori and FP-growth when the number of distinct items is large, and the density of the problem instance is above 1%. Previous implementations of frequent itemset mining on GPU have not been able to show speedups over the best single-threaded implementations.Comment: A version of this paper appears in Proceedings of IPDPS 201
    • …
    corecore