
    Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model

    High-utility sequential pattern mining (HUSPM) has been an active research topic in recent decades because it combines sequential and utility properties, revealing more information than traditional frequent itemset mining or sequential pattern mining. Several HUSPM algorithms have been proposed, but most of them keep the data in main memory to speed up mining. This assumption is unrealistic in large-scale environments: in industry, the collected data is often too large to fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then established to guarantee the correctness and completeness of the patterns discovered by the framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation needed to mine the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability, compared with the serial HUSP-Span approach.

    Utilizing Index‑Based Periodic High Utility Mining to Study Frequent Itemsets

    Periodic High-Utility Itemset Mining (PHUIM) has gained significance because of its potential use in different applications. Conventional utility mining algorithms focus on an itemset’s utility value rather than on its periodicity in the transactions. In this work, a MEAN periodicity measure is added to the minimum (MIN) and maximum (MAX) periodicity measures to incorporate periodicity into PHUIM. The MEAN-periodicity measure brings a new dimension to the periodicity factor and is obtained by dividing the itemset’s period value by the total number of transactions in the dataset. Further, an algorithm, Index-Based Periodic High Utility Itemset Mining (IBPHUIM), which mines such itemsets from the database using an indexing approach, is also proposed in this paper. The proposed IBPHUIM algorithm employs a projection-based technique and an indexing procedure to improve memory and execution-time efficiency. The proposed model avoids redundant database scans by generating sub-databases using an indexing data structure. The proposed IBPHUIM model was evaluated on test datasets, and the results show that it performs considerably better
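
The MIN, MAX and MEAN periodicity measures mentioned in the abstract above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: transactions are numbered from 1, a period is the gap between consecutive transactions containing the itemset (with 0 and the database size as boundaries), and the mean is a plain arithmetic mean of those gaps, which may differ from the normalization the authors use. The names `periods` and `periodicity_summary` are hypothetical.

```python
from statistics import mean


def periods(transactions, itemset):
    """Gaps between consecutive transactions that contain `itemset`.

    Assumption: transaction ids start at 1, and the positions 0 and
    len(transactions) are added as boundaries before taking the gaps.
    """
    target = set(itemset)
    tids = [i for i, t in enumerate(transactions, start=1) if target <= set(t)]
    points = [0] + tids + [len(transactions)]
    return [b - a for a, b in zip(points, points[1:])]


def periodicity_summary(transactions, itemset):
    """Return MIN, MAX and (assumed arithmetic) MEAN of the periods."""
    p = periods(transactions, itemset)
    return {"min": min(p), "max": max(p), "mean": mean(p)}


# Toy database: each transaction is a set of items.
db = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"c"}, {"a", "b"}]
summary = periodicity_summary(db, {"a", "b"})
print(summary)
```

In this toy database the itemset {a, b} occurs in transactions 1, 3 and 5, so the sketch reports gaps of 1, 2, 2 and 0. An itemset would typically be kept only if these values fall within user-chosen thresholds, in addition to meeting a minimum utility.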