644 research outputs found

    A Novel Approach to Extract High Utility Itemsets from Distributed Databases

    Traditional approaches in data mining focus on support and confidence, which are purely statistical measures. Because they are based on the frequency counts of items, support and confidence let us derive the frequent itemsets. However, frequency alone does not capture how interesting an item is. Several studies have sought to improve data mining by taking the value of products into account, giving rise to utility mining, an emerging research field in data mining. In recent years, various data mining approaches have been implemented to find high utility itemsets. The main objective of utility mining is to identify the itemsets with the highest utilities, based on subjectively defined utility values set by the user. Existing utility mining methods focus on centralized systems, where the data and its processing are confined to a single location. Going a step further, we implement the utility mining concept in a distributed environment. Our approach mines high utility itemsets using a Fast Utility Mining (FUM) algorithm.
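    To make "utility" concrete, here is a minimal Python sketch, assuming the common definition in which an item's utility in a transaction is its purchased quantity times a user-set unit profit, and an itemset's utility is summed over the transactions that contain it. It is not the FUM algorithm: the PROFIT table, the data, and the functions local_utilities and distributed_huis are hypothetical, and itemsets are enumerated by brute force. It shows only why distribution is sound: utility is additive over transactions, so per-site totals add up to the exact global utility.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical unit profits (external utilities) chosen by the user.
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1, "d": 10}

# A transaction maps each purchased item to its quantity (internal utility).
Transaction = Dict[str, int]


def itemset_utility(itemset: FrozenSet[str], tx: Transaction) -> int:
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    if not all(item in tx for item in itemset):
        return 0
    return sum(tx[item] * PROFIT[item] for item in itemset)


def local_utilities(partition: List[Transaction]) -> Dict[FrozenSet[str], int]:
    """Runs at one site: exact utility of every itemset seen in its transactions.
    Brute-force enumeration, exponential in transaction length."""
    local: Dict[FrozenSet[str], int] = {}
    for tx in partition:
        items = sorted(tx)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                local[key] = local.get(key, 0) + itemset_utility(key, tx)
    return local


def distributed_huis(partitions: List[List[Transaction]],
                     min_util: int) -> Dict[FrozenSet[str], int]:
    """Runs at a coordinator: utility is additive over transactions, so summing
    the per-site totals yields the exact global utility of each itemset."""
    total: Dict[FrozenSet[str], int] = {}
    for partition in partitions:
        for key, util in local_utilities(partition).items():
            total[key] = total.get(key, 0) + util
    return {key: util for key, util in total.items() if util >= min_util}


# Two hypothetical sites holding disjoint slices of the database.
site1 = [{"a": 1, "b": 2}, {"a": 2, "d": 1}]
site2 = [{"b": 4, "c": 3}, {"a": 1, "b": 1, "d": 1}]
huis = distributed_huis([site1, site2], min_util=15)
```

    With min_util = 15 the result is {a}: 20, {d}: 20, {a, b}: 16, {a, d}: 35 and {a, b, d}: 17. Item b occurs in three transactions but reaches only 14, while d occurs in two and reaches 20, which is the frequency-versus-utility gap the abstract describes. Shipping every local itemset to the coordinator is the naive exchange; a practical distributed miner would prune candidates before communicating them.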

    Mining actionable combined patterns satisfied both utility and frequency criteria

    University of Technology Sydney. Faculty of Engineering and Information Technology. In recent years, the importance of identifying actionable patterns, from which decision-support actions can be derived, has become increasingly recognized. A typical shift is toward identifying high utility rather than highly frequent patterns. Accordingly, High Utility Itemset (HUI) mining methods have become popular, and they are now faster and more reliable than before. However, current research has focused on improving efficiency while ignoring the coupling relationships between items. It is important to study the item and itemset couplings inherent in the data. For example, the utility of one itemset might be lower than a user-specified threshold, yet become larger when an additional itemset joins it; conversely, an item's utility might be high until another item joins in. As a result, although some genuinely high utility itemsets can be discovered, it is sometimes evident that many redundant itemsets sharing the same item are mined as well (e.g., if the utility of a diamond is high enough, all its supersets are identified as HUIs). Such itemsets are not actionable, because sellers cannot increase profit by building marketing strategies on top of such findings. To this end, this thesis introduces a new framework for mining actionable high utility association rules, called Combined Utility-Association Rules (CUAR), which aims to find high utility, strongly associated itemset combinations that capture item/itemset relations. Experimental results on both real and synthetic datasets show that the algorithm is efficient.
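    The diamond example can be illustrated with a minimal Python sketch that reports a rule X -> Y only when X ∪ Y passes a utility threshold and the rule also passes a confidence threshold. This is not the CUAR framework. The PROFIT table, the transactions, the thresholds (100 and 0.6) and the function combined_rule are all hypothetical. The sketch shows one simple way to combine the two criteria so that itemsets carried purely by one expensive item are filtered out.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical unit profits; "diamond" dominates any itemset that contains it.
PROFIT: Dict[str, int] = {"diamond": 500, "ring": 40, "box": 2, "card": 1}
Transaction = Dict[str, int]


def contains(itemset: FrozenSet[str], tx: Transaction) -> bool:
    return all(item in tx for item in itemset)


def support(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Number of transactions containing the whole itemset."""
    return sum(1 for tx in db if contains(itemset, tx))


def utility(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Sum of quantity * unit profit over transactions containing the itemset."""
    return sum(sum(tx[i] * PROFIT[i] for i in itemset)
               for tx in db if contains(itemset, tx))


def combined_rule(x: FrozenSet[str], y: FrozenSet[str], db: List[Transaction],
                  min_util: int, min_conf: float) -> Optional[Tuple[int, float]]:
    """Keep rule x -> y only if (x | y) is high utility AND the rule is confident."""
    sup_x = support(x, db)
    if sup_x == 0:
        return None
    xy = x | y
    conf = support(xy, db) / sup_x
    util = utility(xy, db)
    if util >= min_util and conf >= min_conf:
        return util, conf
    return None


db: List[Transaction] = [
    {"diamond": 1, "box": 1},
    {"ring": 2, "box": 1},
    {"ring": 1, "box": 1, "card": 1},
    {"ring": 1, "card": 2},
    {"diamond": 1, "card": 1},
]
diamond_card = combined_rule(frozenset({"diamond"}), frozenset({"card"}), db, 100, 0.6)
ring_box = combined_rule(frozenset({"ring"}), frozenset({"box"}), db, 100, 0.6)
```

    Here {diamond, card} has utility 501 only because of the diamond, and diamond -> card holds in just 1 of 2 diamond transactions, so it is rejected (diamond_card is None). ring -> box holds in 2 of 3 ring transactions with utility 124, so it is kept as (124, 0.666...).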

    MINING TOP-K HIGH UTILITY ITEM SETS BY USING EFFICIENT DATA STRUCTURE TO IMPROVE THE PERFORMANCE

    Association rules show strong relationships between attribute-value pairs (or items) that occur frequently in a given data set. They are commonly used to determine customers' purchasing patterns in a store, and such analysis informs many decision-making processes, such as product placement, catalogue design, and cross-marketing. The discovery of association rules is based on frequent itemset mining. Frequent itemset mining algorithms mainly suffer from generating a large number of candidate itemsets and requiring many database scans. These issues are addressed by two algorithms, TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in one phase), which mine the top-k high utility itemsets in two scans of the entire database. Although the scans are reduced to two, processing time remains high because of traversals of the UP-Tree, the data structure used by TKU and TKO. The proposed algorithm uses a B+-Tree instead of the UP-Tree to reduce processing time. Experimental analysis shows that processing time improves, so the proposed B+-Tree-based methodology overcomes these limitations of existing work.
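    The core idea of top-k utility mining can be sketched in Python. Instead of fixing a minimum utility in advance, the miner keeps the k best itemsets found so far in a min-heap and uses the smallest of them as a rising threshold. This is a brute-force illustration, not TKU, TKO, or the B+-Tree method. The pruning check uses transaction-weighted utility (TWU), the total utility of all transactions containing an itemset, which is a standard upper bound on the itemset's utility. The data and the function top_k_hui are hypothetical.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical unit profits and quantity-annotated transactions.
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1, "d": 10}
Transaction = Dict[str, int]


def contains(itemset: Tuple[str, ...], tx: Transaction) -> bool:
    return all(item in tx for item in itemset)


def tx_utility(tx: Transaction) -> int:
    """Total utility of a whole transaction."""
    return sum(q * PROFIT[i] for i, q in tx.items())


def top_k_hui(db: List[Transaction], k: int) -> List[Tuple[int, Tuple[str, ...]]]:
    """Return the k highest-utility itemsets as (utility, itemset), best first."""
    if k <= 0:
        return []
    items = sorted({i for tx in db for i in tx})
    heap: List[Tuple[int, Tuple[str, ...]]] = []  # min-heap of current top-k
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            covering = [tx for tx in db if contains(combo, tx)]
            if not covering:
                continue
            # Threshold rises as better itemsets fill the heap.
            threshold = heap[0][0] if len(heap) == k else 0
            # TWU >= exact utility, so an itemset below the threshold on TWU
            # can never enter the top-k; skip computing its exact utility.
            if sum(tx_utility(tx) for tx in covering) < threshold:
                continue
            util = sum(sum(tx[i] * PROFIT[i] for i in combo) for tx in covering)
            if len(heap) < k:
                heapq.heappush(heap, (util, combo))
            elif util > heap[0][0]:
                heapq.heapreplace(heap, (util, combo))
    return sorted(heap, reverse=True)


db = [{"a": 1, "b": 2}, {"a": 2, "d": 1}, {"b": 4, "c": 3}, {"a": 1, "b": 1, "d": 1}]
top3 = top_k_hui(db, 3)
```

    For this data, top3 is [(35, ("a", "d")), (20, ("d",)), (20, ("a",))]. Once the threshold reaches 20, candidates such as ("b", "c") and ("b", "d") are skipped on TWU alone. Published top-k algorithms apply the same threshold-raising idea, but they rely on compact structures and tighter bounds instead of full enumeration.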

    Efficient chain structure for high-utility sequential pattern mining

    High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining that considers both utility and sequence factors to derive the set of high-utility sequential patterns (HUSPs) from quantitative databases. Several works have reduced the computational cost through various pruning strategies. In this paper, we present an efficient sequence-utility (SU)-chain structure, which stores more relevant information to improve mining performance. With the SU-Chain structure, existing pruning strategies can still be applied to prune unpromising candidates early and obtain the satisfying HUSPs. In experiments against state-of-the-art HUSPM algorithms, the SU-Chain-based model outperformed the existing HUSPM algorithms in terms of runtime and the number of candidates considered.
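    The utility of a sequential pattern is harder to compute than that of an itemset, because one pattern can match a sequence in several ways. The Python sketch below assumes a commonly used definition: the utility of a pattern in a sequence is the maximum total utility over all of its occurrences, and the pattern's utility in the database is the sum over the sequences that contain it. It does not implement the SU-Chain structure. The PROFIT table, the sequences, and the function names are hypothetical, and the memoized recursion tries every matching.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Optional

# Hypothetical unit profits; a q-sequence is an ordered list of itemsets with quantities.
PROFIT: Dict[str, int] = {"a": 2, "b": 5, "c": 1}
QSequence = List[Dict[str, int]]
Pattern = List[FrozenSet[str]]


def pattern_utility_in_sequence(pattern: Pattern, seq: QSequence) -> Optional[int]:
    """Maximum utility over all occurrences of pattern in seq; None if absent."""
    m, n = len(pattern), len(seq)
    neg = float("-inf")

    @lru_cache(maxsize=None)
    def best(j: int, p: int) -> float:
        # Best utility for matching pattern[j:] using positions p.. of seq.
        if j == m:
            return 0
        result = neg
        for q in range(p, n):  # try placing pattern element j at position q
            if all(item in seq[q] for item in pattern[j]):
                rest = best(j + 1, q + 1)  # later elements go strictly later
                if rest != neg:
                    gain = sum(seq[q][i] * PROFIT[i] for i in pattern[j])
                    result = max(result, gain + rest)
        return result

    util = best(0, 0)
    return None if util == neg else int(util)


def pattern_utility_in_db(pattern: Pattern, db: List[QSequence]) -> int:
    """Sum of per-sequence maximum utilities over sequences containing pattern."""
    total = 0
    for seq in db:
        util = pattern_utility_in_sequence(pattern, seq)
        if util is not None:
            total += util
    return total


seq1 = [{"a": 1}, {"b": 2}, {"a": 3, "c": 1}, {"b": 1}]
seq2 = [{"b": 1}, {"a": 2, "b": 1}]
seq3 = [{"a": 4}, {"c": 2}, {"b": 1}]
ab = [frozenset({"a"}), frozenset({"b"})]  # the pattern <{a}, {b}>
```

    In seq1 the pattern <{a}, {b}> occurs three ways, with utilities 12, 7 and 11, so its utility there is 12. It does not occur in seq2, because no b follows the a. In seq3 its utility is 13, giving 25 over the database. Repeating this search for every candidate is the cost that chain- and list-based structures are designed to reduce.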