10,663 research outputs found

    Incrementally updating the high average-utility patterns with pre-large concept

    Get PDF
    High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases. Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns and scalability.publishedVersio

    A fine grained heuristic to capture web navigation patterns

    Get PDF
    In previous work we have proposed a statistical model to capture the user behaviour when browsing the web. The user navigation information obtained from web logs is modelled as a hypertext probabilistic grammar (HPG) which is within the class of regular probabilistic grammars. The set of highest probability strings generated by the grammar corresponds to the user preferred navigation trails. We have previously conducted experiments with a Breadth-First Search algorithm (BFS) to perform the exhaustive computation of all the strings with probability above a specified cut-point, which we call the rules. Although the algorithm’s running time varies linearly with the number of grammar states, it has the drawbacks of returning a large number of rules when the cut-point is small and a small set of very short rules when the cut-point is high. In this work, we present a new heuristic that implements an iterative deepening search wherein the set of rules is incrementally augmented by first exploring trails with high probability. A stopping parameter is provided which measures the distance between the current rule-set and its corresponding maximal set obtained by the BFS algorithm. When the stopping parameter takes the value zero the heuristic corresponds to the BFS algorithm and as the parameter takes values closer to one the number of rules obtained decreases accordingly. Experiments were conducted with both real and synthetic data and the results show that for a given cut-point the number of rules induced increases smoothly with the decrease of the stopping criterion. Therefore, by setting the value of the stopping criterion the analyst can determine the number and quality of rules to be induced; the quality of a rule is measured by both its length and probability

    Incrementally updating the high average-utility patterns with pre-large concept

    Get PDF
    High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases. Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns and scalability.publishedVersio

    Fraud detection in energy consumption: a supervised approach

    Get PDF
    Data from utility meters (gas, electricity, water) is a rich source of information for distribution companies, beyond billing. In this paper we present a supervised technique, which primarily but not only feeds on meter information, to detect meter anomalies and customer fraudulent behavior (meter tampering). Our system detects anomalous meter readings on the basis of models built using machine learning techniques on past data. Unlike most previous work, it can incrementally incorporate the result of field checks to grow the database of fraud and non-fraud patterns, therefore increasing model precision over time and potentially adapting to emerging fraud patterns. The full system has been developed with a company providing electricity and gas and already used to carry out several field checks, with large improvements in fraud detection over the previous checks which used simpler techniques.Peer ReviewedPostprint (author's final draft

    HI-Tree: Mining High Influence Patterns Using External and Internal Utility Values

    Get PDF
    We propose an efficient algorithm, called HI-Tree, for mining high influence patterns for an incremental dataset. In traditional pattern mining, one would find the complete set of patterns and then apply a post-pruning step to it. The size of the complete mining results is typically prohibitively large, despite the fact that only a small percentage of high utility patterns are interesting. Thus it is inefficient to wait for the mining algorithm to complete and then apply feature selection to post-process the large number of resulting patterns. Instead of generating the complete set of frequent patterns we are able to directly mine patterns with high utility values in an incremental manner. In this paper we propose a novel utility measure called an influence factor using the concepts of external utility and internal utility of an item. The influence factor for an item takes into consideration its connectivity with its neighborhood as well as its importance within a transaction. The measure is especially useful in problem domains utilizing network or interaction characteristics amongst items such as in a social network or web click-stream data. We compared our technique against state of the art incremental mining techniques and show that our technique has better rule generation and runtime performance

    A study on incremental mining of frequent patterns

    Get PDF
    Data generated from both the offline and online sources are incremental in nature. Changes in the underlying database occur due to the incremental data. Mining frequent patterns are costly in changing databases, since it requires scanning the database from the start. Thus, mining of growing databases has been a great concern. To mine the growing databases, a new Data Mining technique called Incremental Mining has emerged. The Incremental Mining uses previous mining result to get the desired knowledge by reducing mining costs in terms of time and space. This state of the art paper focuses on Incremental Mining approaches and identifies suitable approaches which are the need of real world problem.Keywords: Data Mining, Frequent Pattern, Incremental Mining, Frequent Pattern Minung, High Utility Mining, Constraint Mining
    • …
    corecore