
    Prefix-Projection Global Constraint for Sequential Pattern Mining

    Sequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they all suffer from a lack of genericity. Recent works have investigated Constraint Programming (CP) methods, but these are still not effective because of their encoding. In this paper, we propose a global constraint based on the projected databases principle that remedies this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets.
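The projected databases principle behind this constraint can be illustrated outside CP. The Python sketch below is a plain PrefixSpan-style miner, not the paper's global constraint. It assumes that each sequence is a list of single items, with no itemset elements and no extra constraints. The names `project` and `prefix_span` are hypothetical. Each recursive call works only on the suffixes that follow the current prefix, so longer patterns are never searched for in the full database.

```python
from collections import Counter


def project(db, item):
    """Item-projected database: for every sequence containing `item`,
    keep only the suffix after its first occurrence."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def prefix_span(db, min_sup, prefix=None, out=None):
    """Return {pattern: support} for all frequent sequential patterns,
    growing each prefix by one item and recursing on its projection."""
    prefix = [] if prefix is None else prefix
    out = {} if out is None else out
    # Each sequence contributes at most once to an item's support.
    counts = Counter(item for seq in db for item in set(seq))
    for item, sup in sorted(counts.items()):
        if sup >= min_sup:
            pattern = prefix + [item]
            out[tuple(pattern)] = sup
            # Only the suffixes after `item` are searched for longer patterns.
            prefix_span(project(db, item), min_sup, pattern, out)
    return out
```

Projecting on the first occurrence is enough because the leftmost embedding of a prefix leaves the longest possible suffix. Support counts taken in the projected database are therefore exact.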

    Efficient Incremental Breadth-Depth XML Event Mining

    Many applications log a large number of events continuously. Extracting interesting knowledge from logged events is an emerging, active research area in data mining. In this context, we propose an approach for mining frequent events and association rules from logged events in XML format. This approach is composed of two main phases: (I) constructing a novel tree structure called the Frequency XML-based Tree (FXT), which contains the frequency of the events to be mined; and (II) querying the constructed FXT using XQuery to discover frequent itemsets and association rules. The FXT is constructed in a single pass over the logged data. We implement the proposed algorithm and study various performance issues. The performance study shows that the algorithm is efficient both for constructing the FXT and for discovering association rules.
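The two phases (single-pass counting, then querying for itemsets and rules) can be sketched in memory. This Python sketch swaps the XML-based FXT and the XQuery step for a plain `Counter` keyed by `frozenset` and ordinary functions, so it is not the FXT structure. The names `count_itemsets`, `frequent_itemsets` and `association_rules`, and the `max_size` cap, are hypothetical. Enumerating every subset of each logged transaction is only practical for short transactions.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(log, max_size=3):
    """Single pass over the log: count every itemset of up to `max_size`
    distinct events that occurs in each logged transaction."""
    counts = Counter()
    for events in log:
        items = sorted(set(events))
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


def frequent_itemsets(counts, n_transactions, min_support):
    """Query step 1: keep itemsets whose relative support reaches min_support."""
    return {s: c for s, c in counts.items() if c / n_transactions >= min_support}


def association_rules(frequent, min_confidence):
    """Query step 2: emit (lhs, rhs, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), k)):
                # Every subset of a frequent itemset is frequent, so lhs is present.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```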

    Self-configuring data mining for ubiquitous computing

    Ubiquitous computing software needs to be autonomous so that essential decisions, such as how to configure its particular execution, are self-determined. Moreover, data mining plays an important role in ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource-aware and context-aware manner, since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states, as well as parameter settings, on data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm's execution, and we concentrate on the decision tree classifier. We also define a taxonomy of data mining quality. Each behavior model classifies by a different abstraction of quality, and for model selection it is scored on the tradeoff between its prediction accuracy and its classification specificity. Behavior model constituents and class label transformations are formally defined, and an experimental validation of the proposed approach is also performed.

    Knowledge discovery in data streams

    Knowing what to do with the massive amount of data collected has always been an ongoing issue for many organizations. While data mining has been touted as the solution, it has failed to deliver the expected impact despite its successes in many areas. One reason is that data mining algorithms were not designed for the real world: they usually assume a static view of the data and a stable execution environment where resources are abundant. In reality, however, data are constantly changing and the execution environment is dynamic. Hence, it becomes difficult for data mining to truly deliver timely and relevant results. Recently, the processing of stream data has received much attention. Interestingly, the methodology for designing stream-based algorithms may well be the solution to the above problem. In this entry, we discuss this issue and present an overview of recent works.

    Robust Complex Event Pattern Detection over Streams

    Event stream processing (ESP) has become increasingly important in modern applications. In this dissertation, I focus on providing a robust ESP solution by meeting three major research challenges regarding the robustness of ESP systems: (1) when event constraints on the input stream are available, applying such semantic information during event processing; (2) handling event streams with out-of-order data arrival; and (3) handling event streams with interval-based temporal semantics. The dissertation completes the following three corresponding research tasks. Task I: Constraint-Aware Complex Event Pattern Detection over Streams. In this task, a framework for constraint-aware pattern detection over event streams is designed. On the fly, it checks query satisfiability/unsatisfiability using a lightweight reasoning mechanism. It then adjusts the processing strategy dynamically by producing early feedback, releasing unnecessary system resources, and terminating the corresponding pattern monitors. Task II: Complex Event Pattern Detection over Streams with Out-of-Order Data Arrival. In this task, a mechanism is studied for processing event queries specified over streams that may contain out-of-order data; it provides new physical implementation strategies for core stream algebra operators such as sequence scan, pattern construction, and negation filtering. Task III: Complex Event Pattern Detection over Streams with Interval-Based Temporal Semantics. In this task, an expressive language for representing the required temporal patterns among streaming interval events is introduced, and the corresponding temporal operator ISEQ is designed.
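The sketch below shows why out-of-order arrival matters for a sequence operator. It puts a reordering buffer in front of a naive SEQ(A, B) detector. The buffer uses K-slack buffering, a generic technique that assumes no event arrives more than `slack` time units late. It is not the dissertation's mechanism, and `KSlackBuffer` and `detect_seq` are hypothetical names.

```python
import heapq


class KSlackBuffer:
    """Releases events in timestamp order, assuming no event arrives more
    than `slack` time units behind the largest timestamp seen so far."""

    def __init__(self, slack):
        self.slack = slack
        self.heap = []
        self.max_ts = float("-inf")
        self.seq = 0  # tie-breaker: equal timestamps leave in arrival order

    def push(self, ts, event_type):
        heapq.heappush(self.heap, (ts, self.seq, event_type))
        self.seq += 1
        self.max_ts = max(self.max_ts, ts)
        # Under the slack assumption, later arrivals have ts >= this watermark,
        # so everything at or below it is safe to emit.
        return self._release(self.max_ts - self.slack)

    def flush(self):
        return self._release(float("inf"))

    def _release(self, watermark):
        out = []
        while self.heap and self.heap[0][0] <= watermark:
            ts, _, event_type = heapq.heappop(self.heap)
            out.append((ts, event_type))
        return out


def detect_seq(events, first, second, window):
    """Naive SEQ(first, second) scan over (ts, type) events assumed to be
    in timestamp order; returns (t_first, t_second) pairs within `window`."""
    pending, matches = [], []
    for ts, event_type in events:
        if event_type == first:
            pending.append(ts)
        elif event_type == second:
            matches.extend((t0, ts) for t0 in pending if 0 < ts - t0 <= window)
    return matches
```

Consider arrivals `(1, "A"), (4, "B"), (3, "A"), (6, "C"), (9, "B")` with `slack=3` and `window=5`. Here `A` at time 3 arrives after `B` at time 4. Scanning the raw arrival order finds only `(1, 4)`, while scanning the buffered, reordered stream also finds `(3, 4)`. The `pending` list grows without bound in this sketch; a real operator would also expire entries older than the window.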

    Literature Review on Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases

    This paper presents a survey of methods for finding itemsets with high utility. Many algorithms exist for finding such itemsets, but they produce a large number of candidate high utility itemsets, which degrades mining performance in terms of execution time. Here we mainly focus on two algorithms, utility pattern growth (UP-Growth) and UP-Growth+, which mine high utility itemsets using effective strategies for pruning candidate itemsets. The information needed for mining is kept in a special compact tree structure called the UP-Tree. This structure improves mining performance and avoids scanning the original database repeatedly; candidate itemsets are generated with only two scans of the database. The second algorithm, UP-Growth+, reduces the number of candidates effectively. It also performs better than other algorithms in terms of runtime, especially when databases contain a huge number of long transactions. Utility-based data mining is a new research area concerned with all types of utility factors in data mining processes; it aims to integrate utility considerations into both predictive and descriptive data mining tasks. High utility itemset mining is a research area of utility-based descriptive data mining, used for finding the itemsets that contribute most to the total utility in a database.
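The standard utility definitions make the pruning problem described above concrete. The utility of an item in a transaction is its quantity times its unit profit. The transaction-weighted utility (TWU) of an itemset is the summed utility of all transactions containing it, an anti-monotone overestimate of the itemset's real utility. The Python sketch below uses TWU for level-wise candidate pruning and then checks exact utilities. This is a simpler TWU-based scheme of the kind UP-Growth and UP-Growth+ aim to improve on, and it does not build a UP-Tree. The function names and the data layout (one dict of item to quantity per transaction) are assumptions.

```python
from itertools import combinations


def utility(itemset, txn, profit):
    """u(X, T): sum of quantity * unit profit over X, or 0 if T lacks part of X."""
    if not itemset.issubset(txn):
        return 0
    return sum(txn[i] * profit[i] for i in itemset)


def high_utility_itemsets(db, profit, min_util):
    """Two-phase sketch: TWU-pruned candidates, then exact utilities."""
    # Transaction utility TU(T): utility of every item in T.
    tu = [sum(q * profit[i] for i, q in txn.items()) for txn in db]

    def twu(itemset):
        # Anti-monotone overestimate of the itemset's utility.
        return sum(t for t, txn in zip(tu, db) if itemset.issubset(txn))

    items = sorted({i for txn in db for i in txn})
    level = [frozenset([i]) for i in items if twu(frozenset([i])) >= min_util]
    candidates = list(level)
    while level:
        # Join k-itemsets into (k+1)-itemsets; keep those whose TWU passes.
        next_level = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == len(a) + 1 and twu(union) >= min_util:
                next_level.add(union)
        level = list(next_level)
        candidates.extend(level)
    # Phase 2: exact utilities of the survivors (one scan per candidate here,
    # for brevity).
    result = {}
    for c in candidates:
        u = sum(utility(c, txn, profit) for txn in db)
        if u >= min_util:
            result[c] = u
    return result
```

TWU never underestimates an itemset's utility and never increases when items are added, so pruning on it cannot discard a true high utility itemset. The price is that some low-utility candidates survive the first phase and are rejected only by the exact check. That is the candidate blow-up the survey refers to.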

    Loom: Query-aware Partitioning of Online Graphs

    As with general graph processing systems, partitioning data over a cluster of machines improves the scalability of graph database management systems. However, these systems incur additional network cost during the execution of a query workload, due to inter-partition traversals. Workload-agnostic partitioning algorithms typically minimise the likelihood of any edge crossing partition boundaries. However, these partitioners are sub-optimal with respect to many workloads, especially queries that may require more frequent traversal of specific subsets of inter-partition edges. Furthermore, they are largely unsuited to operating incrementally on dynamic, growing graphs. We present a new graph partitioning algorithm, Loom, that operates on a stream of graph updates and continuously allocates new vertices and edges to partitions, taking into account a query workload of graph pattern expressions along with their relative frequencies. First, we capture the most common patterns of edge traversals that occur when executing queries. We then compare sub-graphs, which present themselves incrementally in the graph update stream, against these common patterns. Finally, we attempt to allocate each match to a single partition, reducing the number of inter-partition edges within frequently traversed sub-graphs and improving average query performance. Loom is extensively evaluated over several large test graphs with realistic query workloads and various orderings of the graph updates. We demonstrate that, given a workload, our prototype produces partitionings of significantly better quality than the existing streaming graph partitioning algorithms Fennel and LDG.
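LDG (Linear Deterministic Greedy), one of the baselines named above, places each streamed vertex in the partition that holds most of its already-placed neighbours. That neighbour count is discounted by the partition's fill level. The Python sketch below implements this baseline heuristic, not Loom. The `(vertex, neighbours)` stream format, the `capacity` parameter and the least-loaded tie-break are assumptions. Loom differs from this per-vertex, workload-agnostic scoring by matching incoming sub-graphs against frequent query traversal patterns.

```python
def ldg_score(p, neighbours, assignment, sizes, capacity):
    """Neighbours already in partition p, damped by how full p is."""
    placed = sum(1 for n in neighbours if assignment.get(n) == p)
    return placed * (1 - sizes[p] / capacity)


def ldg_partition(vertex_stream, k, capacity):
    """Assign each arriving vertex to one of k partitions (LDG heuristic).

    vertex_stream yields (vertex, neighbours_seen_so_far) pairs; capacity is
    the target maximum number of vertices per partition.
    """
    assignment = {}
    sizes = [0] * k
    for v, neighbours in vertex_stream:
        # Highest score wins; ties go to the least-loaded partition,
        # then to the lowest partition index.
        best = max(
            range(k),
            key=lambda p: (ldg_score(p, neighbours, assignment, sizes, capacity), -sizes[p]),
        )
        assignment[v] = best
        sizes[best] += 1
    return assignment
```

Each vertex is scored only against its own neighbours. Because of this, LDG cannot tell that a particular multi-edge pattern is traversed far more often than others.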