
    Temporal data mining for root-cause analysis of machine faults in automotive assembly lines

    Engine assembly is a complex and heavily automated distributed-control process, with large amounts of fault data logged every day. We describe an application of temporal data mining for analyzing fault logs in an engine assembly plant. The frequent episode discovery framework is a model-free method that can be used to efficiently deduce (temporal) correlations among events from the logs. In addition to being theoretically elegant and computationally efficient, frequent episodes are also easy to interpret in the form of actionable recommendations. Incorporating domain-specific information is critical to the successful application of the method for analyzing fault logs in the manufacturing domain. We show how domain-specific knowledge can be incorporated using heuristic rules that act as pre-filters and post-filters to frequent episode discovery. The system described here is currently being used in one of the engine assembly plants of General Motors and is planned for adaptation in other plants. To the best of our knowledge, this paper presents the first real, large-scale application of temporal data mining in the manufacturing domain. We believe that the ideas presented in this paper can also help practitioners engineer analysis tools for other similar or related application domains.
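    The abstract combines frequent episode discovery with heuristic pre-filters and post-filters. The Python sketch below is a hypothetical illustration of that combination, not the system deployed at General Motors: it counts non-overlapped occurrences of two-node serial episodes (a -> b) with a single automaton, after a pre-filter that drops event types listed in `ignore`. The fault codes (`conveyor_jam`, `robot_stop`, `heartbeat`) and the function names are invented for the example; a real episode miner would generate longer candidates level-wise rather than enumerating all pairs.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple


def count_non_overlapped(seq: List[str], episode: Tuple[str, ...]) -> int:
    """Count non-overlapped occurrences of a serial episode in an event-type sequence.

    A single automaton waits for episode[state]; when the last event type is
    seen, one occurrence is complete, the count is incremented and the
    automaton resets, so counted occurrences never interleave.
    """
    state, count = 0, 0
    for event_type in seq:
        if event_type == episode[state]:
            state += 1
            if state == len(episode):
                count += 1
                state = 0
    return count


def frequent_pairs(seq: List[str], min_count: int,
                   ignore: FrozenSet[str] = frozenset()) -> Dict[Tuple[str, str], int]:
    """Frequent two-node serial episodes (a -> b) after a pre-filter on event types."""
    # Pre-filter: drop fault codes that a (hypothetical) domain rule marks as noise.
    filtered = [e for e in seq if e not in ignore]
    result: Dict[Tuple[str, str], int] = {}
    for a, b in permutations(sorted(set(filtered)), 2):
        c = count_non_overlapped(filtered, (a, b))
        # Frequency threshold; domain post-filters would be applied after this step.
        if c >= min_count:
            result[(a, b)] = c
    return result


log = ["conveyor_jam", "robot_stop", "heartbeat", "conveyor_jam",
       "robot_stop", "heartbeat", "robot_stop"]
print(frequent_pairs(log, min_count=2, ignore=frozenset({"heartbeat"})))
# {('conveyor_jam', 'robot_stop'): 2}
```

    The greedy automaton is a common way to count non-overlapped occurrences of a serial episode because it always completes the earliest-ending occurrence first.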

    Inferring Neuronal Network Connectivity using Time-constrained Episodes

    Discovering frequent episodes in event sequences is an interesting data mining task. In this paper, we argue that this framework is very effective for analyzing multi-neuronal spike train data. Analyzing spike train data is an important problem in neuroscience, yet no data mining approaches have been reported for it. Motivated by this application, we introduce different temporal constraints on the occurrences of episodes and present algorithms for discovering frequent episodes under these constraints. Through simulations, we show that our method is very effective at unearthing the underlying connectivity patterns in spike train data. Comment: 9 pages. See also http://neural-code.cs.vt.edu
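    To make the idea of time-constrained episodes concrete, here is a minimal Python sketch that assumes inter-event constraints of the form lo < (t[i+1] - t[i]) <= hi between consecutive events of a serial episode; for spike trains, such a window could stand for a plausible synaptic delay. The function name `count_with_gaps` and the `(time, type)` event layout are assumptions for illustration, not the algorithms from the paper.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp, event type), e.g. (spike time, neuron id)


def count_with_gaps(events: Sequence[Event], episode: Sequence[str],
                    gaps: Sequence[Tuple[float, float]]) -> int:
    """Count non-overlapped occurrences of a serial episode under inter-event constraints.

    gaps[i] = (lo, hi) requires lo < t[i+1] - t[i] <= hi between consecutive
    events of an occurrence. All partial-occurrence end times are kept, so the
    earliest-completing occurrence is always found; the lists are then cleared
    so that the next counted occurrence starts after it.
    """
    if len(gaps) != len(episode) - 1:
        raise ValueError("need one (lo, hi) pair per consecutive event pair")
    last = len(episode) - 1
    partial: List[List[float]] = [[] for _ in episode]
    count = 0
    for t, etype in sorted(events):
        completed = False
        # Walk episode nodes right-to-left so that one event cannot extend a
        # partial occurrence it was itself just added to (repeated event types).
        for k in range(last, -1, -1):
            if episode[k] != etype:
                continue
            if k == 0:
                ok = True
            else:
                lo, hi = gaps[k - 1]
                ok = any(lo < t - tp <= hi for tp in partial[k - 1])
            if ok:
                if k == last:
                    completed = True
                else:
                    partial[k].append(t)
        if completed:
            count += 1
            partial = [[] for _ in episode]
    return count


spikes = [(1.0, "A"), (2.0, "B"), (3.0, "A"), (10.0, "B")]
print(count_with_gaps(spikes, ["A", "B"], [(0.0, 2.0)]))   # 1
print(count_with_gaps(spikes, ["A", "B"], [(0.0, 10.0)]))  # 2
```

    Keeping every partial end time keeps the sketch simple but can use a lot of memory; a practical counter would discard times older than the largest `hi` bound.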

    Utility Mining Across Multi-Dimensional Sequences

    Knowledge extraction from databases is a fundamental task in the database and data mining community, and it has been applied to a wide range of real-world applications and situations. Unlike support-based mining models, the utility-oriented mining framework integrates utility theory to provide more informative and useful patterns. Time-dependent sequence data is common in real life and has been widely used in applications such as analyzing sequential user behavior on the Web, influence maximization, route planning, and targeted marketing. Unfortunately, all existing algorithms overlook the fact that the processed data not only contain rich features (e.g., occurrence quantity, risk, and profit) but may also be associated with multi-dimensional auxiliary information; for example, a transaction sequence can be associated with the purchaser's profile information. In this paper, we first formulate the problem of utility mining across multi-dimensional sequences and propose a novel framework named MDUS to extract Multi-Dimensional Utility-oriented Sequential useful patterns. Two algorithms, named MDUS_EM and MDUS_SD, are presented to address the formulated problem. The former is based on database transformation, and the latter performs pattern joins and a search method to identify the desired patterns across multi-dimensional sequences. Extensive experiments carried out on five real-life datasets and one synthetic dataset show that the proposed algorithms can effectively and efficiently discover useful knowledge from multi-dimensional sequential databases. Moreover, the MDUS framework provides better insight and is more adaptable to real-life situations than existing models. Comment: Under review in IEEE TKDE, 14 pages
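    The central quantity in utility-oriented sequence mining is the utility of a pattern in a sequence, commonly defined as the maximum, over its occurrences, of the summed quantity times unit profit. The Python sketch below assumes that definition, patterns with one item per element, and a hypothetical per-sequence profile dict standing in for the multi-dimensional auxiliary information; it is neither MDUS_EM nor MDUS_SD, only an illustration of what "utility across dimensions" measures.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

Itemset = Mapping[str, int]  # item -> purchased quantity
Profile = Dict[str, str]     # hypothetical dimension values, e.g. {"region": "EU"}


def pattern_utility(seq: Sequence[Itemset], pattern: Sequence[str],
                    profit: Mapping[str, float]) -> float:
    """Maximum utility of `pattern` (one item per element) over its occurrences in `seq`.

    best[k] is the highest utility of a match of pattern[:k] within the itemsets
    seen so far. k is updated in descending order so that a single itemset
    contributes at most one pattern element.
    """
    m = len(pattern)
    neg_inf = float("-inf")
    best = [0.0] + [neg_inf] * m
    for itemset in seq:
        for k in range(m, 0, -1):
            item = pattern[k - 1]
            if item in itemset and best[k - 1] != neg_inf:
                best[k] = max(best[k], best[k - 1] + itemset[item] * profit[item])
    return best[m] if best[m] != neg_inf else 0.0


def utility_in_dimension(db: List[Tuple[Profile, Sequence[Itemset]]],
                         pattern: Sequence[str], profit: Mapping[str, float],
                         **dims: str) -> float:
    """Sum pattern utilities over sequences whose profile matches every given dimension."""
    return sum(pattern_utility(seq, pattern, profit)
               for profile, seq in db
               if all(profile.get(key) == value for key, value in dims.items()))


profit = {"bread": 1.0, "milk": 2.0, "wine": 10.0}
db = [
    ({"region": "EU"}, [{"bread": 2}, {"milk": 1, "wine": 1}, {"milk": 3}]),
    ({"region": "US"}, [{"bread": 1}, {"milk": 1}]),
]
print(utility_in_dimension(db, ["bread", "milk"], profit, region="EU"))  # 8.0
print(utility_in_dimension(db, ["bread", "milk"], profit))               # 11.0
```

    In the EU sequence, the best occurrence of bread -> milk pairs the two breads (2.0) with the later three milks (6.0), which is why the result is 8.0 rather than 4.0.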

    Summarizing Event Sequences with Serial Episodes: A Statistical Model and an Application

    In this paper, we address the problem of discovering a small set of frequent serial episodes from sequential data that adequately characterizes or summarizes the data. We discuss an algorithm based on the Minimum Description Length (MDL) principle; the algorithm is a slight modification of an earlier method called CSC-2. We present a novel generative model for sequence data containing prominent pairs of serial episodes and, using this model, provide some statistical justification for the algorithm. We believe this is the first instance of such a statistical justification for an MDL-based algorithm for summarizing event sequence data. We then present a novel application of this data mining algorithm to text classification. By treating text documents as temporal sequences of words, the algorithm can find a set of characteristic episodes for the training data as a whole. The words that are part of these characteristic episodes can then be considered the only relevant words for the dictionary, resulting in a considerably reduced feature vector dimension. Through simulation experiments on benchmark data sets, we show that the discovered frequent episodes can achieve more than a four-fold reduction in dictionary size without losing any classification accuracy. Comment: 12 pages. Under review for IEEE TKDE
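    The dictionary-reduction step described above can be sketched in a few lines of Python. This assumes the characteristic episodes have already been found (for example by an MDL-based miner such as the CSC-2 variant the abstract mentions); the episode mining itself is not implemented here, and the example episodes and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def reduced_dictionary(episodes: Iterable[Sequence[str]]) -> List[str]:
    """Keep only the words that appear in at least one characteristic episode."""
    return sorted({word for episode in episodes for word in episode})


def bag_of_words(doc: Sequence[str], vocab: List[str]) -> List[int]:
    """Word counts restricted to the reduced vocabulary; other words are dropped."""
    index: Dict[str, int] = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    for word in doc:
        i = index.get(word)
        if i is not None:
            vec[i] += 1
    return vec


episodes = [("interest", "rate"), ("rate", "cut"), ("match", "score")]
vocab = reduced_dictionary(episodes)
doc = "the central bank announced a rate cut and the interest rate fell".split()
print(vocab)                     # ['cut', 'interest', 'match', 'rate', 'score']
print(bag_of_words(doc, vocab))  # [1, 1, 0, 2, 0]
```

    The resulting vectors have one dimension per episode word instead of one per word in the corpus, which is the source of the dictionary-size reduction the abstract reports.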

    Relationship-aware sequential pattern mining

    Relationship-aware sequential pattern mining is the problem of mining frequent patterns in sequences in which the events of a sequence are mutually related by one or more concepts from respective hierarchical taxonomies, based on the type of the events. Additionally, the events themselves are described by a number of taxonomical concepts. We present RaSP, an algorithm that is able to mine relationship-aware patterns over such sequences. RaSP follows a two-stage approach. In the first stage, it mines frequent type patterns and all of their occurrences within the different sequences. In the second stage, it performs hierarchical mining: for each frequent type pattern and its occurrences, it mines more specific frequent patterns in the lower levels of the taxonomies. We test RaSP on a real-world medical application that inspired its development, in which we mine frequent patterns of medical behavior in the antibiotic treatment of microbes, and show that it has very good computational performance given the complexity of the relationship-aware sequential pattern mining problem.
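    One ingredient of the second, hierarchical stage is counting support at different levels of a taxonomy. The Python sketch below is a generic illustration of that step, not RaSP itself: it assumes the occurrences of one frequent type pattern are already known per sequence, that the taxonomy is an acyclic child-to-parent map, and that a sequence supports a concept if any occurrence carries that concept or one of its descendants at the chosen position. The antibiotic names and the function names are hypothetical examples.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Set, Tuple


def with_ancestors(concept: str, parent: Mapping[str, str]) -> Set[str]:
    """Return the concept together with all of its ancestors (taxonomy assumed acyclic)."""
    out = {concept}
    while concept in parent:
        concept = parent[concept]
        out.add(concept)
    return out


def refine_position(occurrences: Mapping[str, List[Tuple[str, ...]]], position: int,
                    parent: Mapping[str, str], min_support: int) -> Dict[str, int]:
    """Sequence support of each taxonomy concept at one position of a type pattern.

    A sequence supports a concept if at least one of its occurrences has that
    concept, or a descendant of it, at `position`; each sequence counts once.
    """
    support: Dict[str, int] = defaultdict(int)
    for occs in occurrences.values():
        concepts: Set[str] = set()
        for occ in occs:
            concepts |= with_ancestors(occ[position], parent)
        for concept in concepts:
            support[concept] += 1
    return {c: s for c, s in support.items() if s >= min_support}


parent = {"amoxicillin": "penicillin", "penicillin": "beta-lactam",
          "cefazolin": "cephalosporin", "cephalosporin": "beta-lactam"}
occurrences = {
    "patient1": [("culture", "amoxicillin")],
    "patient2": [("culture", "cefazolin")],
    "patient3": [("culture", "amoxicillin"), ("culture", "amoxicillin")],
}
print(sorted(refine_position(occurrences, position=1, parent=parent, min_support=2).items()))
# [('amoxicillin', 2), ('beta-lactam', 3), ('penicillin', 2)]
```

    The general concept `beta-lactam` is frequent in all three sequences, while only the penicillin branch stays frequent when moving down the taxonomy, which is the kind of specialization the second stage looks for.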

    Fast Utility Mining on Complex Sequences

    High-utility sequential pattern mining is an emerging topic in the field of Knowledge Discovery in Databases. It consists of discovering subsequences with high utility (importance) in sequences, referred to as high-utility sequential patterns (HUSPs). HUSPs can be applied to many real-life applications, such as market basket analysis, e-commerce recommendation, click-stream analysis, and scenic route planning. For example, in economics and targeted marketing, understanding consumers' economic behavior, such as finding credible and reliable information on product profitability, is quite challenging. Several algorithms have been proposed to address this problem by efficiently mining utility-based useful sequential patterns. Nevertheless, their runtime and memory usage can be unsatisfactory because of the combinatorial explosion of the search space for low utility thresholds and large databases. Hence, this paper proposes a more efficient algorithm for high-utility sequential pattern mining, called HUSP-ULL. It uses a lexicographic sequence (LS)-tree and a utility-linked (UL)-list structure to discover HUSPs quickly. Furthermore, two pruning strategies are introduced in HUSP-ULL to obtain tight upper bounds on the utility of candidate sequences and to reduce the search space by pruning unpromising candidates early. Extensive experiments on both real-life and synthetic datasets show that the proposed algorithm can effectively and efficiently discover the complete set of HUSPs and outperforms state-of-the-art algorithms. Comment: Under review in IEEE TKDE, 15 pages
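    Upper-bound pruning works because a pattern's utility in a sequence can never exceed that sequence's total utility. The Python sketch below shows the widely used sequence-weighted utilization (SWU) bound from earlier high-utility sequential pattern work: an item whose SWU is below the minimum utility cannot appear in any HUSP and can be removed up front. The tighter bounds used by HUSP-ULL are not reproduced here, and the function names and data are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence

Itemset = Mapping[str, int]  # item -> quantity


def sequence_utility(seq: Sequence[Itemset], profit: Mapping[str, float]) -> float:
    """Total utility of every item occurrence in the sequence."""
    return sum(q * profit[i] for itemset in seq for i, q in itemset.items())


def swu(db: Sequence[Sequence[Itemset]], profit: Mapping[str, float]) -> Dict[str, float]:
    """Sequence-weighted utilization: summed utility of the sequences containing each item."""
    out: Dict[str, float] = {}
    for seq in db:
        su = sequence_utility(seq, profit)
        for item in {i for itemset in seq for i in itemset}:
            out[item] = out.get(item, 0.0) + su
    return out


def prune_items(db: Sequence[Sequence[Itemset]], profit: Mapping[str, float],
                min_util: float) -> List[List[Dict[str, int]]]:
    """Drop items whose SWU is below min_util; no HUSP can contain them."""
    keep = {i for i, w in swu(db, profit).items() if w >= min_util}
    pruned = []
    for seq in db:
        new_seq = [{i: q for i, q in itemset.items() if i in keep} for itemset in seq]
        pruned.append([s for s in new_seq if s])  # discard itemsets left empty
    return pruned


profit = {"a": 1, "b": 2, "c": 5}
db = [[{"a": 1}, {"b": 1}], [{"a": 2}, {"c": 1}], [{"b": 1}]]
print(sorted(swu(db, profit).items()))   # [('a', 10.0), ('b', 5.0), ('c', 7.0)]
print(prune_items(db, profit, min_util=6))
# [[{'a': 1}], [{'a': 2}, {'c': 1}], []]
```

    Because pruning lowers the remaining sequence utilities, SWU can be recomputed and the pruning repeated until no further item is removed.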

    ProUM: Projection-based Utility Mining on Sequence Data

    Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has led to the emergence of utility-oriented mining (also called utility mining) in the past decade. Utility mining has attracted a great deal of attention, but most existing studies deal with itemset-based data. Time-ordered sequence data, which differs from itemset-based data, is more common in real-world situations. Current utility mining algorithms still have limitations on sequence data because they are time-consuming and require a large amount of memory. In addition, the efficiency of utility mining on sequence data still needs improvement, especially for long sequences or low minimum utility thresholds. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. A utility-array structure is designed to store the necessary information on sequence order and utility. ProUM significantly improves mining efficiency by using the projection technique to generate the utility-array, and it effectively reduces memory consumption. Furthermore, a new upper bound named sequence extension utility is proposed, and several pruning strategies are applied to further improve the efficiency of ProUM. Because utility theory is taken into account, the derived high-utility sequential patterns carry more insightful and interesting information than other kinds of patterns. Experimental results show that ProUM significantly outperforms state-of-the-art algorithms in terms of execution time, memory usage, and scalability. Comment: Elsevier Information Sciences, 17 pages, 4 figures
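    Projection lets a miner extend a pattern by scanning only the parts of each sequence that follow the pattern's current matches. The Python sketch below is a generic, hypothetical illustration of projection with utilities, not ProUM's utility-array: it assumes sequences of single `(item, utility)` occurrences and stores, per sequence, the best utility of the current pattern at every end position, so extending the pattern never needs to rescan from the start.

```python
from typing import Dict, List, Sequence, Tuple

UtilSeq = Sequence[Tuple[str, float]]     # (item, utility of that item occurrence)
Projection = Dict[int, Dict[int, float]]  # sequence id -> {end position: best utility}


def project_item(db: Sequence[UtilSeq], item: str) -> Projection:
    """Projection of a one-item pattern: every position of `item` with its utility."""
    proj: Projection = {}
    for sid, seq in enumerate(db):
        ends = {j: u for j, (x, u) in enumerate(seq) if x == item}
        if ends:
            proj[sid] = ends
    return proj


def extend(db: Sequence[UtilSeq], proj: Projection, item: str) -> Projection:
    """Project pattern + item: extend stored end positions to later positions of `item`."""
    new: Projection = {}
    for sid, ends in proj.items():
        seq = db[sid]
        first = min(ends)
        best: Dict[int, float] = {}
        for j in range(first + 1, len(seq)):
            x, u = seq[j]
            if x != item:
                continue
            # Best utility of the current pattern ending strictly before position j.
            prev = max(v for p, v in ends.items() if p < j)
            best[j] = prev + u
        if best:
            new[sid] = best
    return new


def utility(proj: Projection) -> float:
    """Pattern utility: sum over sequences of the best utility among its end positions."""
    return sum(max(ends.values()) for ends in proj.values())


db: List[UtilSeq] = [[("a", 2), ("b", 1), ("a", 5), ("b", 3)], [("b", 4), ("a", 1)]]
proj_a = project_item(db, "a")
print(utility(proj_a))    # 6
proj_ab = extend(db, proj_a, "b")
print(proj_ab)            # {0: {1: 3, 3: 8}}
print(utility(proj_ab))   # 8
```

    Keeping every end position, rather than only the first match as in frequency-based projection, matters here: the best a -> b occurrence in the first sequence uses the second `a` (5 + 3 = 8).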

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contain rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analyzing data that contain sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria. In particular, we present and discuss the current state-of-the-art PPUM algorithms and their advantages and deficiencies in detail. Finally, we highlight some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages

    Discovering Predictive Event Sequences in Criminal Careers

    In this work, we consider the problem of predicting criminal behavior and propose a method for discovering predictive patterns in criminal histories. Quantitative criminal career analysis typically involves clustering individuals according to the frequency of a particular event type over time and using cluster membership as a basis for comparison. We demonstrate the effectiveness of hazard pattern mining for discovering relationships between different types of events that may occur in criminal careers. Hazard pattern mining is an extension of event sequence mining with the additional restriction that each event in the pattern is the first subsequent event of the specified type. This restriction facilitates the application of established time-based measures, such as those used in survival analysis. We evaluate hazard patterns using a relative risk model and an accelerated failure time model. The results show that hazard patterns can reliably capture unexpected relationships between events of different types.
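    The "first subsequent event" restriction is what turns a pattern into a time-to-event observation. The Python sketch below illustrates this under stated assumptions: a pattern first -> then is anchored at the first `first` event in a history, the duration runs to the first later `then` event, and histories without such an event are right-censored at an assumed end of observation. The event labels and the function name are hypothetical; the resulting `(duration, observed)` pairs are the usual input to survival models such as relative risk or accelerated failure time models, which are not implemented here.

```python
from typing import Optional, Sequence, Tuple

Event = Tuple[float, str]  # (time, event type)


def hazard_duration(history: Sequence[Event], first: str, then: str,
                    end_time: float) -> Optional[Tuple[float, bool]]:
    """Time from the first `first` event to the first subsequent `then` event.

    Returns (duration, observed). If no `then` follows, the duration is
    right-censored at end_time (observed=False). Returns None when the
    history never contains `first`.
    """
    events = sorted(history)
    start = next((t for t, e in events if e == first), None)
    if start is None:
        return None
    follow = next((t for t, e in events if e == then and t > start), None)
    if follow is None:
        return (end_time - start, False)
    return (follow - start, True)


history = [(2.0, "theft"), (5.0, "assault"), (7.0, "theft"), (9.0, "assault")]
print(hazard_duration(history, "theft", "assault", end_time=20.0))  # (3.0, True)
print(hazard_duration(history, "theft", "fraud", end_time=20.0))    # (18.0, False)
```

    Only the first `assault` after the first `theft` is used, so later repetitions in the same history do not inflate the count, which is what makes the durations comparable across individuals.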

    Crime Analytics: Mining Event Sequences in Criminal Careers
