
    Pattern mining under different conditions

    New requirements and demands on pattern mining arise in modern applications that conventional methods cannot fulfil. For example, in scientific research, scientists are more interested in unknown knowledge, which usually hides in significant but infrequent patterns. However, existing itemset mining algorithms are designed for highly frequent patterns. Furthermore, scientists need to repeat an experiment many times to ensure reproducibility. This produces a series of datasets at once, each awaiting clustering and each possibly containing an unknown number of clusters with various densities and shapes. Using existing clustering algorithms is time-consuming because parameters must be tuned for each dataset. Many scientific datasets are also extremely noisy: they contain considerably more noise points than in-cluster data points, whereas most existing clustering algorithms can only handle moderate levels of noise. Temporal pattern mining is also important in scientific research. Existing temporal pattern mining algorithms only consider point-based events. However, most activities in the real world are interval-based, with a starting and an ending timestamp. This thesis develops novel pattern mining algorithms for various data mining tasks under different conditions. The first part of this thesis investigates the problem of mining less frequent itemsets in transactional datasets. In contrast to existing frequent itemset mining algorithms, this part focuses on itemsets that occur infrequently. The algorithms NIIMiner, RaCloMiner, and LSCMiner are proposed to identify such itemsets efficiently. NIIMiner utilizes the negative itemset tree to extract all patterns that occur less often than a given support threshold in a top-down, depth-first manner. RaCloMiner combines existing bottom-up frequent itemset mining algorithms with a top-down itemset mining algorithm to achieve better performance in mining less frequent patterns. LSCMiner investigates the problem of mining less frequent closed patterns. The second part of this thesis studies the problem of interval-based temporal pattern mining in the stream environment. Interval-based temporal patterns are sequential patterns in which each event is associated with a starting and an ending time. Existing approaches lack the ability to handle interval-based events and stream data. A novel interval-based temporal pattern mining algorithm for stream data is described in this part. The last part of this thesis studies new problems in clustering on numeric datasets. The first problem tackled in this part is shape alternation adaptivity in clustering. In applications such as scientific data analysis, scientists need to deal with a series of datasets generated from one experiment, and cluster sizes and shapes differ across those datasets. A kNN density-based clustering algorithm, kadaClus, is proposed to provide shape alternation adaptability so that users do not need to tune parameters for each dataset. The second problem studied in this part is clustering in an extremely noisy dataset. Many real-world datasets contain considerably more noise points than in-cluster data points. A novel clustering algorithm, kenClus, is proposed to identify clusters of arbitrary shape in extremely noisy datasets. Both clustering algorithms are kNN-based and require only one parameter, k. In each part, the efficiency and effectiveness of the presented techniques are thoroughly analyzed. Extensive experiments on synthetic and real-world datasets are conducted to show the benefits of the proposed algorithms over conventional approaches.
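
    To make the notion of a "less frequent" itemset concrete, the following is a minimal brute-force sketch in Python. It is not NIIMiner, RaCloMiner, or LSCMiner, whose details the abstract does not give; it simply enumerates small itemsets and keeps those whose support is non-zero but strictly below a threshold. The function names, the `max_size` cap, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def infrequent_itemsets(transactions, max_support, max_size=3):
    """Return {itemset: support} for itemsets with 0 < support < max_support.

    Brute-force enumeration, for illustration only: the cost grows
    combinatorially with the number of items and with max_size.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if 0 < s < max_support:
                result[candidate] = s
    return result


# Hypothetical toy dataset: four transactions over the items a-d.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"a", "d"}),
]
rare = infrequent_itemsets(transactions, max_support=2)
for itemset, s in sorted(rare.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

    Itemsets that never occur are excluded here so that the result stays finite and meaningful; whether the thesis treats zero-support itemsets the same way is not stated in the abstract.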

    Crowd Modeling using Temporal Association Rules

    Understanding crowd behavior has attracted tremendous attention from researchers over the years. In this work, we propose an unsupervised approach for crowd scene modeling and anomaly detection using association rule mining. Using object tracklets, we identify events occurring in the scene, as shown by the paths or routes that objects take while traversing the scene. Allen's interval-based temporal logic is used to extract frequent temporal patterns from the scene. Temporal association rules are generated from these frequent temporal patterns. Our goal is to understand the scene grammar, which is encoded in both the spatial and spatio-temporal patterns. We perform anomaly detection and test the method on a well-known public data
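
    Several of the entries in this listing rely on Allen's interval relations. As a rough illustration, the Python sketch below classifies the relation between two intervals into one of Allen's thirteen relations. It assumes each interval is a `(start, end)` pair with `start < end`; the function name and the relation labels are choices made here and are not taken from any of the listed works.

```python
def allen_relation(a, b):
    """Return Allen's relation of interval a with respect to interval b.

    Both intervals are (start, end) pairs with start < end (assumed).
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only the two partial-overlap cases remain at this point.
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical events: two objects crossing a scene at overlapping times.
print(allen_relation((0, 4), (2, 6)))
print(allen_relation((2, 3), (1, 5)))
```

    A temporal pattern miner would typically compute such a relation for each pair of events in a sequence and then count how often each combination of event labels and relation occurs.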

    FIBS: A Generic Framework for Classifying Interval-based Temporal Sequences

    We study the problem of classifying interval-based temporal sequences (IBTSs). Since common classification algorithms cannot be directly applied to IBTSs, the main challenge is to define a set of features that represents the data effectively enough for classifiers to be applied. Most prior work utilizes frequent pattern mining to define a feature set based on discovered patterns. However, frequent pattern mining is computationally expensive and often discovers many irrelevant patterns. To address this shortcoming, we propose the FIBS framework for classifying IBTSs. FIBS extracts features relevant to classification from IBTSs based on relative frequency and temporal relations. To avoid selecting irrelevant features, a filter-based selection strategy is incorporated into FIBS. Our empirical evaluation on eight real-world datasets demonstrates the effectiveness of our methods in practice. The results provide evidence that FIBS effectively represents IBTSs for classification algorithms, which contributes to similar or significantly better accuracy compared to state-of-the-art competitors. They also suggest that the feature selection strategy is beneficial to FIBS's performance. Comment: In: Big Data Analytics and Knowledge Discovery. DaWaK 2020. Springer, Cha

    Periodic pattern mining from spatio-temporal trajectory data

    Rapid development in GPS tracking techniques produces a large amount of spatio-temporal trajectory data. The analysis of these data provides a new opportunity to discover useful behavioural patterns. Spatio-temporal periodic pattern mining is employed to find temporal regularities for interesting places. Mining periodic patterns from spatio-temporal trajectories can reveal valuable information about people's regular and recurrent movements and behaviours. Previous studies have proposed methods to extract people's regular and repeating movement behaviour from spatio-temporal trajectories. These approaches target the following three issues: (1) long individual trajectories; (2) spatial fuzziness; and (3) temporal fuzziness. First, unlike other kinds of pattern mining, such as association rule mining and sequential pattern mining, periodic pattern mining requires a very long trajectory from an individual, for example one covering a month or a year, so that the regular period can be extracted from this single long trajectory. Second, spatial fuzziness means that although a moving object can regularly move along a similar route, it is practically impossible for it to appear at exactly the same location. For instance, Bob goes to work every day, and although he follows a similar path from home to his workplace, the exact same locations are not repeated across different days. Third, temporal fuzziness means that periodicity is complicated, involving partial time spans and multiple interleaving periods. In reality, a period is partial: it is highly unlikely to hold throughout the whole movement of the object. Alternatively, the moving object has only a few periods, such as a daily period for work or a yearly period for holidays. However, considering these three issues alone is insufficient to find effective periodic patterns.

    This thesis aims to develop a new framework to extract more effective, understandable and meaningful periodic patterns by taking more features of spatio-temporal trajectories into account. The first feature is trajectory sequence. GPS trajectory data are temporally ordered sequences of geolocations that can be represented as consecutive trajectory segments, where each entry in a trajectory segment is closely related to the previous sampled point (trajectory node) and the next one, rather than being isolated. Existing approaches disregard this important sequential nature of trajectories. Furthermore, they introduce both unwanted false positive and false negative reference spots. The second feature is the combination of spatial and temporal aspects. GPS trajectory data can be represented as triples (x, y, t), where x and y represent longitude and latitude respectively, and t is the time at which that location was recorded. Spatial and temporal aspects are clearly two key factors, yet existing methods do not consider them together in periodic pattern mining. The third feature is irregular time intervals. In reality, due to weather conditions, device malfunctions, or battery issues, trajectory data are not always regularly sampled. Existing algorithms cannot deal with this issue; they either require a computationally expensive trajectory interpolation process or assume that the trajectory has regular time intervals. The fourth feature is the hierarchy of space. Hierarchy is an inherent property of spatial data that can be expressed at different levels: for example, a country includes many states, and a shopping mall comprises many shops. Considering the hierarchy of space can reveal more hidden and valuable periodic patterns. Existing studies do not consider this inherent property of trajectories. The final feature is hidden background semantic information. Aspatial semantic information is one of the important features of spatio-temporal data, and it is embedded in the trajectory data. If the background semantic information is considered, more meaningful, understandable and useful periodic patterns can be extracted. However, existing methods do not consider the geographical information underlying trajectories. In addition, for some applications we are interested in finding periodic patterns among trajectory paths rather than trajectory nodes, whereas existing approaches to periodic pattern mining focus on trajectory nodes rather than paths.

    To sum up, the aim of this thesis is to investigate solutions to these problems in periodic pattern mining in order to extract more meaningful and understandable periodic patterns. Each of three chapters addresses a different problem and proposes solutions to problems not addressed in existing studies. Finally, this thesis proposes a new framework to address all of these problems. First, we investigated a path-based solution that targets trajectory sequence and spatio-temporal aspects. We proposed an algorithm called Traclus (spatio-temporal), which takes spatial and temporal aspects into account at the same time instead of considering only the spatial aspect. On two real-world trajectories, the results indicated that our method produced more effective periodic patterns based on trajectory paths than existing node-based methods. In order to consider the hierarchy of space, we investigated existing hierarchical clustering approaches to obtain hierarchical reference spots (trajectory paths) for periodic pattern mining. HDBSCAN is a hierarchical extension of DBSCAN that can handle clusters with different densities: it generates a hierarchical clustering result using the single-linkage method and then automatically extracts clusters from the hierarchical tree. Thus, we replaced the traditional clustering method DBSCAN in Traclus (spatio-temporal) with HDBSCAN to extract hierarchical reference spots. The result is convincing and reveals more periodic patterns than existing methods. Second, we introduced a stop/move method to annotate each spatio-temporal entry with a semantic label, such as restaurant, university or hospital. This method enriches a trajectory with background semantic information so that people's repeating behaviours can be inferred easily. In addition, existing methods use interpolation to make the trajectory regular and then apply the Fourier transform and autocorrelation to automatically detect the period for each reference spot. An increasing number of trajectory nodes leads to an exponential increase in running time. Thus, we employed the Lomb-Scargle periodogram to detect the period for each reference spot directly from the raw trajectory, without requiring any interpolation. The results on two real datasets showed that our method outperformed existing approaches in both effectiveness and efficiency. For the hierarchical aspect, we extended the previous work to find hierarchical semantic periodic patterns by applying HDBSCAN. The results were promising. Third, we applied our methodology to a case study, which revealed many interesting medical periodic patterns. These patterns can effectively explore human movement behaviours for positive medical outcomes.

    In summary, this research proposed a new framework that progressively targets the problems existing methods cannot handle: how to consider trajectory sequence, how to consider spatial and temporal aspects together, how to deal with trajectories with irregular time intervals, how to consider the hierarchy of space, and how to extract the semantic information behind a trajectory. After addressing all these problems, the experimental results demonstrate that our method can find more understandable, meaningful and effective periodic patterns than existing approaches.
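
    The abstract above mentions the Lomb-Scargle periodogram as a way to detect periods from irregularly sampled trajectories without interpolation. The following is a minimal sketch of the classical periodogram in plain Python, not the thesis's implementation. The synthetic signal (a 24-hour cycle observed at 200 irregular times over 30 days), the candidate period grid, and all function names are assumptions for illustration.

```python
import math
import random


def lomb_scargle_power(times, values, period):
    """Classical Lomb-Scargle power of `values` at one candidate period."""
    w = 2.0 * math.pi / period  # angular frequency
    mean = sum(values) / len(values)
    y = [v - mean for v in values]  # work on the mean-centred signal
    # The time offset tau makes the result invariant to shifts of the time axis.
    tau = math.atan2(
        sum(math.sin(2.0 * w * t) for t in times),
        sum(math.cos(2.0 * w * t) for t in times),
    ) / (2.0 * w)
    c = [math.cos(w * (t - tau)) for t in times]
    s = [math.sin(w * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, c))
    ys = sum(a * b for a, b in zip(y, s))
    return 0.5 * (yc * yc / sum(v * v for v in c) + ys * ys / sum(v * v for v in s))


# Synthetic, irregularly sampled observations (times in hours).
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 720.0) for _ in range(200))
values = [math.cos(2.0 * math.pi * t / 24.0) for t in times]

# Score candidate periods of 6 to 48 hours and keep the strongest one.
candidate_periods = [float(p) for p in range(6, 49)]
best_period = max(candidate_periods, key=lambda p: lomb_scargle_power(times, values, p))
print(best_period)
```

    In a trajectory setting, `values` would more plausibly be a presence indicator for one reference spot (1 when the object is observed there, 0 otherwise); the smooth cosine is used here only to keep the example easy to verify.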

    Discovering temporal patterns for interval-based events.

    Kam, Po-shan. Thesis (M.Phil.), Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 89-97). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgements
    Chapter 1. Introduction
        1.1 Data Mining
        1.2 Temporal Data Management
        1.3 Temporal reasoning and temporal semantics
        1.4 Temporal Data Mining
        1.5 Motivation
        1.6 Approach
            1.6.1 Focus and Objectives
            1.6.2 Experimental Setup
        1.7 Outline and contributions
    Chapter 2. Relevant Work
        2.1 Data Mining
            2.1.1 Association Rules
            2.1.2 Classification
            2.1.3 Clustering
        2.2 Sequential Pattern
            2.2.1 Frequent Patterns
            2.2.2 Interesting Patterns
            2.2.3 Granularity
        2.3 Temporal Database
        2.4 Temporal Reasoning
            2.4.1 Natural Language Expression
            2.4.2 Temporal Logic Approach
        2.5 Temporal Data Mining
            2.5.1 Framework
            2.5.2 Temporal Association Rules
            2.5.3 Attribute-Oriented Induction
            2.5.4 Time Series Analysis
    Chapter 3. Discovering Temporal Patterns for interval-based events
        3.1 Temporal Database
        3.2 Allen's Taxonomy of Temporal Relationships
        3.3 Mining Temporal Pattern, AppSeq and LinkSeq
            3.3.1 A1 and A2 temporal pattern
            3.3.2 Second Temporal Pattern, LinkSeq
        3.4 Overview of the Framework
            3.4.1 Mining Temporal Pattern I, AppSeq
            3.4.2 Mining Temporal Pattern II, LinkSeq
        3.5 Summary
    Chapter 4. Mining Temporal Pattern I, AppSeq
        4.1 Problem Statement
        4.2 Mining A1 Temporal Patterns
            4.2.1 Candidate Generation
            4.2.2 Large k-Items Generation
        4.3 Mining A2 Temporal Patterns
            4.3.1 Candidate Generation
            4.3.2 Generating Large 2k-Items
        4.4 Modified AppOne and AppTwo
        4.5 Performance Study
            4.5.1 Experimental Setup
            4.5.2 Experimental Results
            4.5.3 Medical Data
        4.6 Summary
    Chapter 5. Mining Temporal Pattern II, LinkSeq
        5.1 Problem Statement
        5.2 First Method for Mining LinkSeq, LinkApp
        5.3 Second Method for Mining LinkSeq, LinkTwo
        5.4 Alternative Method for Mining LinkSeq, LinkTree
            5.4.1 Sequence Tree: Design
            5.4.2 Construction of seq-tree
            5.4.3 Mining LinkSeq using seq-tree
        5.5 Performance Study
        5.6 Discussions
        5.7 Summary
    Chapter 6. Conclusion and Future Work
        6.1 Conclusion
        6.2 Future Work
    Bibliography

    Mining sensor datasets with spatiotemporal neighborhoods

    Many spatiotemporal data mining methods depend on how relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods in order to automatically find spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets such as those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method to create temporal intervals with a graph-based method to find spatial neighborhoods within each temporal interval. These methods were tested on real-world datasets including (a) sea surface temperature data from the Tropical Atmosphere Ocean (TAO) project array in the Equatorial Pacific Ocean and (b) NEXRAD precipitation data from the Hydro-NEXRAD system. The results were evaluated against known patterns of the phenomenon being measured. Furthermore, the results were quantified by performing hypothesis testing with Monte Carlo simulations to establish their statistical significance. The approach was also compared with existing approaches using two validation metrics, namely spatial autocorrelation and temporal interval dissimilarity. The results of these experiments show that our approach identifies highly refined spatiotemporal neighborhoods.

    Constraining the Search Space in Temporal Pattern Mining

    Agents in dynamic environments have to deal with complex situations that include various temporal interrelations of actions and events. Discovering frequent patterns in such scenes can be useful for creating prediction rules, which can then be used to predict future activities or situations. We present the algorithm MiTemP, which learns frequent patterns based on a time-interval-based relational representation. Additionally, the problem has been transferred to a pure relational association rule mining task that can be handled by WARMR. The two approaches are compared in a number of experiments. The experiments show the advantage of avoiding the creation of impossible or redundant patterns with MiTemP. While fewer patterns have to be explored on average with MiTemP, more frequent patterns are found at an earlier refinement level.

    Evolving temporal association rules with genetic algorithms

    A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have already been applied to association rule mining; we show the efficacy of extending this to another variant, temporal association rule mining. Our framework enhances existing temporal association rule mining methods because it employs a genetic algorithm to search the rule space and the temporal space simultaneously. A methodology for validating the ability of the proposed framework isolates target temporal itemsets in synthetic datasets. The Iterative Rule Learning method successfully discovers these targets in datasets with varying levels of difficulty.