8 research outputs found

    Periodic Pattern Mining: Algorithms and Applications

    Owing to its large number of applications, periodic pattern mining has been extensively studied for over a decade. A periodic pattern is a pattern that repeats itself with a specific period in a given sequence. Periodic patterns can be mined from datasets such as biological sequences, continuous and discrete time series data, spatiotemporal data, and social networks. Periodic patterns are classified according to different criteria. Based on frequency of occurrence, they are categorized as frequent periodic patterns and statistically significant patterns. Frequent periodic patterns are in turn classified as perfect and imperfect periodic patterns, full and partial periodic patterns, synchronous and asynchronous periodic patterns, dense periodic patterns, and approximate periodic patterns. This paper presents a survey of state-of-the-art research on periodic pattern mining algorithms and their application areas. The merits and demerits of these algorithms are discussed. The paper also presents a brief overview of algorithms that can be applied to specific types of datasets, such as spatiotemporal data and social networks.
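
    To make the distinction between perfect and imperfect periodic patterns concrete, the following is a minimal sketch and not an algorithm taken from the survey. It assumes a pattern is a single symbol, and it introduces a hypothetical confidence threshold `min_conf` to separate imperfect patterns from non-periodic ones.

```python
def periodicity_confidence(sequence, symbol, period, start):
    """Fraction of the expected positions start, start + period, ...
    at which `symbol` actually occurs in `sequence`."""
    if period <= 0 or start < 0:
        raise ValueError("period must be positive and start non-negative")
    expected = range(start, len(sequence), period)
    if len(expected) == 0:
        return 0.0
    hits = sum(1 for i in expected if sequence[i] == symbol)
    return hits / len(expected)


def classify(sequence, symbol, period, start, min_conf=0.7):
    """Label a candidate pattern; `min_conf` is an illustrative assumption."""
    conf = periodicity_confidence(sequence, symbol, period, start)
    if conf == 1.0:
        return "perfect"      # the symbol occurs at every expected position
    if conf >= min_conf:
        return "imperfect"    # some expected occurrences are missing
    return "not periodic"
```

    For example, `classify("abcabcxbcabc", "a", 3, 0)` finds the symbol at three of the four expected positions and so labels the pattern imperfect under the assumed threshold.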

    Improved Periodicity Mining in Time Series Databases

    Time series data represent information about real-world phenomena, and periodicity mining explores the interesting periodic behavior that is inherent in the data. Periodicity mining has numerous applications, such as weather forecasting, stock market prediction and analysis, and pattern recognition. Recently, the suffix tree, a powerful data structure that efficiently solves many string-related problems, has been used to gather information about repeated substrings in the text and then perform periodicity mining. However, periodicity mining deals with large amounts of data, which makes it difficult to perform mining in main memory due to the space requirements of the suffix tree. Thus, we first propose the use of the Compressed Suffix Tree (CST) for space-efficient periodicity mining in very large datasets. Given the time-space trade-off that comes with any practical usage of the CST, we provide a comprehensive empirical analysis of the practical usage of CSTs and traditional suffix trees for periodicity mining. Noise is an inherent part of practical time series data, and it is important to mine periods in spite of the noise. This leads to the problem of approximate periodicity mining. Existing algorithms have dealt with the noise introduced between the occurrences of the periodic pattern, but not with the noise introduced in the structure of the pattern itself. We present a taxonomy for approximate periodicity and then propose an algorithm that performs periodicity mining in the presence of noise introduced simultaneously in both the structure of the pattern and between the periodic occurrences of the pattern.
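
    The idea of gathering repeated substrings first and deriving periods from them can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: a plain dictionary stands in for the suffix tree or CST, substrings have one fixed length, and the helper names `occurrence_index` and `candidate_periods` are hypothetical.

```python
from collections import Counter, defaultdict


def occurrence_index(text, length):
    """Map every substring of the given length to its start positions.
    A dictionary replaces the (compressed) suffix tree in this sketch."""
    index = defaultdict(list)
    for i in range(len(text) - length + 1):
        index[text[i:i + length]].append(i)
    return index


def candidate_periods(positions):
    """Count the gaps between consecutive occurrences of one substring.
    Gaps that recur often are candidate periods for that substring."""
    return Counter(b - a for a, b in zip(positions, positions[1:]))
```

    A gap that dominates the counter for a substring suggests a period worth checking; a dictionary index like this one does not give the space behavior that motivates the CST.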

    Periodic pattern mining from spatio-temporal trajectory data

    Rapid development in GPS tracking techniques produces large volumes of spatio-temporal trajectory data. Analysing these data provides a new opportunity to discover useful behavioural patterns. Spatio-temporal periodic pattern mining is employed to find temporal regularities for interesting places. Mining periodic patterns from spatio-temporal trajectories can reveal valuable information about people's regular and recurrent movements and behaviours. Previous studies have proposed ways to extract people's regular and repeating movement behaviour from spatio-temporal trajectories. These approaches target the following three issues: (1) long individual trajectories; (2) spatial fuzziness; and (3) temporal fuzziness. First, unlike other kinds of pattern mining, such as association rule mining and sequential pattern mining, periodic pattern mining requires a very long trajectory from a single individual so that the regular period, for example a one-month or one-year period, can be extracted from it. Second, spatial fuzziness means that although a moving object can regularly move along a similar route, it is practically impossible for it to appear at exactly the same location each time. For instance, Bob goes to work every day, and although he follows a similar path from home to his workplace, the exact same locations are not repeated across different days. Third, temporal fuzziness means that periodicity is complicated by partial time spans and multiple interleaving periods. In reality, a period is partial: it is highly unlikely to hold throughout the whole movement of the object. In addition, a moving object may have only a few periods, such as a daily period for work or a yearly period for holidays. However, considering these three issues alone is insufficient to find effective periodic patterns.
This thesis aims to develop a new framework for extracting more effective, understandable and meaningful periodic patterns by taking more features of spatio-temporal trajectories into account. The first feature is trajectory sequence. GPS trajectory data are temporally ordered sequences of geolocations that can be represented as consecutive trajectory segments, where each entry in a trajectory segment is closely related to the previous and next sampled points (trajectory nodes) rather than being isolated. Existing approaches disregard this important sequential nature of trajectories. Furthermore, they introduce both unwanted false positive and false negative reference spots. The second feature is the combination of spatial and temporal aspects. GPS trajectory data can be represented as triples (x; y; t), where x and y denote longitude and latitude respectively and t is the corresponding time at that location. Spatial and temporal aspects are clearly two key factors, yet existing methods do not consider them together in periodic pattern mining. The third feature is irregular time intervals. In reality, due to weather conditions, device malfunctions or battery issues, trajectory data are not always regularly sampled. Existing algorithms cannot deal with this issue directly: they either require a computationally expensive trajectory interpolation process or assume that the trajectory is sampled at regular time intervals. The fourth feature is the hierarchy of space. Hierarchy is an inherent property of spatial data, which can be expressed at different levels: a country includes many states, and a shopping mall comprises many shops. Taking the hierarchy of space into account can reveal more hidden and valuable periodic patterns. Existing studies do not consider this inherent property of trajectories. The final feature is hidden background semantic information.
Aspatial semantic information is one of the important features of spatio-temporal data, and it is embedded in the trajectory data. If the background semantic information is considered, more meaningful, understandable and useful periodic patterns can be extracted. However, existing methods do not consider the geographical information underlying trajectories. In addition, some applications call for periodic patterns to be identified among trajectory paths rather than trajectory nodes, whereas existing approaches to periodic pattern mining focus on trajectory nodes. To sum up, the aim of this thesis is to investigate solutions to these problems in order to extract more meaningful and understandable periodic patterns. Each of three chapters addresses a different problem and proposes solutions to problems not addressed in existing studies. Finally, this thesis proposes a new framework that addresses all of these problems. First, we investigated a path-based solution that targets trajectory sequence and spatio-temporal aspects. We proposed an algorithm called Traclus (spatio-temporal), which takes spatial and temporal aspects into account at the same time instead of considering only the spatial aspect. On two real-world trajectories, our method produced more effective periodic patterns based on trajectory paths than existing node-based methods. To take the hierarchy of space into account, we investigated existing hierarchical clustering approaches for obtaining hierarchical reference spots (trajectory paths) for periodic pattern mining.
HDBSCAN is a hierarchical extension of DBSCAN that can handle clusters of different densities: it generates a hierarchical clustering result using the single-linkage method and then automatically extracts clusters from the hierarchical tree. We therefore replaced the traditional clustering method DBSCAN in Traclus (spatio-temporal) with HDBSCAN to extract hierarchical reference spots. The result is convincing and reveals more periodic patterns than existing methods. Second, we introduced a stop/move method to annotate each spatio-temporal entry with a semantic label, such as restaurant, university or hospital. This method enriches a trajectory with background semantic information so that people's repeating behaviours can be inferred easily. In addition, existing methods use interpolation to make the trajectory regular and then apply the Fourier transform and autocorrelation to automatically detect the period for each reference spot. An increasing number of trajectory nodes leads to an exponential increase in running time. We therefore employed the Lomb-Scargle periodogram to detect the period for each reference spot from the raw trajectory, without requiring any interpolation. The results showed that our method outperformed existing approaches in effectiveness and efficiency on two real datasets. For the hierarchical aspect, we extended this work to find hierarchical semantic periodic patterns by applying HDBSCAN. The results were promising. Third, we applied our methodology to a case study, which revealed many interesting medical periodic patterns. These patterns can be used to explore human movement behaviours in support of positive medical outcomes. To sum up, this research proposed a new framework that progressively targets the problems that existing methods cannot handle.
These include how to consider trajectory sequence, how to consider spatial and temporal aspects together, how to deal with trajectories sampled at irregular time intervals, how to consider the hierarchy of space, and how to extract the semantic information behind trajectories. With all of these problems addressed, the experimental results demonstrate that our method finds more understandable, meaningful and effective periodic patterns than existing approaches.
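
    The use of a Lomb-Scargle periodogram on irregularly sampled data, as mentioned in the abstract above, can be illustrated with a short sketch. It is not the thesis's implementation. It assumes each observation is a timestamp in hours paired with 1.0 if the object is inside a given reference spot and 0.0 otherwise, and it evaluates the classical periodogram formula over a hypothetical list of candidate periods.

```python
import math


def lomb_scargle(times, values, freqs):
    """Classical Lomb-Scargle power at each frequency in `freqs` (all > 0).
    Works on unevenly spaced `times`; no interpolation is performed."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]          # remove the constant offset
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, c))
        ys = sum(a * b for a, b in zip(y, s))
        cc = sum(a * a for a in c)
        ss = sum(a * a for a in s)
        p = 0.0
        if cc > 0.0:
            p += yc * yc / cc
        if ss > 0.0:
            p += ys * ys / ss
        power.append(0.5 * p)
    return power


def dominant_period(times, values, periods):
    """Return the candidate period with the highest periodogram power."""
    power = lomb_scargle(times, values, [1.0 / p for p in periods])
    return periods[power.index(max(power))]
```

    Because the periodogram is evaluated directly at the observed timestamps, gaps caused by device or battery problems need no resampling step; the choice of candidate periods remains an input the caller has to supply.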

    Analyse de systèmes temps-réel par traçage (Analysis of real-time systems through tracing)

    Summary: Tracing is a technique for collecting very precise information about the execution of a system with minimal impact. To better understand the execution of an application, the usual technique is to attach a debugger and interrupt the application in order to inspect the value of certain variables, for example. This approach is ill-suited to applications that interact frequently with the system itself or with other applications. Real-time applications are one type of application with this kind of interaction. The temporal aspect of their execution makes a debugger useless in many cases. The minimal impact of tracing on an application's execution makes it a valuable asset for better understanding the complex interactions that can occur within a real-time system. To minimize the impact of instrumentation, it is desirable to reduce the amount of information collected. It is therefore important to identify the minimal information needed for the analysis, which will not cause undue latency. Along the same lines, the tracer cannot afford to perform costly processing before recording the information. The collected events must therefore contain raw information about the execution of the system. The objective of this research is to show that the information collected while tracing a real-time system can be used to extract information that helps to better understand behaviours specific to real-time systems. The hypothesis of this work is that tracing makes it possible to collect information about the execution of a real-time application, and that the tracing information thus collected can be used to diagnose problems that are difficult to observe.
We first study the different tracing tools in terms of their functionality and their impact on real-time systems. We then compare the different trace analysis tools according to two broad categories: algorithmic approaches and visualization techniques. Typical problems of real-time applications are first identified and serve as a basis to guide our research. An algorithm is developed to analyse the trace and find these typical problems. The algorithm is tested on traces generated from test cases, and its performance is evaluated. The first contribution of this work is an algorithm that generates a high-level model of a real-time application from the trace of its interaction with the operating system kernel. This model uses the particular semantics of the events to produce a simple and fast state machine. The minimal events necessary for the analysis are identified so as to limit the impact of tracing on the application. The second contribution is a visualization tool that allows the different execution phases of a real-time application to be compared directly. This tool uses the model generated from the tracing information to identify the different phases and to extract several statistics useful for an overall understanding of the execution. Finally, a statistics storage structure is improved so as to efficiently retrieve statistics that evolve continuously over time. The final result is a tool that makes it possible to diagnose problems in real-time applications using the information contained in a kernel trace, while also facilitating the discovery of patterns of interest.
Abstract: Tracing is a technique to gather precise information about the execution of a system with minimal impact.
In order to better understand the execution of an application, the usual technique consists in attaching a debugger and interrupting the application to inspect the value of certain variables, for example. This approach is ill-suited to applications that are tightly coupled with the system itself. Real-time applications are a type of application that exhibits this sort of interaction. The temporal aspect of their execution nullifies the use of a debugger. The low impact of a tracer on the execution of an application is therefore an important asset for better understanding complex interactions in real-time systems. Even then, it is important to further minimize the impact of tracing by reducing the amount of gathered information. It is therefore important to identify the minimal information necessary for the analysis, which will not cause undue latency. Similarly, the tracer cannot afford complex processing while collecting the information. It must write the raw events as fast as possible. The objective of this research is to show that the information gathered during the tracing of a real-time system can be used to extract additional information that helps to better understand the behaviour of real-time systems. We will first study the different tracing tools according to their functionalities and impact on real-time systems. Then, we will compare different trace analysis tools according to two main categories: algorithmic approaches and visualization techniques. Typical real-time application problems will be identified and used as a baseline to guide our research. An algorithm will then be developed to analyse the trace and find these typical problems. The algorithm will be tested on traces generated from test cases and its performance evaluated.
The hypothesis of this work is that tracing allows gathering information about the execution of real-time applications and that this tracing information can be used to diagnose problems that are otherwise difficult to observe. The result of this work is the creation of a model allowing the extraction of statistics and the generation of visualizations from kernel traces gathered on a real-time system. This model uses event semantics to produce a finite-state machine that is both simple and fast. The minimal and necessary events for the analysis are identified in order to limit the impact of tracing on the application. Finally, a statistics storage structure is improved in order to efficiently retrieve statistics that vary continuously through time. The final result is a tool that diagnoses real-time application problems using the information stored inside a kernel trace and aids in the discovery of interesting patterns.
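
    How a finite-state machine can turn raw kernel events into per-state statistics can be sketched as follows. This is an assumption-laden illustration, not the thesis's model: the abstract does not name the events it uses, so the sketch assumes scheduler events modelled on the Linux `sched_switch` and `sched_wakeup` tracepoints, with hypothetical dictionary fields (`ts`, `tid`, `prev_tid`, `next_tid`, `prev_runnable`) and hypothetical state names.

```python
from collections import defaultdict


def time_per_state(events, tid):
    """Replay timestamp-ordered events for one thread and sum the time
    spent in each state. The final, still-open interval is not counted."""
    state, since = None, None
    totals = defaultdict(int)
    for ev in events:
        new_state = state
        if ev["name"] == "sched_switch":
            if ev.get("next_tid") == tid:
                new_state = "RUNNING"        # thread was given the CPU
            elif ev.get("prev_tid") == tid:
                # Still runnable when switched out means it was preempted.
                new_state = "PREEMPTED" if ev.get("prev_runnable") else "BLOCKED"
        elif ev["name"] == "sched_wakeup":
            if ev.get("tid") == tid and state == "BLOCKED":
                new_state = "READY"          # woken up, waiting for a CPU
        if new_state != state:
            if state is not None:
                totals[state] += ev["ts"] - since
            state, since = new_state, ev["ts"]
    return dict(totals)
```

    Each event only triggers a comparison and a possible state change, which is in the spirit of keeping the analysis simple and fast; unusually long READY or PREEMPTED intervals would be the kind of latency problem such a model could surface.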