1,918 research outputs found

    Improving Time Series Forecasting by Discovering Frequent Episodes in Sequences

    Get PDF
    This work aims to improve an existing time series forecasting algorithm –LBF– by the application of frequent episodes techniques as a complementary step to the model. When real-world time series are forecasted, there exist many samples whose values may be specially unexpected. By the combination of frequent episodes and the LBF algorithm, the new procedure does not make better predictions over these outliers but, on the contrary, it is able to predict the apparition of such atypical samples with a great accuracy. In short, this work shows how to detect the occurrence of anomalous samples in time series improving, thus, the general forecasting scheme. Moreover, this hybrid approach has been successfully tested on electricity-related time series

    R-CAD: Rare Cyber Alert Signature Relationship Extraction Through Temporal Based Learning

    Get PDF
    The large number of streaming intrusion alerts make it challenging for security analysts to quickly identify attack patterns. This is especially difficult since critical alerts often occur too rarely for traditional pattern mining algorithms to be effective. Recognizing the attack speed as an inherent indicator of differing cyber attacks, this work aggregates alerts into attack episodes that have distinct attack speeds, and finds attack actions regularly co-occurring within the same episode. This enables a novel use of the constrained SPADE temporal pattern mining algorithm to extract consistent co-occurrences of alert signatures that are indicative of attack actions that follow each other. The proposed Rare yet Co-occurring Attack action Discovery (R-CAD) system extracts not only the co-occurring patterns but also the temporal characteristics of the co-occurrences, giving the `strong rules\u27 indicative of critical and repeated attack behaviors. Through the use of a real-world dataset, we demonstrate that R-CAD helps reduce the overwhelming volume and variety of intrusion alerts to a manageable set of co-occurring strong rules. We show specific rules that reveal how critical attack actions follow one another and in what attack speed

    Contrast set mining in temporal databases

    Get PDF
    Understanding the underlying differences between groups or classes in certain contexts can be of the utmost importance. Contrast set mining relies on discovering significant patterns by contrasting two or more groups. A contrast set is a conjunction of attribute–value pairs that differ meaningfully in its distribution across groups. A previously proposed technique is rules for contrast sets, which seeks to express each contrast set found in terms of rules. This work extends rules for contrast sets to a temporal data mining task. We define a set of temporal patterns in order to capture the significant changes in the contrasts discovered along the considered time line. To evaluate the proposal accuracy and ability to discover relevant information, two different real-life data sets were studied using this approach.(undefined

    Mining Spatio-Temporal Datasets: Relevance, Challenges and Current Research Directions

    Get PDF
    Spatio-temporal data usually records the states over time of an object, an event or a position in space. Spatio-temporal data can be found in several application fields, such as traffic management, environment monitoring, weather forerast, etc. In the past, huge effort was devoted to spatial data representation and manipulation with particular focus on its visualisation. More recently, the interest of many users has shifted from static views of geospatial phenomena, which capture its “spatiality” only, to more advanced means of discovering dynamic relationships among the patterns and events contained in the data as well as understanding the changes occurring in spatial data over time

    Mining Predictive Patterns and Extension to Multivariate Temporal Data

    Get PDF
    An important goal of knowledge discovery is the search for patterns in the data that can help explaining its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy to understand by humans. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients. We propose and present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets. Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks

    Building a validation measure for activity-based transportation models based on mobile phone data

    Full text link
    Activity-based micro-simulation transportation models typically predict 24-h activity-travel sequences for each individual in a study area. These sequences serve as a key input for travel demand analysis and forecasting in the region. However, despite their importance, the lack of a reliable benchmark to evaluate the generated sequences has hampered further development and application of the models. With the wide deployment of mobile phone devices today, we explore the possibility of using the travel behavioral information derived from mobile phone data to build such a validation measure. Our investigation consists of three steps. First, the daily trajectory of locations, where a user performed activities, is constructed from the mobile phone records. To account for the discrepancy between the stops revealed by the call data and the real location traces that the user has made, the daily trajectories are then transformed into actual travel sequences. Finally, all the derived sequences are classified into typical activity-travel patterns which, in combination with their relative frequencies, define an activity-travel profile. The established profile characterizes the current activity-travel behavior in the study area, and can thus be used as a benchmark for the assessment of the activity-based transportation models. By comparing the activity-travel profiles derived from the call data with statistics that stem from traditional activity-travel surveys, the validation potential is demonstrated. In addition, a sensitivity analysis is carried out to assess how the results are affected by the different parameter settings defined in the profiling process

    Identification des régimes et regroupement des séquences pour la prévision des marchés financiers

    Get PDF
    Abstract : Regime switching analysis is extensively advocated to capture complex behaviors underlying financial time series for market prediction. Two main disadvantages in current approaches of regime identification are raised in the literature: 1) the lack of a mechanism for identifying regimes dynamically, restricting them to switching among a fixed set of regimes with a static transition probability matrix; 2) failure to utilize cross-sectional regime dependencies among time series, since not all the time series are synchronized to the same regime. As the numerical time series can be symbolized into categorical sequences, a third issue raises: 3) the lack of a meaningful and effective measure of the similarity between chronological dependent categorical values, in order to identify sequence clusters that could serve as regimes for market forecasting. In this thesis, we propose a dynamic regime identification model that can identify regimes dynamically with a time-varying transition probability, to address the first issue. For the second issue, we propose a cluster-based regime identification model to account for the cross-sectional regime dependencies underlying financial time series for market forecasting. For the last issue, we develop a dynamic order Markov model, making use of information underlying frequent consecutive patterns and sparse patterns, to identify the clusters that could serve as regimes identified on categorized financial time series. Experiments on synthetic and real-world datasets show that our two regime models show good performance on both regime identification and forecasting, while our dynamic order Markov clustering model also demonstrates good performance on identifying clusters from categorical sequences.L'analyse de changement de régime est largement préconisée pour capturer les comportements complexes sous-jacents aux séries chronologiques financières pour la prédiction du marché. Deux principaux problèmes des approches actuelles d'identifica-tion de régime sont soulevés dans la littérature. Il s’agit de: 1) l'absence d'un mécanisme d'identification dynamique des régimes. Ceci limite la commutation entre un ensemble fixe de régimes avec une matrice de probabilité de transition statique; 2) l’incapacité à utiliser les dépendances transversales des régimes entre les séries chronologiques, car toutes les séries chronologiques ne sont pas synchronisées sur le même régime. Étant donné que les séries temporelles numériques peuvent être symbolisées en séquences catégorielles, un troisième problème se pose: 3) l'absence d'une mesure significative et efficace de la similarité entre les séries chronologiques dépendant des valeurs catégorielles pour identifier les clusters de séquences qui pourraient servir de régimes de prévision du marché. Dans cette thèse, nous proposons un modèle d'identification de régime dynamique qui identifie dynamiquement des régimes avec une probabilité de transition variable dans le temps afin de répondre au premier problème. Ensuite, pour adresser le deuxième problème, nous proposons un modèle d'identification de régime basé sur les clusters. Notre modèle considère les dépendances transversales des régimes sous-jacents aux séries chronologiques financières avant d’effectuer la prévision du marché. Pour terminer, nous abordons le troisième problème en développant un modèle de Markov d'ordre dynamique, en utilisant les informations sous-jacentes aux motifs consécutifs fréquents et aux motifs clairsemés, pour identifier les clusters qui peuvent servir de régimes identifiés sur des séries chronologiques financières catégorisées. Nous avons mené des expériences sur des ensembles de données synthétiques et du monde réel. Nous démontrons que nos deux modèles de régime présentent de bonnes performances à la fois en termes d'identification et de prévision de régime, et notre modèle de clustering de Markov d'ordre dynamique produit également de bonnes performances dans l'identification de clusters à partir de séquences catégorielles
    corecore