6 research outputs found

    Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss

    We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods face as the data dimension grows. In particular, we consider a general approach in which changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach underlies several change-detection methods, its effectiveness as the data dimension grows has never been investigated, and this is the goal of our paper. We show that the magnitude of a change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens as the data dimension increases. This problem, which we refer to as detectability loss, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that the problem also holds on real-world datasets and that it can be harmful even at low data dimensions (say, 10).
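The linear relationship between log-likelihood variance and data dimension can be checked numerically in the simplest case. The sketch below is an illustration only, not the paper's derivation: it assumes data drawn from a standard Gaussian N(0, I_d) and evaluated under that same density, for which the log-likelihood variance is exactly d/2. The function name `loglik_variance` and the sample size are hypothetical choices made for this example.

```python
import math
import random
import statistics


def loglik_variance(d, n=20_000, seed=0):
    """Empirical variance of log N(x; 0, I_d) over n samples x ~ N(0, I_d).

    Assumption for illustration: standard Gaussian data scored under the
    true density. In that case the theoretical variance is d / 2.
    """
    rng = random.Random(seed)
    const = -0.5 * d * math.log(2 * math.pi)
    logliks = [
        const - 0.5 * sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        for _ in range(n)
    ]
    return statistics.pvariance(logliks)


# Compare the empirical variance with d / 2 for a few dimensions.
variances = {d: loglik_variance(d) for d in (1, 10, 40)}
for d, v in variances.items():
    print(f"d={d:3d}  empirical={v:.3f}  theoretical={d / 2:.3f}")
```

Because the spread of the log-likelihood grows with d while a change of fixed magnitude shifts its mean by a fixed amount, the same change becomes harder to separate from noise in higher dimensions, which is the effect the abstract calls detectability loss.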

    Data Driven Techniques for Modeling Coupled Dynamics in Transient Processes

    We study the problem of modeling coupled dynamics in transient processes that take place in a network. The problem is considered at two levels. At the node level, we consider the coupling between the underlying sub-processes of a node. At the network level, we consider the direct influence among the nodes. Once the model is constructed, we develop a network-based approach for change detection in high-dimensional transient processes. The overall contribution of our work is a more accurate model of the underlying transient dynamics, either for each individual node or for the whole network, and a new statistic for change detection in multi-dimensional time series. Specifically, at the node level, we develop a model to represent the coupled dynamics between the two processes. We provide closed-form formulas for the conditions under which a periodic trajectory exists and the solutions are stable. Numerical studies suggest that our model can capture the nonlinear characteristics of empirical data while reducing computation time by about 25% on average, compared to a benchmark modeling approach. In the last two problems, we provide a closed-form formula for the bound in the sparse regression formulation, which reduces the trial and error needed to find an appropriate bound. Compared to other benchmark methods for inferring network structure from time series, our method reduces inference error by up to five orders of magnitude and maintains better sparsity. We also develop a new method to infer dynamic network structure from a single time series. This method is the basis for a new spectral graph statistic for change detection. This statistic detects changes in a simulation scenario with a modified area under the curve (mAUC) of 0.96. When applied to the problem of detecting seizures from EEG signals, our statistic captures the physiology of the process while maintaining a detection rate of 40% by itself. Therefore, it can serve as an effective feature for change detection and can be added to the current set of features for detecting seizures from EEG signals.

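    Détection de ruptures multiples dans des séries temporelles multivariées : application à l'inférence de réseaux de dépendance [Multiple change-point detection in multivariate time series: application to the inference of dependency networks]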

    This thesis presents a method for the offline detection of multiple change-points in multivariate time series, and uses the results to estimate the dependency relationships between the components of the system. The originality of the model, called the Bernoulli Detector, lies in combining local statistics from a robust test, based on the ranks of the observations, with a global Bayesian framework. This nonparametric model does not require strong hypotheses on the distribution of the observations. It is applicable without modification to Gaussian data as well as to data corrupted by outliers. Control of the detection of a single change-point is proven, even for small samples. In a multivariate context, a term is introduced to model the dependencies between the changes, assuming that if two components are connected, the events occurring in the first one tend, with high probability, to affect the second one instantaneously. Thanks to this flexible model, the segmentation is sensitive both to common changes shared by several signals and to isolated changes occurring in a single signal. The method is compared with other solutions from the literature, notably on real datasets of household electricity consumption and genomic measurements. These experiments highlight the value of the model for detecting change-points in independent, conditionally independent, or fully connected signals. Finally, the synchronization of the change-points within the time series is exploited to estimate the relationships between the variables, using the Bayesian network formalism. By adapting the score function of a structure learning method, it is verified that the independence model describing the system can be partly recovered from the information given by the change-points estimated by the Bernoulli Detector.