90 research outputs found

    Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss

    We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach constitutes the frame of several change-detection methods, its effectiveness when the data dimension scales has never been investigated, which is indeed the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This problem, which we refer to as detectability loss, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that the problem also holds on real-world datasets and that it can be harmful even at low data dimensions (say, 10).
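    To make the linear-variance argument concrete, the following minimal Python sketch (an illustration, not the authors' code) simulates Gaussian datastreams: a mean shift of fixed symmetric Kullback-Leibler magnitude produces a fixed offset in the log-likelihood stream, while the variance of the log-likelihood grows linearly with the dimension, so the signal-to-noise ratio of the change decays as the dimension increases. The sample size and shift are illustrative choices.

```python
# Minimal sketch (not the authors' code): for x ~ N(0, I_d) the log-likelihood
#   log p(x) = -(d/2) * log(2*pi) - ||x||^2 / 2
# has variance d/2, i.e., linear in the dimension d. A mean shift delta along
# one coordinate has symmetric KL divergence delta^2 regardless of d, yet its
# effect on the log-likelihood stream is increasingly swamped by that variance.
import numpy as np

rng = np.random.default_rng(0)
n, delta = 100_000, 1.0          # samples per regime, fixed-magnitude change

for d in (2, 10, 100):
    pre = rng.standard_normal((n, d))            # pre-change: N(0, I_d)
    post = rng.standard_normal((n, d))
    post[:, 0] += delta                          # post-change: shifted mean
    loglik = lambda x: -0.5 * d * np.log(2 * np.pi) - 0.5 * (x ** 2).sum(axis=1)
    ll_pre, ll_post = loglik(pre), loglik(post)
    # signal-to-noise ratio of the change as seen by the log-likelihood stream
    snr = abs(ll_pre.mean() - ll_post.mean()) / ll_pre.std()
    print(f"d={d:4d}  Var(log-lik)≈{ll_pre.var():7.1f}  change SNR≈{snr:.3f}")
```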

    An ensemble approach to estimate the fault-time instant

    Since systems are prone to faults, fault detection and isolation are essential activities to be considered in safety-critical applications. In this direction, the availability of a sound estimate of the time instant the fault occurred is precious information that a diagnosis system can fruitfully exploit, e.g., to identify information consistent with the faulty state. Unfortunately, any fault-detection system introduces a structural delay that typically increases for subtle faults (e.g., those characterized by a small magnitude), with the consequence that the fault-occurrence time is overestimated. In this paper we propose an ensemble approach to estimate the time instant a fault occurred. We focus on systems that can be described as ARMA models and on faults inducing an abrupt change in the model coefficients.
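    The following toy Python sketch (an illustration of the general idea, not the paper's algorithm) shows one way an ensemble can estimate a fault-time instant on an AR(1) stream whose coefficient changes abruptly: each member scans a different sub-window for the split point minimizing a two-segment least-squares fit, and the member estimates are fused by their median. All window boundaries and grid steps are illustrative assumptions.

```python
# Toy ensemble estimate of the fault-time instant on an AR(1) stream whose
# coefficient jumps at t=500. Each ensemble member searches its own window for
# the split minimizing the two-segment least-squares fit; estimates are fused
# by median. This is a sketch of the idea, not the paper's method.
import numpy as np

rng = np.random.default_rng(1)
T, t_fault = 1000, 500
x = np.zeros(T)
for t in range(1, T):
    phi = 0.5 if t < t_fault else 0.9           # abrupt coefficient change
    x[t] = phi * x[t - 1] + rng.standard_normal()

def ar1_sse(seg):
    """Residual sum of squares of a least-squares AR(1) fit on a segment."""
    y, z = seg[1:], seg[:-1]
    phi = y @ z / (z @ z)
    return ((y - phi * z) ** 2).sum()

def split_estimate(lo, hi, step=5, margin=30):
    """Best two-segment split point within x[lo:hi]."""
    taus = range(lo + margin, hi - margin, step)
    return min(taus, key=lambda tau: ar1_sse(x[lo:tau]) + ar1_sse(x[tau:hi]))

# ensemble members look at different (illustrative) windows of the stream
windows = [(200, 900), (300, 1000), (100, 800), (250, 950)]
estimates = [split_estimate(lo, hi) for lo, hi in windows]
print("member estimates:", estimates)           # each should land near t=500
print("fused fault-time estimate:", int(np.median(estimates)))
```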

    Just-in-Time Adaptive Algorithm for Optimal Parameter Setting in 802.15.4 WSNs

    Recent studies have shown that the IEEE 802.15.4 MAC protocol suffers from severe limitations, in terms of reliability and energy efficiency, when the CSMA/CA parameter setting is not appropriate. However, selecting the optimal setting that guarantees the application's reliability requirements with minimum energy consumption is not a trivial task in wireless sensor networks, especially when the operating conditions change over time. In this paper we propose a Just-in-Time LEarning-based Adaptive Parameter tuning (JIT-LEAP) algorithm that adapts the CSMA/CA parameter setting to the time-varying operating conditions, also exploiting the past history to find the most appropriate setting for the current conditions. Following the approach of active adaptive algorithms, the adaptation mechanism of JIT-LEAP is triggered by a change detection test only when needed (i.e., in response to a change in the operating conditions). Simulation results show that the proposed algorithm outperforms other similar algorithms in both stationary and dynamic scenarios.
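    As a schematic illustration of such a trigger-then-adapt loop (not the JIT-LEAP implementation), the sketch below monitors reliability samples with a simple one-sided CUSUM detector and re-tunes the CSMA/CA setting only when the detector fires, warm-starting from the setting recorded for the most similar past operating condition; `measure_reliability` and `optimize_setting` are hypothetical stand-ins, and all parameter values are illustrative.

```python
# Schematic trigger-then-adapt loop (not the JIT-LEAP implementation).
import random

random.seed(0)

# IEEE 802.15.4 CSMA/CA knobs: (macMinBE, macMaxCSMABackoffs, macMaxFrameRetries)
DEFAULT = (3, 4, 3)

def measure_reliability(setting, load):
    """Hypothetical stand-in for a delivery-ratio measurement from the network."""
    base = 0.95 - 0.4 * load + 0.02 * sum(setting)
    return max(0.0, min(1.0, base + random.gauss(0, 0.01)))

def optimize_setting(load_estimate, warm_start):
    """Hypothetical tuner; a real search would be seeded with warm_start."""
    return (5, 5, 5) if load_estimate > 0.5 else warm_start

history = []                     # past (condition estimate, setting) pairs
setting = DEFAULT
target, cum = 0.90, 0.0          # required reliability, CUSUM statistic
drift, threshold = 0.01, 0.15    # illustrative change-detection parameters

for step in range(200):
    load = 0.2 if step < 100 else 0.8        # operating conditions change at t=100
    r = measure_reliability(setting, load)
    cum = max(0.0, cum + (target - r) - drift)   # one-sided CUSUM on the shortfall
    if cum > threshold:                          # change detected: adapt just in time
        est = load   # in practice, an estimate of the current operating conditions
        warm = min(history, key=lambda h: abs(h[0] - est))[1] if history else setting
        setting = optimize_setting(est, warm)
        history.append((est, setting))
        cum = 0.0
        print(f"t={step}: change detected, re-tuned setting to {setting}")
```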

    Database challenges for exploratory computing

    Helping users to make sense of very big datasets is nowadays considered an important research topic. However, the tools that are available for data analysis purposes typically address professional data scientists, who, besides a deep knowledge of the domain of interest, master one or more of the following disciplines: mathematics, statistics, computer science, computer engineering, and programming. On the contrary, in our vision it is vital to also support different kinds of users who, for various reasons, may want to analyze the data and obtain new insight from them. Examples of these data enthusiasts [4, 9] are journalists, investors, or politicians: non-technical users who can draw great advantage from exploring the data, achieving new and essential knowledge, instead of reading query results with tons of records. The term data exploration generally refers to a data user being able to find her way through large amounts of data in order to gather the necessary information. A more technical definition comes from the field of statistics, introduced by Tukey [12]: with exploratory data analysis the researcher explores the data in many possible ways, including the use of graphical tools like boxplots or histograms, gaining knowledge from the way data are displayed. Despite the emphasis on visualization, exploratory data analysis still assumes that the user understands at least the basics of statistics, while in this paper we propose a paradigm for database exploration that is in turn inspired by the exploratory computing vision [2]. We may describe exploratory computing as the step-by-step “conversation” of a user and a system that “help each other” to refine the data exploration process, ultimately gathering new knowledge that concretely fulfils the user's needs. The process is seen as a conversation since the system provides active support: it not only answers the user's requests, but also suggests one or more possible actions that may help the user to focus the exploratory session. This activity may entail a wide range of different techniques, including statistics and data analysis, query suggestion, advanced visualization tools, etc. The closest analogy [2] is that of a human-to-human dialogue, in which two people talk and continuously make reference to their lives, priorities, knowledge and beliefs, leveraging them in order to provide the best possible contribution to the dialogue. In essence, through the conversation they are exploring themselves as well as the information that is conveyed through their words. This exploration process therefore means investigation, exploration-seeking, comparison-making, and learning altogether. It is most appropriate for big collections of semantically rich data, which typically hide precious knowledge behind their complexity. In this broad and innovative context, this paper intends to make a significant step further: it proposes a model to concretely perform this kind of exploration over a database. The model is general enough to encompass most data models and query languages that have been proposed for data management in the last few years. At the same time, it is precise enough to provide a first formalization of the problem and to reason about the research challenges posed to database researchers by this new paradigm of interaction.

    A distributed Self-adaptive Nonparametric Change-Detection Test for Sensor/Actuator Networks

    The prompt detection of faults and, more generally, of changes in stationarity in networked systems such as sensor/actuator networks is a key issue to guarantee robustness and adaptability in applications working in real-life environments. Traditional change-detection methods aiming at assessing the stationarity of a data-generating process would require the centralized availability of all observations, a solution clearly unacceptable when large-scale networks are considered and data have local interest. Instead, distributed solutions based on decentralized change-detection tests, exploiting information at the unit and cluster levels, are a viable alternative. This work suggests a novel distributed change-detection test that operates at two levels: the first, running on the unit, is particularly reactive in detecting small changes in the process generating the data, whereas the second exploits distributed information at the cluster level to reduce false positives. The results can be immediately leveraged by the machine learning community, where adaptive solutions are envisaged to address changes in stationarity of the considered application. A large experimental campaign shows the effectiveness of the approach on both synthetic and real-data applications.
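    A minimal simulation sketch of such a two-level scheme (not the paper's test) is given below: each unit runs a sensitive local CUSUM on its own stream, and a unit-level alarm is confirmed only when a sufficient fraction of units in the cluster shows concurrent evidence, filtering out isolated false positives. The detector thresholds and the 30% quorum are illustrative assumptions.

```python
# Two-level change detection sketch: reactive per-unit CUSUMs (first level)
# validated by a cluster-wide quorum (second level) to reduce false positives.
import numpy as np

rng = np.random.default_rng(2)
n_units, T, t_change = 10, 400, 250
# each unit observes N(0, 1), drifting to N(0.4, 1) after the change
data = rng.standard_normal((n_units, T))
data[:, t_change:] += 0.4

def cusum(stream, drift=0.1):
    """One-sided CUSUM statistic over time for an upward mean shift."""
    s, out = 0.0, []
    for v in stream:
        s = max(0.0, s + v - drift)
        out.append(s)
    return np.array(out)

stats = np.vstack([cusum(data[i]) for i in range(n_units)])
local_thr, cluster_quorum = 4.0, 0.3   # reactive unit level, stricter cluster level

for t in range(T):
    alarms = stats[:, t] > local_thr          # first level: per-unit detection
    if alarms.any():
        # second level: cluster-wide validation of the unit-level alarm
        if alarms.mean() >= cluster_quorum:
            print(f"cluster confirms change at t={t} "
                  f"({alarms.sum()}/{n_units} units agree)")
            break
        # otherwise the isolated alarm is treated as a false positive
```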
