Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss
We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach underlies several change-detection methods, its effectiveness when the data dimension scales has never been investigated, and this is the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens as the data dimension increases. This problem, which we refer to as detectability loss, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss for Gaussian-distributed datastreams, and empirically demonstrate that the problem also holds on real-world datasets and can be harmful even at low data dimensions (say, 10).
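The linear growth of the log-likelihood variance with the data dimension can be checked numerically. The sketch below (an illustration, not the paper's derivation) estimates Var(log p(x)) for a standard Gaussian N(0, I_d), where it grows as d/2:

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik_var(d, n=100_000):
    """Monte Carlo estimate of Var(log p(x)) for x ~ N(0, I_d)."""
    # Log-likelihood of N(0, I_d): -d/2 * log(2*pi) - ||x||^2 / 2
    x = rng.standard_normal((n, d))
    ll = -0.5 * d * np.log(2 * np.pi) - 0.5 * np.sum(x**2, axis=1)
    return ll.var()

for d in (1, 10, 100):
    print(d, loglik_var(d))  # grows roughly as d/2
```

Since a fixed-magnitude change must be spotted against fluctuations whose variance scales with d, the change becomes progressively harder to detect as the dimension increases.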
An ensemble approach to estimate the fault-time instant
Since systems are prone to faults, fault detection and isolation are essential activities in safety-critical applications. In this direction, a sound estimate of the time instant at which a fault occurred is precious information that a diagnosis system can fruitfully exploit, e.g., to identify information consistent with the faulty state. Unfortunately, any fault-detection system introduces a structural delay that typically increases for subtle faults (e.g., those characterized by a small magnitude), with the consequence that the fault-occurrence time is overestimated.
In this paper we propose an ensemble approach to estimate the time instant at which a fault occurred. We focus on systems that can be described as ARMA models and on faults inducing an abrupt change in the model coefficients.
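As a toy illustration of this problem setting (not the ensemble method proposed in the paper), the sketch below simulates an AR(1) stream whose coefficient changes abruptly and estimates the fault-time instant with a single split-point least-squares baseline; all names and parameter values are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1(n, a_pre, a_post, t0):
    """AR(1) stream whose coefficient jumps from a_pre to a_post at t0."""
    x = np.zeros(n)
    for t in range(1, n):
        a = a_pre if t < t0 else a_post
        x[t] = a * x[t - 1] + rng.standard_normal()
    return x

def estimate_fault_time(x, margin=20):
    """Scan candidate split points and pick the one minimising the total
    squared residuals of two independently fitted AR(1) models."""
    def sse(seg):
        y, z = seg[1:], seg[:-1]
        a = (z @ y) / (z @ z)          # least-squares AR(1) coefficient
        return np.sum((y - a * z) ** 2)
    n = len(x)
    best_t, best_cost = None, np.inf
    for t in range(margin, n - margin):
        cost = sse(x[:t]) + sse(x[t:])
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

x = simulate_ar1(1000, a_pre=0.3, a_post=0.9, t0=600)
print(estimate_fault_time(x))  # close to the true fault time t0 = 600
```

A single estimator of this kind is noisy for small coefficient changes; combining several such estimators, as an ensemble approach does, is what makes the estimate of the fault-time instant more robust.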
Just-in-Time Adaptive Algorithm for Optimal Parameter Setting in 802.15.4 WSNs
Recent studies have shown that the IEEE 802.15.4 MAC protocol suffers from severe limitations, in terms of reliability and energy efficiency, when the CSMA/CA parameter setting is not appropriate. However, selecting the optimal setting that guarantees the application reliability requirements, with minimum energy consumption, is not a trivial task in wireless sensor networks, especially when the operating conditions change over time. In this paper we propose a Just-in-Time LEarning-based Adaptive Parameter tuning (JIT-LEAP) algorithm that adapts the CSMA/CA parameter setting to the time-varying operating conditions, also exploiting the past history to find the most appropriate setting for the current conditions. Following the approach of active adaptive algorithms, the adaptation mechanism of JIT-LEAP is triggered by a change detection test only when needed (i.e., in response to a change in the operating conditions). Simulation results show that the proposed algorithm outperforms other similar algorithms, both in stationary and dynamic scenarios.
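The trigger-adaptation-only-when-needed idea can be sketched with a simple two-sided CUSUM monitor on a reliability measurement. This is a generic illustration, not the actual JIT-LEAP algorithm: the monitored statistic, thresholds, and stream values are all assumptions:

```python
import numpy as np

class CusumTrigger:
    """Two-sided CUSUM on a stream of reliability measurements: adaptation
    is triggered only when the monitored statistic drifts past a threshold."""
    def __init__(self, target, drift=0.05, threshold=1.0):
        self.target, self.drift, self.threshold = target, drift, threshold
        self.pos = self.neg = 0.0

    def update(self, sample):
        e = sample - self.target
        self.pos = max(0.0, self.pos + e - self.drift)
        self.neg = max(0.0, self.neg - e - self.drift)
        if self.pos > self.threshold or self.neg > self.threshold:
            self.pos = self.neg = 0.0   # reset after an alarm
            return True                  # change detected -> re-tune parameters
        return False

rng = np.random.default_rng(2)
trigger = CusumTrigger(target=0.9)
stream = np.concatenate([rng.normal(0.9, 0.02, 200),   # stationary conditions
                         rng.normal(0.7, 0.02, 200)])  # operating change at t = 200
alarms = [t for t, s in enumerate(stream) if trigger.update(s)]
print(alarms[0])  # first alarm shortly after t = 200
```

The benefit of this active scheme is that the (potentially expensive) parameter re-tuning runs only on alarms, rather than continuously, which is what saves energy in stationary periods.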
Database challenges for exploratory computing
Helping users to make sense of very big datasets is nowadays considered an important research topic. However, the tools that are available for data analysis typically address professional data scientists who, besides a deep knowledge of the domain of interest, master one or more of the following disciplines: mathematics, statistics, computer science, computer engineering, and programming. In our vision, by contrast, it is vital to also support different kinds of users who, for various reasons, may want to analyze the data and obtain new insight from them. Examples of these data enthusiasts [4, 9] are journalists, investors, or politicians: non-technical users who can draw great advantage from exploring the data and achieving new and essential knowledge, instead of reading query results with tons of records.
The term data exploration generally refers to a data user being able to find her way through large amounts of data in order to gather the necessary information. A more technical definition comes from the field of statistics, introduced by Tukey [12]: with exploratory data analysis the researcher explores the data in many possible ways, including the use of graphical tools like boxplots or histograms, gaining knowledge from the way data are displayed.
Despite the emphasis on visualization, exploratory data analysis still assumes that the user understands at least the basics of statistics, while in this paper we propose a paradigm for database exploration that is in turn inspired by the exploratory computing vision [2]. We may describe exploratory computing as the step-by-step “conversation” of a user and a system that “help each other” to refine the data exploration process, ultimately gathering new knowledge that concretely fulfils the user’s needs. The process is seen as a conversation since the system provides active support: it not only answers the user’s requests, but also suggests one or more possible actions that may help the user to focus the exploratory session. This activity may entail a wide range of different techniques, including statistics and data analysis, query suggestion, advanced visualization tools, etc.
The closest analogy [2] is that of a human-to-human dialogue, in which two people talk, and continuously make reference to their lives, priorities, knowledge, and beliefs, leveraging them in order to provide the best possible contribution to the dialogue. In essence, through the conversation they are exploring themselves as well as the information that is conveyed through their words. This exploration process therefore combines investigation, exploration-seeking, comparison-making, and learning. It is most appropriate for big collections of semantically rich data, which typically hide precious knowledge behind their complexity.
In this broad and innovative context, this paper intends to make a significant step further: it proposes a model to concretely perform this kind of exploration over a database. The model is general enough to encompass most data models and query languages that have been proposed for data management in the last few years. At the same time, it is precise enough to provide a first formalization of the problem and to reason about the research challenges posed to database researchers by this new paradigm of interaction.
A distributed Self-adaptive Nonparametric Change-Detection Test for Sensor/Actuator Networks
The prompt detection of faults and, more generally, of changes in stationarity in networked systems such as sensor/actuator networks is a key issue to guarantee robustness and adaptability in applications operating in real-life environments. Traditional change-detection methods aiming at assessing the stationarity of a data-generating process would require the centralized availability of all observations, a solution that is clearly unacceptable when large-scale networks are considered and data have local interest. Distributed solutions based on decentralized change-detection tests, which exploit information at the unit and cluster levels, are a viable alternative. This work suggests a novel distributed change-detection test that operates at two levels: the first, running on the unit, is particularly reactive in detecting small changes in the process generating the data, whereas the second exploits distributed information at the cluster level to reduce false positives. The results can be immediately integrated in the machine learning community, where adaptive solutions are envisaged to address changes in stationarity of the considered application. A large experimental campaign shows the effectiveness of the approach on both synthetic and real data applications.
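The two-level scheme can be illustrated with a minimal sketch: each unit runs a reactive local test, and the cluster confirms a detection only when a quorum of units agree, filtering out isolated false positives. The statistics, thresholds, and quorum rule below are simplifying assumptions, not the test proposed in this work:

```python
import numpy as np

rng = np.random.default_rng(3)

def unit_alarm(window, ref_mean, ref_std, k=2.0):
    """First level (runs on each unit): flag a change when the window mean
    deviates from the reference by more than k standard errors."""
    se = ref_std / np.sqrt(len(window))
    return abs(window.mean() - ref_mean) > k * se

def cluster_decision(alarms, quorum=0.5):
    """Second level (cluster): confirm only if a quorum of units agree,
    which suppresses isolated unit-level false positives."""
    return np.mean(alarms) >= quorum

# 10 units; after the change, every unit's mean shifts from 0.0 to 0.5
n_units, w = 10, 50
pre  = rng.normal(0.0, 1.0, (n_units, w))
post = rng.normal(0.5, 1.0, (n_units, w))

pre_alarms  = [unit_alarm(u, 0.0, 1.0) for u in pre]
post_alarms = [unit_alarm(u, 0.0, 1.0) for u in post]
print(cluster_decision(pre_alarms), cluster_decision(post_alarms))
```

With a sensitive unit-level threshold (k = 2), individual units raise occasional false alarms on stationary data, but the quorum at the cluster level makes a spurious confirmed detection very unlikely, which is exactly the trade-off the two-level design targets.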