Long-term adaptation and distributed detection of local network changes
We present a statistical approach to distributed detection of local latency shifts in networked systems. For this purpose, response delay measurements are performed between neighbouring nodes via probing. The expected probe response delay on each connection is statistically modelled via parameter estimation. Adaptation to drifting delays is accounted for by the use of overlapping models, such that previous models are partially used as input to future models. Based on the symmetric Kullback-Leibler divergence metric, latency shifts can be detected by comparing the estimated parameters of the current and previous models. In order to reduce the number of detection alarms, thresholds for divergence and convergence are used.
The method that we propose can be applied to many types of statistical distributions, and requires only constant memory, in contrast to e.g. sliding-window techniques and decay functions. The method is therefore applicable in various kinds of network equipment with limited capacity, such as sensor networks and mobile ad hoc networks. We have investigated the behaviour of the method for different model parameters. Further, we have tested the detection performance in network simulations, for both gradual and abrupt shifts in the probe response delay. The results indicate that over 90% of the shifts can be detected. Undetected shifts are mainly the effects of long convergence processes triggered by previous shifts. The overall performance depends on the characteristics of the shifts and the configuration of the model parameters.
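For two univariate Gaussian delay models, the comparison described above can be sketched as follows. The threshold values and the `detect` helper are illustrative assumptions, not the paper's implementation:

```python
import math

def kl_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """KL divergence KL(p || q) between two univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
            - 0.5)

def symmetric_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Symmetric KL divergence: KL(p||q) + KL(q||p)."""
    return (kl_gauss(mu_p, sigma_p, mu_q, sigma_q)
            + kl_gauss(mu_q, sigma_q, mu_p, sigma_p))

# Hypothetical thresholds: raise an alarm when the divergence exceeds T_DIV,
# clear it once it falls back below T_CONV (T_CONV < T_DIV reduces alarm churn).
T_DIV, T_CONV = 0.5, 0.1

def detect(prev_model, curr_model, alarm_active):
    """Compare (mu, sigma) of the previous and current models."""
    d = symmetric_kl(*prev_model, *curr_model)
    if not alarm_active and d > T_DIV:
        return True   # latency shift detected
    if alarm_active and d < T_CONV:
        return False  # models have converged again
    return alarm_active
```

Using separate divergence and convergence thresholds gives the alarm hysteresis the abstract mentions: a borderline divergence value cannot toggle the alarm on and off repeatedly.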
On the Window Size for Classification in Changing Environments
Classification in changing environments (commonly known as concept drift) requires adaptation of the classifier to accommodate the changes. One approach is to keep a moving window on the streaming data and constantly update the classifier on it. Here we consider an abrupt change scenario where one set of probability distributions of the classes is instantly replaced with another. For a fixed ‘transition period’ around the change, we derive a generic relationship between the size of the moving window and the classification error rate. We derive expressions for the error in the transition period and for the optimal window size for the case of two Gaussian classes where the concept change is a geometrical displacement of the whole class configuration in the space. A simple window-resize strategy based on the derived relationship is proposed and compared with fixed-size windows on a real benchmark data set (Electricity Market).
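The window-size trade-off analysed above can be illustrated with a toy simulation. The stream parameters, the nearest-class-mean classifier, and the displacement size are assumptions for illustration, not the paper's derivation:

```python
import random

def stream(n_before, n_after, rng):
    """Two unit-variance Gaussian classes; at the change point the whole
    class configuration is displaced (a geometrical shift)."""
    for t in range(n_before + n_after):
        y = rng.randint(0, 1)
        shift = 0.0 if t < n_before else 4.0   # abrupt concept change
        yield rng.gauss((2 * y - 1) + shift, 1.0), y

def error_rate(window, n_before=500, n_after=500, seed=0):
    """Prequential error of a windowed nearest-class-mean classifier."""
    rng = random.Random(seed)
    data, errors, total = [], 0, 0
    for x, y in stream(n_before, n_after, rng):
        by_class = {0: [], 1: []}
        for xi, yi in data:
            by_class[yi].append(xi)
        if by_class[0] and by_class[1]:
            means = {c: sum(v) / len(v) for c, v in by_class.items()}
            pred = min(means, key=lambda c: abs(x - means[c]))
            errors += int(pred != y)
            total += 1
        data.append((x, y))
        data = data[-window:]   # the moving window: only recent data is kept
    return errors / total
```

A small window adapts within a few dozen samples after the change, while a large window keeps stale pre-change data and misclassifies long into the new concept, which is exactly the tension the derived relationship resolves.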
Discovering ship navigation patterns towards environmental impact modeling
In this work, a data pipeline for managing and extracting patterns from time-series is described. The patterns, found with a combination of a Conditional Restricted Boltzmann Machine (CRBM) and the k-Means algorithm, are then validated using a visualization tool. The motivation for finding these patterns is to support future emission modelling.
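The clustering stage of such a pipeline can be sketched as k-Means over fixed-width time-series windows; the CRBM feature-extraction step is omitted here, and all function names are illustrative:

```python
import random

def windows(series, width, step):
    """Slice a univariate time-series into fixed-width windows."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

def kmeans(points, k, iters=50, seed=0):
    """Plain k-Means on equal-length vectors (squared Euclidean distance)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Recompute each center; an empty cluster keeps its old center.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers
```

In a real pipeline the window vectors would first be passed through the CRBM, and k-Means would cluster the learned representations rather than the raw windows.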
Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss
We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Despite the fact that this approach constitutes the frame of several change-detection methods, its effectiveness when the data dimension scales has never been investigated, which is indeed the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This problem, which we refer to as \emph{detectability loss}, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that the problem also holds on real-world datasets and that it can be harmful even at low data dimensions (say, 10).
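The linear relationship can be checked empirically: for a d-dimensional standard Gaussian, the log-likelihood is a constant minus half a chi-squared variable with d degrees of freedom, so its variance is d/2 and doubles when the dimension doubles. A minimal Monte Carlo sketch (plain Python, not from the paper):

```python
import math
import random

def loglik_var(d, n=20000, seed=0):
    """Empirical variance of the log-likelihood of n samples drawn from a
    d-dimensional standard Gaussian. Analytically this equals d / 2."""
    rng = random.Random(seed)
    vals = []
    for _ in range(n):
        sq = sum(rng.gauss(0, 1) ** 2 for _ in range(d))   # ||x||^2 ~ chi2_d
        vals.append(-0.5 * d * math.log(2 * math.pi) - 0.5 * sq)
    m = sum(vals) / n
    return sum((v - m) ** 2 for v in vals) / n
```

Because the log-likelihood spreads out linearly with d, a change of fixed symmetric-KL magnitude produces an ever smaller relative perturbation of the monitored statistic, which is the detectability loss described above.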
Improving adaptation and interpretability of a short-term traffic forecasting system
Traffic management is more important than ever, especially in overcrowded big cities facing pollution problems and unprecedented changes in mobility. In this scenario, road-traffic prediction plays a key role within Intelligent Transportation Systems, allowing traffic managers to anticipate and take the proper decisions. This paper analyses the situation of a commercial real-time prediction system, with its current problems and limitations. The analysis unveils the trade-off between simple parsimonious models and more complex models. Finally, we propose an enriched machine learning framework, Adarules, for real-time traffic prediction that treats the problem as continuously incoming data streams, with all the problems that commonly occur in such a volatile scenario: changes in the network infrastructure and demand, and new or failing detection stations, among others. The framework is also able to automatically infer the features most relevant to the end task, including the relationships within the road network. Although the intention is for the proposed framework to evolve and grow with new incoming big data, there is no obstacle to starting to use it without any prior knowledge, as it can learn its structure and parameters automatically from data. We test this predictive system in different real-world scenarios and evaluate its performance, integrating a multi-task learning paradigm for the traffic prediction task.
A modified Learn++.NSE algorithm for dealing with concept drift
© 2014 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. Concept drift is a pervasive phenomenon in real-world applications. The variety of change types makes it difficult for a learning algorithm to track concept drift closely. Learn++.NSE is an incremental ensemble learner that makes no assumption about the type of concept drift. Although it handles concept drift well, it incurs a high computational cost and needs more time to recover from drops in accuracy. This paper proposes a modified Learn++.NSE algorithm. While learning instances from the data stream, our algorithm first identifies where and when drift happened, then uses the instances accumulated by the drift detection method to create a new base classifier, and finally combines all existing classifiers using the Learn++.NSE weighting mechanism to update the ensemble. The modified algorithm reduces the computational cost without any performance drop and improves the speed of accuracy recovery when drift happens.
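The detection-triggered step of such a scheme can be sketched with a toy error-rate detector; the class, its thresholds, and the batch-accumulation logic are illustrative stand-ins, and the Learn++.NSE weighting itself is not reproduced:

```python
class DriftDetector:
    """Toy detector: flags drift when the recent error rate rises sharply,
    and hands back the instances accumulated since the last drift so that
    a new base classifier can be trained on them."""

    def __init__(self, window=30, threshold=0.2):
        self.window, self.threshold = window, threshold
        self.errors = []   # 0/1 outcomes of the last `window` predictions
        self.buffer = []   # instances accumulated since the last drift

    def update(self, x, y, correct):
        """Record one prediction outcome; return the accumulated batch when
        drift is flagged, otherwise None."""
        self.buffer.append((x, y))
        self.errors = (self.errors + [0 if correct else 1])[-self.window:]
        if (len(self.errors) == self.window
                and sum(self.errors) / self.window > self.threshold):
            batch, self.buffer, self.errors = self.buffer, [], []
            return batch
        return None
```

When `update` returns a batch, the modified algorithm would train a new base classifier on it and then re-weight all ensemble members according to the Learn++.NSE mechanism.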
Handling Concept Drift for Predictions in Business Process Mining
Predictive services nowadays play an important role across all business sectors. However, deployed machine learning models are challenged by data streams that change over time, a phenomenon described as concept drift. The prediction quality of models can be strongly influenced by this phenomenon, so concept drift is usually handled by retraining the model. However, current research lacks a recommendation of which data should be selected for retraining the machine learning model. We therefore systematically analyze different data selection strategies in this work. Subsequently, we instantiate our findings on a use case in process mining which is strongly affected by concept drift. We show that concept drift handling improves accuracy from 0.5400 to 0.7010. Furthermore, we depict the effects of the different data selection strategies.
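The kinds of strategies compared in such an analysis can be sketched as follows; the strategy names and the `window` default are illustrative assumptions, not taken from the paper:

```python
def select_training_data(history, t_drift, strategy, window=200):
    """Pick the retraining set from the full instance history, given the
    index `t_drift` where a drift was detected."""
    if strategy == "all":
        return history            # full history: stable, but drift-polluted
    if strategy == "post_drift":
        return history[t_drift:]  # new concept only: clean, but possibly small
    if strategy == "window":
        return history[-window:]  # recent window: a compromise between the two
    raise ValueError(f"unknown strategy: {strategy}")
```

The trade-off mirrors the abstract's finding: using everything dilutes the new concept, while using only post-drift data may leave too few instances to retrain reliably.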