    Long-term adaptation and distributed detection of local network changes

    We present a statistical approach to distributed detection of local latency shifts in networked systems. For this purpose, response delay measurements are performed between neighbouring nodes via probing. The expected probe response delay on each connection is statistically modelled via parameter estimation. Adaptation to drifting delays is handled by overlapping models, such that previous models are partially used as input to future models. Latency shifts are detected by comparing the estimated parameters of the current and previous models using the symmetric Kullback-Leibler divergence. To reduce the number of detection alarms, separate thresholds for divergence and convergence are used. The proposed method can be applied to many types of statistical distributions and requires only constant memory, unlike, for example, sliding-window techniques and decay functions. The method is therefore applicable to various kinds of network equipment with limited capacity, such as sensor networks and mobile ad hoc networks. We have investigated the behaviour of the method for different model parameters. Further, we have tested the detection performance in network simulations, for both gradual and abrupt shifts in the probe response delay. The results indicate that over 90% of the shifts can be detected. Undetected shifts are mainly the result of long convergence processes triggered by previous shifts. The overall performance depends on the characteristics of the shifts and the configuration of the model parameters.
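
The detection step described in this abstract can be illustrated with a minimal sketch. It assumes univariate Gaussian delay models (the abstract only says that many distribution types are supported), and the names `symmetric_kl_gaussian` and `ShiftDetector`, as well as the threshold values, are hypothetical. The overlapping-model adaptation is not reproduced here; only the divergence comparison and the two-threshold alarm logic are shown.

```python
def symmetric_kl_gaussian(mu1, var1, mu2, var2):
    """Symmetric KL divergence KL(p||q) + KL(q||p) of two univariate Gaussians.

    The log-variance terms of the two directed divergences cancel,
    leaving only the ratio terms below.
    """
    d2 = (mu1 - mu2) ** 2
    return 0.5 * ((var1 + d2) / var2 + (var2 + d2) / var1) - 1.0


class ShiftDetector:
    """Two-threshold (hysteresis) alarm logic; an assumed reading of the
    'thresholds for divergence and convergence' in the abstract."""

    def __init__(self, diverge_at, converge_at):
        if converge_at >= diverge_at:
            raise ValueError("converge_at must be below diverge_at")
        self.diverge_at = diverge_at
        self.converge_at = converge_at
        self.in_shift = False

    def update(self, divergence):
        """Return True only when a new shift alarm is raised."""
        if not self.in_shift and divergence > self.diverge_at:
            self.in_shift = True
            return True
        if self.in_shift and divergence < self.converge_at:
            # Models have converged again; re-arm the detector.
            self.in_shift = False
        return False


# Hypothetical divergence sequence: one shift, convergence, a second shift.
detector = ShiftDetector(diverge_at=1.0, converge_at=0.2)
alarms = [detector.update(d) for d in [0.1, 2.0, 1.5, 0.05, 2.0]]
```

While the divergence stays above the convergence threshold, no further alarms are raised. This is also consistent with the abstract's remark that shifts occurring during a long convergence process can go undetected.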

    On the Window Size for Classification in Changing Environments

    Classification in changing environments (commonly known as concept drift) requires adaptation of the classifier to accommodate the changes. One approach is to keep a moving window on the streaming data and constantly update the classifier on it. Here we consider an abrupt change scenario where one set of probability distributions of the classes is instantly replaced with another. For a fixed ‘transition period’ around the change, we derive a generic relationship between the size of the moving window and the classification error rate. We derive expressions for the error in the transition period and for the optimal window size for the case of two Gaussian classes where the concept change is a geometrical displacement of the whole class configuration in the space. A simple window resize strategy based on the derived relationship is proposed and compared with fixed-size windows on a real benchmark data set (Electricity Market).
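
The effect of window size after an abrupt change can be illustrated with a toy sketch. It is not the paper's derivation or its resize strategy: it assumes one-dimensional data, a nearest-class-mean classifier, and a hypothetical class name `WindowedNearestMean`. The concept change is a displacement of both classes by the same offset, as in the scenario described above.

```python
from collections import deque


class WindowedNearestMean:
    """Nearest-class-mean classifier trained on the last `window_size` samples."""

    def __init__(self, window_size):
        # deque with maxlen silently discards the oldest sample when full.
        self.window = deque(maxlen=window_size)

    def update(self, x, label):
        self.window.append((x, label))

    def predict(self, x):
        sums, counts = {}, {}
        for xi, yi in self.window:
            sums[yi] = sums.get(yi, 0.0) + xi
            counts[yi] = counts.get(yi, 0) + 1
        if not counts:
            return None
        return min(counts, key=lambda c: abs(x - sums[c] / counts[c]))


old_concept = [(0.0, "a"), (1.0, "b"), (0.0, "a"), (1.0, "b")]
new_concept = [(x + 10.0, y) for x, y in old_concept]  # whole configuration displaced

small, large = WindowedNearestMean(4), WindowedNearestMean(8)
for x, y in old_concept + new_concept:
    small.update(x, y)
    large.update(x, y)

small_pred = small.predict(10.2)  # window holds only post-change data
large_pred = large.predict(10.2)  # window still mixes both concepts
```

In this toy run the small window has forgotten the old concept and classifies the point near the new class "a" mean correctly, while the large window averages pre- and post-change samples and misclassifies it. On stationary data the larger window would give more stable mean estimates, which is the trade-off the abstract analyses.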

    Discovering ship navigation patterns towards environmental impact modeling

    This work describes a data pipeline to manage and extract patterns from time series. The patterns, found with a combination of Conditional Restricted Boltzmann Machine (CRBM) and k-Means algorithms, are then validated using a visualization tool. The motivation for finding these patterns is to leverage future emission model

    Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss

    We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods have to face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach underlies several change-detection methods, its effectiveness when the data dimension scales has never been investigated, which is the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This problem, which we refer to as detectability loss, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that this problem also holds on real-world datasets and that it can be harmful even at low data dimensions (say, 10).
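
The linear relationship between log-likelihood variance and dimension can be checked numerically in the simplest case. The sketch below assumes a standard Gaussian in d dimensions, for which the log-likelihood is a constant minus half a chi-square variable with d degrees of freedom, so its variance is d/2. The function names, sample count, and seed are illustrative choices, not taken from the paper.

```python
import math
import random
import statistics


def gaussian_loglik(x):
    """Log-density of a standard d-dimensional Gaussian at point x."""
    d = len(x)
    return -0.5 * d * math.log(2 * math.pi) - 0.5 * sum(v * v for v in x)


def loglik_variance(d, n_samples=20000, seed=0):
    """Monte Carlo estimate of Var[log p(X)] for X ~ N(0, I_d)."""
    rng = random.Random(seed)
    values = [
        gaussian_loglik([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(n_samples)
    ]
    return statistics.variance(values)


# The estimates should be close to d / 2, i.e. grow linearly with d.
var_low_dim = loglik_variance(2)    # theoretical value 1.0
var_high_dim = loglik_variance(20)  # theoretical value 10.0
```

Because the spread of the log-likelihood grows with d while a change of fixed symmetric Kullback-Leibler magnitude shifts it by a fixed amount, the same change becomes harder to distinguish from noise in higher dimensions.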

    Improving adaptation and interpretability of a short-term traffic forecasting system

    Traffic management is more important than ever, especially in overcrowded big cities with pollution problems and unprecedented mobility changes. In this scenario, road-traffic prediction plays a key role within Intelligent Transportation Systems, allowing traffic managers to anticipate events and take the proper decisions. This paper analyses a commercial real-time prediction system, with its current problems and limitations. The analysis unveils the trade-off between simple parsimonious models and more complex models. Finally, we propose an enriched machine learning framework, Adarules, for real-time traffic prediction. It treats the problem as one of continuously incoming data streams, with all the problems that commonly occur in such a volatile scenario, namely changes in the network infrastructure and demand, and new or failed detection stations, among others. The framework is also able to infer automatically the features most relevant to the end task, including the relationships within the road network. Although the proposed framework is intended to evolve and grow with new incoming big data, it can be used without any prior knowledge, as it can start learning the structure and parameters automatically from data. We test this predictive system in different real-world scenarios, and evaluate its performance when integrating a multi-task learning paradigm for the traffic prediction task.

    A modified Learn++.NSE algorithm for dealing with concept drift

    © 2014 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. Concept drift is a pervasive phenomenon in real-world applications. Because concept drift comes in a variety of change types, it is difficult for a learning algorithm to track it closely. Learn++.NSE is an incremental ensemble learner that makes no assumption about the change type of the concept drift. Although it performs well at handling concept drift, it is computationally expensive and needs more time to recover from an accuracy drop. This paper proposes a modified Learn++.NSE algorithm. While learning the instances in a data stream, our algorithm first identifies where and when drift happened, then uses the instances accumulated by the drift detection method to create a new base classifier, and finally organizes all existing classifiers using the Learn++.NSE weighting mechanism to update the ensemble learner. The modified algorithm can reduce the high computation cost without any performance drop and speeds up accuracy recovery when drift happens.

    Handling Concept Drift for Predictions in Business Process Mining

    Predictive services nowadays play an important role across all business sectors. However, deployed machine learning models are challenged by data streams that change over time, a phenomenon described as concept drift. The prediction quality of models can be strongly influenced by this phenomenon. Therefore, concept drift is usually handled by retraining the model. However, current research lacks a recommendation on which data should be selected for retraining the machine learning model. We therefore systematically analyze different data selection strategies in this work. Subsequently, we instantiate our findings on a use case in process mining that is strongly affected by concept drift. We show that accuracy can be improved from 0.5400 to 0.7010 with concept drift handling. Furthermore, we depict the effects of the different data selection strategies.