Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss
We address the problem of detecting changes in multivariate datastreams, and
we investigate the intrinsic difficulty that change-detection methods have to
face when the data dimension scales. In particular, we consider a general
approach where changes are detected by comparing the distribution of the
log-likelihood of the datastream over different time windows. Despite the fact
that this approach constitutes the frame of several change-detection methods,
its effectiveness when data dimension scales has never been investigated, which
is indeed the goal of our paper. We show that the magnitude of the change can
be naturally measured by the symmetric Kullback-Leibler divergence between the
pre- and post-change distributions, and that the detectability of a change of a
given magnitude worsens when the data dimension increases. This problem, which
we refer to as detectability loss, is due to the linear relationship
between the variance of the log-likelihood and the data dimension. We
analytically derive the detectability loss on Gaussian-distributed datastreams,
and empirically demonstrate that this problem also holds on real-world datasets
and that it can be harmful even at low data dimensions (say, 10).
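The Gaussian case mentioned in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a standard normal pre-change distribution N(0, I_d) and a post-change distribution N(mu, I_d), for which the symmetric Kullback-Leibler divergence equals the squared norm of mu. The function names and the parameters `skl`, `n`, and `seed` are hypothetical. With the change magnitude held fixed, the variance of the log-likelihood should grow roughly as d/2, so the shift in mean log-likelihood shrinks when measured in standard deviations.

```python
import numpy as np


def gaussian_loglik(x):
    """Log-density of N(0, I_d) evaluated at each row of x."""
    d = x.shape[1]
    return -0.5 * d * np.log(2.0 * np.pi) - 0.5 * np.sum(x ** 2, axis=1)


def loglik_shift_in_std_units(d, skl=1.0, n=20000, seed=0):
    """Return (shift / std, variance) of the pre-change log-likelihood.

    Assumption: pre-change data are N(0, I_d) and post-change data are
    N(mu, I_d) with ||mu||^2 = skl, so the symmetric KL divergence is skl.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[0] = np.sqrt(skl)  # put the whole change on one coordinate
    pre = rng.standard_normal((n, d))
    post = rng.standard_normal((n, d)) + mu
    ll_pre = gaussian_loglik(pre)
    ll_post = gaussian_loglik(post)
    shift = ll_pre.mean() - ll_post.mean()  # close to skl / 2 for any d
    return shift / ll_pre.std(), ll_pre.var()


if __name__ == "__main__":
    for d in (2, 8, 32, 128):
        ratio, var = loglik_shift_in_std_units(d)
        print(f"d={d:4d}  var(loglik)={var:8.2f}  shift/std={ratio:.3f}")
```

Running the script prints the estimated variance next to the normalized shift for a few dimensions, which makes the trend described in the abstract visible without any real data.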
Adaptive Online Sequential ELM for Concept Drift Tackling
A machine learning method needs to adapt to changes in the environment over
time. Such changes are known as concept drift. In this paper, we propose a
concept drift tackling method as an enhancement of Online Sequential Extreme
Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM) by
adding adaptive capability for classification and regression problems. The
scheme is named adaptive OS-ELM (AOS-ELM). It is a single-classifier scheme
that works well to handle real drift, virtual drift, and hybrid drift. The
AOS-ELM also works well for sudden drift and recurrent context change types. The
scheme is a simple unified method implemented in simple lines of code. We
evaluated AOS-ELM on regression and classification problems using public
concept drift data sets (SEA and STAGGER) and other public data sets such as
MNIST, USPS, and IDS. Experiments show that our method gives a higher kappa value
compared to the multiclassifier ELM ensemble. Even though AOS-ELM in practice
does not need an increase in hidden nodes, we address some issues related to
increasing the hidden nodes, such as error condition and rank values. We
propose taking the rank of the pseudoinverse matrix as an indicator parameter
to detect an underfitting condition.
Comment: Hindawi Publishing. Computational Intelligence and Neuroscience
Volume 2016 (2016), Article ID 8091267, 17 pages Received 29 January 2016,
Accepted 17 May 2016. Special Issue on "Advances in Neural Networks and
Hybrid-Metaheuristics: Theory, Algorithms, and Novel Engineering
Applications". Academic Editor: Stefan Hauf
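For orientation, the baseline that AOS-ELM builds on can be sketched as follows. This is a minimal version of the standard OS-ELM recursive least-squares update, not the adaptive scheme from the paper, whose details the abstract does not give. The class name, the sigmoid activation, the `ridge` regularizer, and the `hidden_rank` helper are assumptions made for illustration; the helper is only a loose reading of the rank indicator mentioned above.

```python
import numpy as np


class OSELMSketch:
    """Plain OS-ELM with a random sigmoid hidden layer (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # fixed random weights
        self.b = rng.normal(size=n_hidden)              # fixed random biases
        self.beta = np.zeros((n_hidden, n_outputs))     # learned output weights
        self.P = None                                   # inverse Gram matrix

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X, T, ridge=1e-3):
        """Batch solve on the first chunk (ridge term is an assumption)."""
        H = self._hidden(X)
        self.P = np.linalg.inv(H.T @ H + ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T

    def partial_fit(self, X, T):
        """Recursive least-squares update for a new chunk of data."""
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta

    def hidden_rank(self, X):
        """Rank of the pseudoinverse of the hidden-layer output matrix."""
        return int(np.linalg.matrix_rank(np.linalg.pinv(self._hidden(X))))
```

After `fit_initial` on a first chunk, repeated calls to `partial_fit` should give the same output weights as a single regularized batch solve over all data seen so far, which is the property that makes the sequential scheme attractive for streams.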
Accumulating regional density dissimilarity for concept drift detection in data streams
© 2017 Elsevier Ltd. In a non-stationary environment, newly received data may have different knowledge patterns from the data used to train learning models. As time passes, a learning model's performance may become increasingly unreliable. This problem is known as concept drift and is a common issue in real-world domains. Concept drift detection has attracted increasing attention in recent years. However, very few existing methods pay attention to small regional drifts, and their accuracy may vary due to differing statistical significance tests. This paper presents a novel concept drift detection method, based on regional-density estimation, named nearest neighbor-based density variation identification (NN-DVI). It consists of three components. The first is a k-nearest neighbor-based space-partitioning schema (NNPS), which transforms unmeasurable discrete data instances into a set of shared subspaces for density estimation. The second is a distance function that accumulates the density discrepancies in these subspaces and quantifies the overall differences. The third component is a tailored statistical significance test by which the confidence interval of a concept drift can be accurately determined. The distance applied in NN-DVI is sensitive to regional drift and has been proven to follow a normal distribution. As a result, the NN-DVI's accuracy and false-alarm rate are statistically guaranteed. Additionally, several benchmarks have been used to evaluate the method, including both synthetic and real-world datasets. The overall results show that NN-DVI has better performance in terms of addressing problems related to concept drift detection.
Mining recurrent concepts in data streams using the discrete Fourier transform
In this research we address the problem of capturing recurring concepts in a data stream environment. Recurrence capture enables the re-use of previously learned classifiers without the need for re-learning while providing for better accuracy during the concept recurrence interval. We capture concepts by applying the Discrete Fourier Transform (DFT) to Decision Tree classifiers to obtain highly compressed versions of the trees at concept drift points in the stream and store such trees in a repository for future use. Our empirical results on real-world and synthetic data exhibiting varying degrees of recurrence show that the Fourier compressed trees are more robust to noise and are able to capture recurring concepts with higher precision than a meta-learning approach that chooses to re-use classifiers in their originally occurring form.
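The idea of representing a decision tree by a Fourier spectrum can be sketched for the simplest case, a tree over binary features. The code below computes the generic discrete Fourier (Walsh) spectrum of a Boolean function and keeps only low-order coefficients. It is an illustration under that assumption, not the compression scheme used in the paper, and all function names are hypothetical.

```python
import itertools

import numpy as np


def _parity_sign(j, x):
    """Basis function (-1)^(j . x) for binary tuples j and x."""
    return (-1) ** sum(a * b for a, b in zip(j, x))


def fourier_spectrum(f, n):
    """Fourier coefficients of f: {0,1}^n -> R, keyed by partition j."""
    points = list(itertools.product([0, 1], repeat=n))
    values = np.array([f(x) for x in points], dtype=float)
    spectrum = {}
    for j in points:
        signs = np.array([_parity_sign(j, x) for x in points], dtype=float)
        spectrum[j] = float(np.mean(values * signs))
    return spectrum


def truncate(spectrum, max_order):
    """Keep coefficients whose order (number of ones in j) is small."""
    return {j: w for j, w in spectrum.items() if sum(j) <= max_order}


def evaluate(spectrum, x):
    """Inverse transform: reconstruct f(x) from the stored coefficients."""
    return sum(w * _parity_sign(j, x) for j, w in spectrum.items())


def toy_tree(x):
    """Depth-2 tree: test x[1] first, then return x[0] or x[2]."""
    return x[0] if x[1] == 0 else x[2]
```

For a tree of depth k, coefficients of order above k vanish, so `truncate(fourier_spectrum(toy_tree, 3), 2)` reproduces `toy_tree` exactly while storing fewer terms. This is the property that makes a truncated spectrum a compact stand-in for the tree.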
Incremental Market Behavior Classification in Presence of Recurring Concepts
In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature and the likelihood of drastic structural or concept changes in the markets. Traditional systems are unable or
slow to adapt to these changes. Ensemble-based systems are widely known for their good results
predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF
(Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles
recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest
for evolving data streams, adding on top a mechanism to store and handle a shared collection of
inactive trees, called concept history, which holds memories of the way market operators reacted
in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by
replacing active trees with the best available alternative: either a previously stored tree from the
concept history or a newly trained background tree. Both mechanisms are designed to provide fast
reaction times and are thus applicable to high-frequency data. The experimental validation of the
algorithm is based on the prediction of price movement directions one second ahead in the SPDR
(Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked
against other popular methods from the incremental online machine learning literature and is able to
achieve competitive results.
This research was funded by the Spanish Ministry of Economy and Competitiveness under grant
number ENE2014-56126-C2-2-R.
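The replacement decision described in this abstract, choosing between a tree stored in the concept history and a newly trained background tree, can be sketched as a simple selection step. The abstract does not state how "best available" is measured, so the criterion used here (error rate over a recent evaluation window) is an assumption, and the `Candidate` class and `choose_replacement` function are hypothetical names rather than part of RCARF.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    """A tree that could replace a drifting active tree."""
    name: str
    recent_errors: List[int] = field(default_factory=list)  # 1 = wrong, 0 = right

    def error_rate(self) -> float:
        # With no evidence yet, treat the candidate as maximally unreliable.
        if not self.recent_errors:
            return 1.0
        return sum(self.recent_errors) / len(self.recent_errors)


def choose_replacement(concept_history: List[Candidate],
                       background: Candidate) -> Candidate:
    """Pick the candidate with the lowest recent error rate.

    Stored trees are listed first, so a stored tree wins a tie against
    the background tree (an arbitrary choice made for this sketch).
    """
    candidates = list(concept_history) + [background]
    return min(candidates, key=lambda c: c.error_rate())
```

When the concept history is empty, the function falls back to the background tree, which matches the behaviour of a forest that has not yet seen any recurring concept.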
Learning Discrete-Time Markov Chains Under Concept Drift
Learning under concept drift is a novel and promising research area aiming at designing learning algorithms able to deal with nonstationary data-generating processes. In this research field, most of the literature focuses on learning nonstationary probabilistic frameworks, while some extensions about learning graphs and signals under concept drift exist. For the first time in the literature, this paper addresses the problem of learning discrete-time Markov chains (DTMCs) under concept drift. More specifically, following a hybrid active/passive approach, this paper introduces both a family of change-detection mechanisms (CDMs), differing in the required assumptions and performance, for detecting changes in DTMCs and an adaptive learning algorithm able to deal with DTMCs under concept drift. The effectiveness of both the proposed CDMs and the adaptive learning algorithm has been extensively tested on synthetically generated experiments and real data sets.
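To make the setting concrete, the sketch below estimates a DTMC transition matrix from an observed state sequence and scores a later window by its mean log-likelihood under that estimate. The abstract does not specify the proposed change-detection mechanisms, so the statistic here is only an assumed stand-in. The function names and the additive `smoothing` parameter are hypothetical.

```python
import numpy as np


def estimate_transition_matrix(seq, n_states, smoothing=1.0):
    """Row-normalized transition counts with additive smoothing."""
    counts = np.full((n_states, n_states), smoothing, dtype=float)
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def mean_log_likelihood(seq, P):
    """Average log-probability of the observed transitions under P."""
    logs = [np.log(P[a, b]) for a, b in zip(seq[:-1], seq[1:])]
    return float(np.mean(logs))


if __name__ == "__main__":
    training = [0, 1] * 50           # chain that alternates between states
    same = [0, 1] * 20               # window from the same behaviour
    changed = [0] * 20 + [1] * 20    # window dominated by self-transitions
    P = estimate_transition_matrix(training, n_states=2)
    print("same:", mean_log_likelihood(same, P))
    print("changed:", mean_log_likelihood(changed, P))
```

A window generated by a different chain should score a markedly lower mean log-likelihood than one generated by the training chain. A detector built on this statistic would still need a threshold or a statistical test, which the sketch leaves out.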
Machine Learning for Financial Prediction Under Regime Change Using Technical Analysis: A Systematic Review
Recent crises, recessions and bubbles have stressed the non-stationary nature and the presence of drastic structural changes in the financial domain. The most recent literature suggests the use of conventional machine learning and statistical approaches in this context. Unfortunately, several of these techniques are unable or slow to adapt to changes in the price-generation process. This study aims to survey the relevant literature on Machine Learning for financial prediction under regime change employing a systematic approach.
It reviews key papers with a special emphasis on technical analysis. The study discusses the growing number of contributions that are bridging the gap between two separate communities, one focused on data stream learning and the other on economic research. However, it also makes apparent that we are still in an early stage. The range of machine learning algorithms that have been tested in this domain is very wide, but the results of the study do not suggest that currently there is a specific technique that is clearly dominant.