1,255 research outputs found
Mining recurrent concepts in data streams using the discrete Fourier transform
In this research we address the problem of capturing recurring concepts in a data stream environment. Recurrence capture enables the re-use of previously learned classifiers without the need for re-learning while providing for better accuracy during the concept recurrence interval. We capture concepts by applying the Discrete Fourier Transform (DFT) to Decision Tree classifiers to obtain highly compressed versions of the trees at concept drift points in the stream and store such trees in a repository for future use. Our empirical results on real world and synthetic data exhibiting varying degrees of recurrence show that the Fourier compressed trees are more robust to noise and are able to capture recurring concepts with higher precision than a meta learning approach that chooses to re-use classifiers in their originally occurring form
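As a rough illustration of how a decision tree over binary features can be turned into a Fourier spectrum, the sketch below computes Walsh-Hadamard coefficients by brute force and keeps only the non-zero ones. The tree `tree_predict`, the helper names, and the choice of the Walsh-Hadamard basis are assumptions made for this example; the abstract does not describe the authors' actual encoding or their compression criterion.

```python
from itertools import product


def tree_predict(x):
    # Hypothetical depth-2 tree over 3 binary features; feature x[2] is never tested.
    if x[0] == 0:
        return 1 if x[1] == 1 else 0
    return 1


def sign(j, x):
    # Walsh-Hadamard basis function: (-1) raised to the dot product of j and x.
    return -1 if sum(a * b for a, b in zip(j, x)) % 2 else 1


def fourier_spectrum(f, n):
    # Coefficient w_j = 2^-n * sum over all x of f(x) * sign(j, x).
    points = list(product((0, 1), repeat=n))
    return {j: sum(f(x) * sign(j, x) for x in points) / len(points) for j in points}


def compress(spectrum, tol=1e-12):
    # Keep only coefficients that are not numerically zero (assumed criterion).
    return {j: w for j, w in spectrum.items() if abs(w) > tol}


def reconstruct(spectrum, x):
    # Inverse transform: f(x) = sum over j of w_j * sign(j, x).
    return sum(w * sign(j, x) for j, w in spectrum.items())


spectrum = fourier_spectrum(tree_predict, 3)
compact = compress(spectrum)
print(len(spectrum), "coefficients in full,", len(compact), "kept")
```

Because the example tree ignores one feature, every coefficient involving that feature is zero and can be dropped without changing the reconstructed predictions. Brute-force enumeration is exponential in the number of features and is only meant to make the idea concrete.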
Use of Ensembles of Fourier Spectra in Capturing Recurrent Concepts in Data Streams
In this research, we apply ensembles of Fourier encoded spectra to capture and mine recurring concepts in a data stream environment. Previous research showed that compact versions of Decision Trees can be obtained by applying the Discrete Fourier Transform to accurately capture recurrent concepts in a data stream. However, in highly volatile environments where new concepts emerge often, the approach of encoding each concept in a separate spectrum is no longer viable due to memory overload and thus in this research we present an ensemble approach that addresses this problem. Our empirical results on real world data and synthetic data exhibiting varying degrees of recurrence reveal that the ensemble approach outperforms the single spectrum approach in terms of classification accuracy, memory and execution time
Mining recurring concepts in a dynamic feature space
Most data stream classification techniques assume that the underlying feature space is static. However, in real-world applications the set of features and their relevance to the target concept may change over time. In addition, when the underlying concepts reappear, reusing previously learnt models can enhance the learning process in terms of accuracy and processing time at the expense of manageable memory consumption. In this paper, we propose mining recurring concepts in a dynamic feature space (MReC-DFS), a data stream classification system to address the challenges of learning recurring concepts in a dynamic feature space while simultaneously reducing the memory cost associated with storing past models. MReC-DFS is able to detect and adapt to concept changes using the performance of the learning process and contextual information. To handle recurring concepts, stored models are combined in a dynamically weighted ensemble. Incremental feature selection is performed to reduce the combined feature space. This contribution allows MReC-DFS to store only the features most relevant to the learnt concepts, which in turn increases the memory efficiency of the technique. In addition, an incremental feature selection method is proposed that dynamically determines the threshold between relevant and irrelevant features. Experimental results demonstrating the high accuracy of MReC-DFS compared with state-of-the-art techniques on a variety of real datasets are presented. The results also show the superior memory efficiency of MReC-DFS
A survey on machine learning for recurring concept drifting data streams
The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature to deal with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and non-supervised approaches such as online ensembles, meta-learning and model-based clustering that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time
Cascading Randomized Weighted Majority: A New Online Ensemble Learning Algorithm
With the increasing volume of data in the world, the best approach for
learning from this data is to exploit an online learning algorithm. Online
ensemble methods are online algorithms which take advantage of an ensemble of
classifiers to predict labels of data. Prediction with expert advice is a
well-studied problem in the online ensemble learning literature. The Weighted
Majority algorithm and the randomized weighted majority (RWM) are the most
well-known solutions to this problem, aiming to converge to the best expert.
Since the best of a set of experts does not necessarily have the minimum
error in all regions of data space, defining specific regions and converging to
the best expert in each of these regions will lead to a better result. In this
paper, we aim to resolve this defect of RWM algorithms by proposing a novel
online ensemble algorithm to the problem of prediction with expert advice. We
propose a cascading version of RWM to achieve not only better experimental
results but also a better error bound for sufficiently large datasets.
Comment: 15 pages, 3 figures
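For readers unfamiliar with the baseline that this abstract builds on, here is a minimal sketch of the standard randomized weighted majority (RWM) update. It is the textbook algorithm, not the cascading variant the paper proposes; the function name, the penalty factor `beta`, and the input layout are assumptions for this example.

```python
import random


def randomized_weighted_majority(expert_preds, labels, beta=0.5, seed=0):
    """Run textbook RWM; expert_preds[t][i] is expert i's prediction at round t."""
    rng = random.Random(seed)
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, y in zip(expert_preds, labels):
        # Follow one expert, sampled with probability proportional to its weight.
        chosen = rng.choices(range(n_experts), weights=weights, k=1)[0]
        if preds[chosen] != y:
            mistakes += 1
        # Multiply the weight of every expert that erred this round by beta.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return weights, mistakes


# Toy run: expert 0 is always right, expert 1 is always wrong.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
expert_preds = [(y, 1 - y) for y in labels]
weights, mistakes = randomized_weighted_majority(expert_preds, labels)
print(weights, mistakes)
```

The weights concentrate on the expert with the fewest errors overall, which is the behaviour the abstract describes as converging to the best expert. A per-region variant would keep a separate weight vector for each region of the data space.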
Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function
Meta-models can be used to enhance the drift detection mechanisms used by data stream algorithms, by representing and predicting when the change will occur. There are some real-world situations where a concept reappears, as in the case of intrusion detection systems (IDS), where the same incidents or an adaptation of them usually reappear over time. In these environments the early prediction of drift by means of a better knowledge of past models can help to anticipate the change, thus improving the efficiency of the model regarding the training instances needed. In this paper we present MM-PRec, a meta-model for predicting recurring concepts on data-streams whose main goal is to predict when the drift is going to occur together with the best model to be used in case of a recurring concept. To fulfill this goal, MM-PRec trains a Hidden Markov Model (HMM) from the instances that appear during the concept drift. The learning process of the base classification learner feeds the meta-model with all the information needed to predict recurrent or similar situations. Thus, the models predicted together with the associated contextual information are stored. In our approach we also propose to use a fuzzy similarity function to decide which is the best model to represent a particular context when drift is detected. The experiments performed show that MM-PRec outperforms other context-aware algorithms in terms of training instances needed, especially in environments characterized by the presence of gradual drifts
Incremental Market Behavior Classification in Presence of Recurring Concepts
In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature and the likelihood of drastic structural or concept changes in the markets. Traditional systems are unable or
slow to adapt to these changes. Ensemble-based systems are widely known for their good results
predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF
(Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles
recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest
for evolving data streams, adding on top a mechanism to store and handle a shared collection of
inactive trees, called concept history, which holds memories of the way market operators reacted
in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by
replacing active trees with the best available alternative: either a previously stored tree from the
concept history or a newly trained background tree. Both mechanisms are designed to provide fast
reaction times and are thus applicable to high-frequency data. The experimental validation of the
algorithm is based on the prediction of price movement directions one second ahead in the SPDR
(Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked
against other popular methods from the incremental online machine learning literature and is able to
achieve competitive results.
This research was funded by the Spanish Ministry of Economy and Competitiveness under grant
number ENE2014-56126-C2-2-R
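The replacement step described in this abstract (swap a drifting tree for the best available alternative, either a stored tree or a new background tree) could be sketched as follows. The abstract does not say how "best" is measured, so the error rate on a recent window used here is an assumption, as are the names `StoredTree` and `pick_replacement`.

```python
from dataclasses import dataclass


@dataclass
class StoredTree:
    name: str
    recent_error: float  # assumed criterion: error rate on a recent window


def pick_replacement(concept_history, background_tree):
    # Compare every inactive tree in the history with the background tree
    # and return the candidate with the lowest recent error.
    candidates = list(concept_history) + [background_tree]
    return min(candidates, key=lambda tree: tree.recent_error)


history = [StoredTree("stored-1", 0.40), StoredTree("stored-2", 0.22)]
background = StoredTree("background", 0.30)
print(pick_replacement(history, background).name)
```

With an empty concept history the background tree is always chosen, which matches the behaviour of a learner that does not track recurring concepts.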
Adaptive Algorithms For Classification On High-Frequency Data Streams: Application To Finance
International Mention in the doctoral degree.
In recent years, the problem of concept drift has gained importance in the financial
domain. The succession of manias, panics and crashes has stressed the nonstationary
nature and the likelihood of drastic structural changes in financial markets.
The most recent literature suggests the use of conventional machine learning and statistical
approaches for this. However, these techniques are unable or slow to adapt
to non-stationarities and may require re-training over time, which is computationally
expensive and brings financial risks.
This thesis proposes a set of adaptive algorithms to deal with high-frequency data
streams and applies these to the financial domain. We present approaches to handle
different types of concept drifts and perform predictions using up-to-date models.
These mechanisms are designed to provide fast reaction times and are thus applicable
to high-frequency data. The core experiments of this thesis are based on the prediction
of the price movement direction at different intraday resolutions in the SPDR S&P 500
exchange-traded fund. The proposed algorithms are benchmarked against other popular
methods from the data stream mining literature and achieve competitive results.
We believe that this thesis opens good research prospects for financial forecasting
during market instability and structural breaks. Results have shown that our proposed
methods can improve prediction accuracy in many of these scenarios. Indeed, the
results obtained are compatible with ideas against the efficient market hypothesis.
However, we cannot claim that we can consistently beat buy and hold; therefore, we
cannot reject it.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid
President: Gustavo Recio Isasi.- Secretary: Pedro Isasi Viñuela.- Member: Sandra García Rodrígue