12,158 research outputs found

    Evolving Ensemble Fuzzy Classifier

    Full text link
    The concept of ensemble learning offers a promising avenue in learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are crafted under a static base classifier and revisits preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexities are often demanding because it involves a large collection of offline classifiers due to the absence of structural complexities reduction mechanisms and lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in this paper. pENsemble differs from existing architectures in the fact that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier pClass. pENsemble is equipped by an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into the pENsemble. This method allows for dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision where it features a novel drift detection scenario to grow the ensemble structure. The efficacy of the pENsemble has been numerically demonstrated through rigorous numerical studies with dynamic and evolving data streams where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity.Comment: this paper has been published by IEEE Transactions on Fuzzy System

    A survey on machine learning for recurring concept drifting data streams

    Get PDF
    The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature to deal with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and non-supervised approaches such as online ensembles, meta-learning and model-based clustering that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time

    Predicting recurring concepts on data-streams by me ans of a meta-model and a fuzzy similarity function

    Get PDF
    Meta-models can be used in the process of enhancing the drift detection mechanisms used by data stream algorithms, by representing and predicting when the change will occur. There are some real-world situations where a concept reappears, as in the case of intrusion detection systems(IDS), where the same incidents or an adaptation of them usually reappear over time. In these environments the early prediction of drift by means of a better knowledge of past models can help to anticipate to the change, thus improving efficiency of the model regarding the training instances needed. In this paper we present MM-PRec, a meta-model for predicting recurring concepts on data-streams which main goal is to predict when the drift is going to occur together with the best model to be used in case of a recurring concept. To fulfill this goal, MM-PRec trains a Hidden Markov Model (HMM) from the instances that appear during the concept drift. The learning process of the base classification learner feeds the meta-model with all the information needed to predict recurrent or similar situations. Thus, the models predicted together with the associated contextual information are stored. In our approach we also propose to use a fuzzy similarity function to decide which is the best model to represent a particular context when drift is detected. The experiments performed show that MM-PRec outperforms the behaviour of other context-aware algorithms in terms of training instances needed, specially in environments characterized by the presence of gradual drifts

    Adaptive Algorithms For Classification On High-Frequency Data Streams: Application To Finance

    Get PDF
    Mención Internacional en el título de doctorIn recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes have stressed the nonstationary nature and the likelihood of drastic structural changes in financial markets. The most recent literature suggests the use of conventional machine learning and statistical approaches for this. However, these techniques are unable or slow to adapt to non-stationarities and may require re-training over time, which is computationally expensive and brings financial risks. This thesis proposes a set of adaptive algorithms to deal with high-frequency data streams and applies these to the financial domain. We present approaches to handle different types of concept drifts and perform predictions using up-to-date models. These mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The core experiments of this thesis are based on the prediction of the price movement direction at different intraday resolutions in the SPDR S&P 500 exchange-traded fund. The proposed algorithms are benchmarked against other popular methods from the data stream mining literature and achieve competitive results. We believe that this thesis opens good research prospects for financial forecasting during market instability and structural breaks. Results have shown that our proposed methods can improve prediction accuracy in many of these scenarios. Indeed, the results obtained are compatible with ideas against the efficient market hypothesis. However, we cannot claim that we can beat consistently buy and hold; therefore, we cannot reject it.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Gustavo Recio Isasi.- Secretario: Pedro Isasi Viñuela.- Vocal: Sandra García Rodrígue

    An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams

    Full text link
    Existing FNNs are mostly developed under a shallow network configuration having lower generalization power than those of deep structures. This paper proposes a novel self-organizing deep FNN, namely DEVFNN. Fuzzy rules can be automatically extracted from data streams or removed if they play limited role during their lifespan. The structure of the network can be deepened on demand by stacking additional layers using a drift detection method which not only detects the covariate drift, variations of input space, but also accurately identifies the real drift, dynamic changes of both feature space and target space. DEVFNN is developed under the stacked generalization principle via the feature augmentation concept where a recently developed algorithm, namely gClass, drives the hidden layer. It is equipped by an automatic feature selection method which controls activation and deactivation of input attributes to induce varying subsets of input features. A deep network simplification procedure is put forward using the concept of hidden layer merging to prevent uncontrollable growth of dimensionality of input space due to the nature of feature augmentation approach in building a deep network structure. DEVFNN works in the sample-wise fashion and is compatible for data stream applications. The efficacy of DEVFNN has been thoroughly evaluated using seven datasets with non-stationary properties under the prequential test-then-train protocol. It has been compared with four popular continual learning algorithms and its shallow counterpart where DEVFNN demonstrates improvement of classification accuracy. Moreover, it is also shown that the concept drift detection method is an effective tool to control the depth of network structure while the hidden layer merging scenario is capable of simplifying the network complexity of a deep network with negligible compromise of generalization performance.Comment: This paper has been published in IEEE Transactions on Fuzzy System

    Machine Learning for Financial Prediction Under Regime Change Using Technical Analysis: A Systematic Review

    Get PDF
    Recent crises, recessions and bubbles have stressed the non-stationary nature and the presence of drastic structural changes in the financial domain. The most recent literature suggests the use of conventional machine learning and statistical approaches in this context. Unfortunately, several of these techniques are unable or slow to adapt to changes in the price-generation process. This study aims to survey the relevant literature on Machine Learning for financial prediction under regime change employing a systematic approach. It reviews key papers with a special emphasis on technical analysis. The study discusses the growing number of contributions that are bridging the gap between two separate communities, one focused on data stream learning and the other on economic research. However, it also makes apparent that we are still in an early stage. The range of machine learning algorithms that have been tested in this domain is very wide, but the results of the study do not suggest that currently there is a specific technique that is clearly dominant

    Incremental Market Behavior Classification in Presence of Recurring Concepts

    Get PDF
    In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes have stressed the non-stationary nature and the likelihood of drastic structural or concept changes in the markets. Traditional systems are unable or slow to adapt to these changes. Ensemble-based systems are widely known for their good results predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF (Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest for evolving data streams, adding on top a mechanism to store and handle a shared collection of inactive trees, called concept history, which holds memories of the way market operators reacted in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by replacing active trees with the best available alternative: either a previously stored tree from the concept history or a newly trained background tree. Both mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The experimental validation of the algorithm is based on the prediction of price movement directions one second ahead in the SPDR (Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked against other popular methods from the incremental online machine learning literature and is able to achieve competitive results.This research was funded by the Spanish Ministry of Economy and Competitiveness under grant number ENE2014-56126-C2-2-R
    corecore