864 research outputs found

    An ensemble-based computational approach for incremental learning in non-stationary environments related to schema- and scaffolding-based human learning

    The principal dilemma in a learning process, whether human or computer, is adapting to new information, especially when this new information conflicts with what was previously learned. The design of computer models for incremental learning is an emerging topic for the classification and prediction of large-scale data streams whose underlying class distributions (definitions) change over time; yet current models often ignore significant foundational learning theory that has been developed in the domain of human learning. This shortfall leads to many deficiencies in the ability to organize existing knowledge and to retain relevant knowledge for long periods of time. In this work, we introduce a unique computer-learning algorithm for incremental knowledge acquisition using an ensemble of classifiers, Learn++.NSE (Non-Stationary Environments), specifically for the case where the nature of the knowledge to be learned is evolving. Learn++.NSE is a novel approach to evaluating and organizing existing knowledge (classifiers) according to the most recent data environment. Under this architecture, we address the learning problem at both the learner and supervisor end, discussing and implementing three main approaches: knowledge weighting/organization, forgetting prior knowledge, and change/drift detection. The framework is evaluated on a variety of canonical and real-world data streams (weather prediction, electricity price prediction, and spam detection). This study reveals the catastrophic effect of forgetting prior knowledge, supporting the organization technique proposed by Learn++.NSE as the most consistent performer across various drift scenarios, while also highlighting the sheer difficulty of designing a system that strikes a balance between maintaining all knowledge and making decisions based only on relevant knowledge, especially in the severe, unpredictable environments often encountered in the real world.
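
The central idea of weighting existing classifiers by how well they perform on the most recent data can be sketched as follows. This is a simplified, Learn++.NSE-inspired illustration rather than the authors' implementation: it omits the instance weighting used when training new members, and the class name `NSEStyleEnsemble` and the sigmoid parameters `slope` and `cutoff` are assumptions. Each member's error is re-measured on every new batch, capped at 0.5, converted to β = ε/(1 − ε), averaged with a sigmoid that favours recent batches, and turned into a voting weight log(1/β̄).

```python
import math
from typing import Callable, Dict, List, Sequence

# A classifier is any callable mapping a feature vector to an integer label.
Classifier = Callable[[Sequence[float]], int]


class NSEStyleEnsemble:
    """Hypothetical, simplified Learn++.NSE-inspired ensemble (sketch only)."""

    def __init__(self, slope: float = 0.5, cutoff: float = 10.0) -> None:
        self.slope = slope    # assumed steepness of the recency sigmoid
        self.cutoff = cutoff  # assumed member age (in batches) at the sigmoid's midpoint
        self.members: List[Classifier] = []
        self.betas: List[List[float]] = []  # one normalized error per batch, per member
        self.weights: List[float] = []

    def add_batch(self, new_member: Classifier,
                  X: Sequence[Sequence[float]], y: Sequence[int]) -> None:
        """Add a classifier trained on the new batch, then re-score every member on it."""
        self.members.append(new_member)
        self.betas.append([])
        for k, clf in enumerate(self.members):
            err = sum(clf(x) != t for x, t in zip(X, y)) / len(y)
            # Cap at 0.5 (chance level for two classes); the floor avoids log(1/0).
            err = min(max(err, 1e-6), 0.5)
            self.betas[k].append(err / (1.0 - err))
        self.weights = [self._vote_weight(b) for b in self.betas]

    def _vote_weight(self, betas: List[float]) -> float:
        # Sigmoid over the member's age j = 0..n-1: later batches get larger weights.
        omegas = [1.0 / (1.0 + math.exp(-self.slope * (j - self.cutoff)))
                  for j in range(len(betas))]
        total = sum(omegas)
        beta_bar = sum(w / total * b for w, b in zip(omegas, betas))
        return math.log(1.0 / beta_bar)

    def predict(self, x: Sequence[float]) -> int:
        """Weighted majority vote of all members."""
        votes: Dict[int, float] = {}
        for clf, w in zip(self.members, self.weights):
            label = clf(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)
```

A member that is wrong in the current environment is down-weighted rather than deleted, so it can regain influence if its concept returns; this keeps the sketch in line with the abstract's observation that forgetting prior knowledge is harmful.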

    A survey on machine learning for recurring concept drifting data streams

    The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature on dealing with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent work on passive or active mechanisms to adapt to or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and unsupervised approaches that can be used to deal with seasonalities in a data stream, such as online ensembles, meta-learning and model-based clustering, are covered. The aim is to point out new research trends and give future research directions on the use of machine learning techniques for data streams that can help in the event of shifts and recurrences in continuous learning scenarios in near real-time.
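
Reusing models when a past concept recurs, one of the themes this survey covers, can be illustrated with a small model repository. This is a hypothetical sketch, not a method taken from the survey: `ConceptRepository`, the `reuse_threshold` of 0.8 and the `train_new` callback are assumptions, and a real system would be triggered by a drift detector and would compare concepts more carefully than by raw accuracy on a recent window.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]
Window = List[Tuple[Sequence[float], int]]  # recent (features, label) pairs


def accuracy(model: Model, window: Window) -> float:
    return sum(model(x) == y for x, y in window) / len(window)


class ConceptRepository:
    """Hypothetical store of models learned for previously seen concepts."""

    def __init__(self, reuse_threshold: float = 0.8) -> None:
        self.models: List[Model] = []
        self.reuse_threshold = reuse_threshold  # assumed minimum accuracy for reuse

    def on_drift(self, window: Window, train_new: Callable[[Window], Model]) -> Model:
        """Call after a drift signal; returns the model to use for the new regime."""
        if self.models:
            best = max(self.models, key=lambda m: accuracy(m, window))
            if accuracy(best, window) >= self.reuse_threshold:
                return best  # the old concept appears to have recurred: reuse it
        model = train_new(window)  # genuinely new concept: learn it and remember it
        self.models.append(model)
        return model


if __name__ == "__main__":
    concept_a = lambda x: int(x[0] > 0.5)
    concept_b = lambda x: int(x[0] <= 0.5)
    xs = [[i / 10] for i in range(10)]
    win_a = [(x, concept_a(x)) for x in xs]
    win_b = [(x, concept_b(x)) for x in xs]
    repo = ConceptRepository()
    # The lambdas passed as train_new stand in for a real training routine.
    repo.on_drift(win_a, lambda w: concept_a)           # first regime: stored
    repo.on_drift(win_b, lambda w: concept_b)           # new regime: stored
    reused = repo.on_drift(win_a, lambda w: concept_b)  # regime A returns
    print(reused is concept_a, len(repo.models))        # prints: True 2
```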

    A Survey on Concept Drift Adaptation

    Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims to cover the different facets of concept drift in an integrated way, reflecting on the existing, scattered state of the art.
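
Two ideas discussed in this survey, forgetting old data through a sliding window and prequential (test-then-train) evaluation, can be combined in a short sketch. The learner is a deliberately trivial stand-in that predicts the majority label in its window, and the window size and fading factor `alpha` are assumed values. The faded accuracy uses a recursive form commonly applied to prequential estimates: S_i = L_i + αS_{i−1} and B_i = 1 + αB_{i−1}, reporting S_i/B_i.

```python
from collections import Counter, deque
from typing import Hashable, Iterable, List, Sequence, Tuple


class SlidingWindowMajority:
    """Toy adaptive learner: forgets everything older than the last `size` labels."""

    def __init__(self, size: int = 20) -> None:
        self.window: deque = deque(maxlen=size)

    def predict(self, x: Sequence[float]) -> Hashable:
        if not self.window:
            return 0  # arbitrary default before any training data arrives
        return Counter(self.window).most_common(1)[0][0]

    def learn(self, x: Sequence[float], y: Hashable) -> None:
        self.window.append(y)  # deque(maxlen=...) silently drops the oldest label


def prequential_accuracy(stream: Iterable[Tuple[Sequence[float], Hashable]],
                         learner: SlidingWindowMajority,
                         alpha: float = 0.99) -> List[float]:
    """Test-then-train: predict each example first, then use it for training."""
    s = b = 0.0
    history: List[float] = []
    for x, y in stream:
        correct = 1.0 if learner.predict(x) == y else 0.0
        s = correct + alpha * s  # faded count of correct predictions
        b = 1.0 + alpha * b      # faded count of examples seen
        history.append(s / b)
        learner.learn(x, y)
    return history


if __name__ == "__main__":
    # Sudden drift: label 0 for 200 examples, then label 1 for 200 examples.
    stream = [([float(i)], 0 if i < 200 else 1) for i in range(400)]
    acc = prequential_accuracy(stream, SlidingWindowMajority(size=20))
    print(f"before drift {acc[199]:.3f}, lowest {min(acc):.3f}, final {acc[-1]:.3f}")
```

The fading factor makes the estimate track the current regime: accuracy dips right after the drift and recovers once the window has replaced the old labels, whereas a plain cumulative average would keep the pre-drift errors forever.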

    Adaptive Online Sequential ELM for Concept Drift Tackling

    A machine learning method needs to adapt to changes in its environment over time. Such changes are known as concept drift. In this paper, we propose a concept drift tackling method as an enhancement of the Online Sequential Extreme Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM), adding adaptive capability for classification and regression problems. The scheme is named adaptive OS-ELM (AOS-ELM). It is a single-classifier scheme that works well for handling real drift, virtual drift, and hybrid drift. AOS-ELM also works well for sudden drift and recurrent context changes. The scheme is a simple unified method that can be implemented in a few lines of code. We evaluated AOS-ELM on regression and classification problems using public concept drift datasets (SEA and STAGGER) and other public datasets such as MNIST, USPS, and IDS. Experiments show that our method gives a higher kappa value than the multiclassifier ELM ensemble. Even though AOS-ELM in practice does not need to increase the number of hidden nodes, we address some issues related to increasing them, such as the error condition and rank values. We propose taking the rank of the pseudoinverse matrix as an indicator parameter to detect underfitting. Comment: Hindawi Publishing. Computational Intelligence and Neuroscience Volume 2016 (2016), Article ID 8091267, 17 pages. Received 29 January 2016, Accepted 17 May 2016. Special Issue on "Advances in Neural Networks and Hybrid-Metaheuristics: Theory, Algorithms, and Novel Engineering Applications". Academic Editor: Stefan Hauf
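
The abstract reports results as kappa values rather than plain accuracy. Assuming this refers to Cohen's kappa, a common choice in stream classification, the statistic compares the observed agreement p_o with the agreement p_e expected by chance from the label frequencies, κ = (p_o − p_e)/(1 − p_e), so a classifier that only predicts the majority class scores near zero even when its accuracy is high. A minimal sketch:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa between true and predicted labels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability that both independently pick class c, summed over c.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only possible when every true and predicted label is the same single class;
        # kappa is undefined there, and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    y_true = [0] * 9 + [1]
    print(cohen_kappa(y_true, [0] * 10))  # prints 0.0, although accuracy is 0.9
```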

    Dynamic Data Mining: Methodology and Algorithms

    Supervised data stream mining has become an important and challenging data mining task in modern organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions. To address these three challenges, this thesis proposes the novel dynamic data mining (DDM) methodology, which effectively applies supervised ensemble models to data stream mining. DDM can be loosely defined as the categorization, organization, and selection of supervised ensemble models. It is inspired by the idea that although the underlying concepts in a data stream are time-varying, their distinctions can be identified; therefore, models trained on distinct concepts can be dynamically selected to classify incoming examples of similar concepts. First, following the general paradigm of DDM, we examine different concept-drifting stream mining scenarios and propose corresponding effective and efficient data mining algorithms:
    • To address concept drift caused merely by changes in variable distributions, which we term pseudo concept drift, base models built on categorized streaming data are organized and selected according to their corresponding variable distribution characteristics.
    • To address concept drift caused by changes in the joint distribution of variables and classes, which we term true concept drift, an effective data categorization scheme is introduced, and a group of working models is dynamically organized and selected to react to the drifting concept.
    Secondly, we introduce an integration stream mining framework that makes the paradigm advocated by DDM widely applicable to other stream mining problems. With it, we can easily introduce six effective algorithms for mining data streams with skewed class distributions. In addition, we introduce a new ensemble model approach for batch learning that follows the same methodology. Both theoretical and empirical studies demonstrate its effectiveness. Future work will target improving the effectiveness and efficiency of the proposed algorithms; meanwhile, we will explore the possibilities of using the integration framework to solve other open stream mining research problems.
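
The selection step described for pseudo concept drift, choosing base models according to the variable distribution of the incoming data, can be illustrated with a deliberately crude proxy. In this hypothetical sketch, each stored model is summarized by the centroid of the features it was trained on, and a new batch is routed to the model with the nearest centroid. The abstract does not describe the thesis's actual categorization scheme, so `DistributionIndexedPool` and the centroid distance are assumptions (`math.dist` needs Python 3.8 or later).

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]


def centroid(X: Sequence[Sequence[float]]) -> List[float]:
    """Mean feature vector of a batch (a crude summary of its distribution)."""
    return [sum(col) / len(X) for col in zip(*X)]


class DistributionIndexedPool:
    """Hypothetical pool of base models indexed by their training-data centroid."""

    def __init__(self) -> None:
        self.entries: List[Tuple[List[float], Model]] = []

    def add(self, model: Model, X_train: Sequence[Sequence[float]]) -> None:
        self.entries.append((centroid(X_train), model))

    def select(self, X_batch: Sequence[Sequence[float]]) -> Model:
        """Return the model whose training distribution looks most like this batch."""
        c = centroid(X_batch)
        return min(self.entries, key=lambda entry: math.dist(entry[0], c))[1]

    def predict_batch(self, X_batch: Sequence[Sequence[float]]) -> List[int]:
        model = self.select(X_batch)
        return [model(x) for x in X_batch]
```

Because the centroid ignores labels, this proxy can only react to changes in the feature distribution, which corresponds to the pseudo concept drift case; handling true concept drift would require labelled data from the new concept.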

    Concept Drift Detection in Data Stream Mining: The Review of Contemporary Literature

    Mining processes such as classification and clustering of progressive or dynamic data are critical objectives of information retrieval and knowledge discovery. They are particularly sensitive in data stream mining models, because the type and dimensionality of the data may change significantly over a period of time. The influence of these changes on the mining process is termed concept drift. Concept drift, which occurs frequently in streaming data, causes inconsistent performance in the mining models adopted. Hence, mining models need to be strengthened to predict and analyse concept drift in order to maintain the best achievable performance. Contemporary literature has made significant contributions to handling concept drift, which fall into supervised learning, unsupervised learning, and statistical assessment approaches. This manuscript contributes a detailed review of the contemporary concept drift detection models described in recent literature. Its contributions include a nomenclature of concept drift models and their impact on imbalanced data tuples.
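
As a concrete example of the statistical detectors such reviews cover, the classic Drift Detection Method of Gama et al. (2004) monitors a classifier's streaming error rate p and its standard deviation s = √(p(1 − p)/n), remembers the point where p + s was smallest, and signals a warning or a drift when p + s exceeds p_min + 2·s_min or p_min + 3·s_min, respectively. The sketch below follows that description; the class name, the 30-example burn-in, and resetting all statistics after a drift are choices made for this illustration.

```python
import math


class DriftDetectionMethod:
    """Sketch of the error-rate-based Drift Detection Method (Gama et al., 2004)."""

    def __init__(self, min_instances: int = 30, warning_level: float = 2.0,
                 drift_level: float = 3.0) -> None:
        self.min_instances = min_instances  # assumed burn-in before any signal
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.p = 0.0  # running error rate
        self.s = 0.0  # standard deviation of the error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        """Feed one prediction outcome (True = misclassified); return the state."""
        self.n += 1
        self.p += (float(error) - self.p) / self.n  # incremental mean
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s  # best point seen so far
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

If the monitored classifier has made no errors at all, s_min is 0 and the very first error triggers a drift, so a practical detector would need a safeguard for that case.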