383 research outputs found

    Evolving Ensemble Fuzzy Classifier

    Full text link
    The concept of ensemble learning offers a promising avenue in learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are crafted under a static base classifier and revisits preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexities are often demanding because it involves a large collection of offline classifiers due to the absence of structural complexities reduction mechanisms and lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in this paper. pENsemble differs from existing architectures in the fact that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier pClass. pENsemble is equipped by an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into the pENsemble. This method allows for dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision where it features a novel drift detection scenario to grow the ensemble structure. The efficacy of the pENsemble has been numerically demonstrated through rigorous numerical studies with dynamic and evolving data streams where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity.Comment: this paper has been published by IEEE Transactions on Fuzzy System

    Online Tool Condition Monitoring Based on Parsimonious Ensemble+

    Full text link
    Accurate diagnosis of tool wear in metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches which cannot cope with a fast sampling rate of metal cutting process. Furthermore they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle where both ensemble structure and base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, the online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents advancement of a newly developed ensemble learning algorithm, pENsemble+, where online active learning scenario is incorporated to reduce operator labelling effort. The ensemble merging scenario is proposed which allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.Comment: this paper has been published by IEEE Transactions on Cybernetic

    A randomized neural network for data streams

    Get PDF
    © 2017 IEEE. Randomized neural network (RNN) is a highly feasible solution in the era of big data because it offers a simple and fast working principle in processing dynamic and evolving data streams. This paper proposes a novel RNN, namely recurrent type-2 random vector functional link network (RT2McRVFLN), which provides a highly scalable solution for data streams in a strictly online and integrated framework. It is built upon the psychologically inspired concept of metacognitive learning, which covers three basic components of human learning: what-to-learn, how-to-learn, and when-to-learn. The what-to-learn selects important samples on the fly with the use of online active learning scenario, which renders our algorithm an online semi-supervised algorithm. The how-to-learn process combines an open structure of evolving concept and a randomized learning algorithm of random vector functional link network (RVFLN). The efficacy of the RT2McRVFLN has been numerically validated through two real-world case studies and comparisons with its counterparts, which arrive at a conclusive finding that our algorithm delivers a tradeoff between accuracy and simplicity

    One-Class Classification: Taxonomy of Study and Review of Techniques

    Full text link
    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

    Quantifying the reliability of fault classifiers

    No full text
    International audienceFault diagnostics problems can be formulated as classification tasks. Due to limited data and to uncertainty, classification algorithms are not perfectly accurate in practical applications. Maintenance decisions based on erroneous fault classifications result in inefficient resource allocations and/or operational disturbances. Thus, knowing the accuracy of classifiers is important to give confidence in the maintenance decisions. The average accuracy of a classifier on a test set of data patterns is often used as a measure of confidence in the performance of a specific classifier. However, the performance of a classifier can vary in different regions of the input data space. Several techniques have been proposed to quantify the reliability of a classifier at the level of individual classifications. Many of the proposed techniques are only applicable to specific classifiers, such as ensemble techniques and support vector machines. In this paper, we propose a meta approach based on the typicalness framework (Kolmogorov's concept of randomness), which is independent of the applied classifier. We apply the approach to a case of fault diagnosis in railway turnout systems and compare the results obtained with both extreme learning machines and echo state networks

    On pruning and feature engineering in Random Forests.

    Get PDF
    Random Forest (RF) is an ensemble classification technique that was developed by Leo Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for optimizing RF further by enhancing and improving its performance accuracy. This explains why there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. The main focus of this dissertation is to develop new extensions of RF using new optimization techniques that, to the best of our knowledge, have never been used before to optimize RF. These techniques are clustering, the local outlier factor, diversified weighted subspaces, and replicator dynamics. Applying these techniques on RF produced four extensions which we have termed CLUB-DRF, LOFB-DRF, DSB-RF, and RDB-DR respectively. Experimental studies on 15 real datasets showed favorable results, demonstrating the potential of the proposed methods. Performance-wise, CLUB-DRF is ranked first in terms of accuracy and classifcation speed making it ideal for real-time applications, and for machines/devices with limited memory and processing power
    corecore