68 research outputs found

    Transfer Learning in Non-Stationary Environments

    Incremental learning of concept drift from imbalanced data

    Learning from data sampled from a nonstationary distribution has been shown to be a very challenging problem in machine learning, because the joint probability distribution between the data and classes evolves over time. Thus learners must adapt their knowledge base, including their structure or parameters, to remain strong predictors. This phenomenon of learning from an evolving data source is akin to learning how to play a game while the rules of the game are changing, and it is traditionally referred to as learning concept drift. Climate data, financial data, epidemiological data, and spam detection are examples of applications that give rise to concept drift problems. An additional challenge arises when the classes to be learned are not represented (approximately) equally in the training data, as most machine learning algorithms work well only when the class distributions are balanced. However, rare categories are commonly faced in real-world applications, which leads to skewed or imbalanced datasets. Fraud detection, rare disease diagnosis, and anomaly detection are examples of applications that feature imbalanced datasets, where data from one category are severely underrepresented. Concept drift and class imbalance are traditionally addressed separately in machine learning, yet data streams can experience both phenomena. This work introduces Learn++.NIE (nonstationary & imbalanced environments) and Learn++.CDS (concept drift with SMOTE) as two new members of the Learn++ family of incremental learning algorithms that explicitly and simultaneously address the aforementioned phenomena. The former addresses concept drift and class imbalance through modified bagging-based sampling and by replacing a class-independent error weighting mechanism, which normally favors the majority class, with a set of measures that emphasize good predictive accuracy on all classes. The latter integrates Learn++.NSE, an algorithm for concept drift, with the synthetic sampling method known as SMOTE, to cope with class imbalance. This research also includes a thorough evaluation of Learn++.CDS and Learn++.NIE on several real and synthetic datasets and on several figures of merit, showing that both algorithms are able to learn in some of the most difficult learning environments.
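    The synthetic sampling idea behind SMOTE, which the abstract above names as a component of Learn++.CDS, can be illustrated with a minimal sketch. This is not the Learn++.CDS algorithm and not a reference SMOTE implementation: the function name `smote_like`, the parameter defaults, and the toy data are all assumptions made for illustration. It only shows the core step of interpolating between a minority-class point and one of its nearest minority-class neighbours.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority-class points.

    Hypothetical helper for illustration only; real SMOTE implementations
    differ in neighbour search and sampling details.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment x -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Toy minority class (assumed data, two features per point).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=6)
```

    Because every synthetic point lies on a segment between two existing minority points, the oversampled class stays inside the region the minority data already occupies.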

    Covariate shift detection-based nonstationary adaptation in motor-imagery-based brain–computer interface

    Nonstationary learning refers to a process that can learn patterns from data, adapt to shifts, and improve the performance of the system with experience while operating in nonstationary environments (NSEs). Covariate shift (CS) presents a major challenge during data processing within NSEs, wherein the input-data distribution shifts during the transition from the training to the testing phase. CS is one of the fundamental issues in electroencephalogram (EEG)-based brain-computer interface (BCI) systems and can often be observed across multiple trials of EEG data recorded over different sessions. Conventional learning algorithms therefore struggle to accommodate these CSs in streaming EEG data, resulting in low performance (in terms of classification accuracy) of motor imagery (MI)-related BCI systems. This chapter introduces a novel framework for nonstationary adaptation in MI-related BCI systems based on CS detection applied to the temporally and spatially filtered features extracted from raw EEG signals. The chapter collectively provides an efficient method for accounting for nonstationarity in EEG data during learning in NSEs.
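    As a rough illustration of what detecting a covariate shift can mean in practice, the sketch below flags a shift when the mean of a feature in a recent window moves too far from its training mean. The abstract does not specify the chapter's detection test, so this simple z-score check, the function name `covariate_shift_detected`, the threshold, and the toy feature values are all assumptions, not the authors' method.

```python
import math
import statistics


def covariate_shift_detected(train_feature, window, z_threshold=3.0):
    """Flag a shift in one input feature between training data and a new window.

    Hypothetical check: compares the window mean with the training mean,
    scaled by the standard error implied by the training standard deviation.
    """
    mu = statistics.fmean(train_feature)
    sigma = statistics.stdev(train_feature)
    standard_error = sigma / math.sqrt(len(window))
    z = abs(statistics.fmean(window) - mu) / standard_error
    return z > z_threshold


# Assumed toy values of a single extracted feature.
train = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 0.08, -0.08, 0.0]
same_session = [0.03, -0.04, 0.01, 0.0, -0.02]
later_session = [0.5, 0.6, 0.55, 0.45, 0.52]

shift_same = covariate_shift_detected(train, same_session)
shift_later = covariate_shift_detected(train, later_session)
```

    Only the inputs are examined here, not the labels, which is what distinguishes a covariate shift check from monitoring classification error.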

    A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

    Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data are abundant, labelled data are rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leverage the unlabelled data (semi-supervised learning); or assume some labels will be available on request (active learning). The first approach is the simplest, yet the amount of labelled data available will limit the predictive performance. The second relies on finding and exploiting the underlying characteristics of the data distribution. The third depends on an external agent to provide the required labels in a timely fashion. This survey pays special attention to methods that leverage unlabelled data in a semi-supervised setting. We also discuss the delayed labelling issue, which impacts both fully supervised and semi-supervised methods. We propose a unified problem setting, discuss the learning guarantees and existing methods, and explain the differences between related problem settings. Finally, we review the current benchmarking practices and propose adaptations to enhance them.
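    The delayed labelling issue mentioned above can be made concrete with a small simulation. This is a sketch under stated assumptions rather than the survey's problem formulation: the function `run_delayed_stream`, the fixed delay, the majority-class stand-in model, and the toy stream are all hypothetical. Each instance is predicted when it arrives, but its label only becomes usable for training `delay` steps later.

```python
from collections import Counter, deque


def run_delayed_stream(stream, delay):
    """Predict each instance on arrival; train on its label `delay` steps later.

    stream: iterable of (x, y) pairs. The stand-in model simply predicts the
    majority class among labels received so far, so x is stored but unused.
    """
    pending = deque()         # instances whose labels have not arrived yet
    label_counts = Counter()  # labels that have arrived so far
    predictions = []
    for t, (x, y) in enumerate(stream):
        # Predict first, using only labels that have already arrived.
        pred = label_counts.most_common(1)[0][0] if label_counts else None
        predictions.append(pred)
        pending.append((t, x, y))
        # Release every label that is now at least `delay` steps old.
        while pending and t - pending[0][0] >= delay:
            _, _, old_y = pending.popleft()
            label_counts[old_y] += 1
    return predictions


# Assumed toy stream whose class changes from "a" to "b" partway through.
stream = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (0.8, "b"), (0.7, "b"), (0.6, "b")]
predictions = run_delayed_stream(stream, delay=2)
predictions_no_delay = run_delayed_stream(stream, delay=0)
```

    Comparing the two runs shows how a longer delay keeps the model predicting from stale labels for longer after the stream changes.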

    Semi-Supervised Learning for Diagnosing Faults in Electromechanical Systems

    Safe and reliable operation of electromechanical systems relies on the use of online condition monitoring and diagnostic systems that aim to take immediate action upon the occurrence of a fault. Machine learning techniques are widely used for designing data-driven diagnostic models. The training procedure of a data-driven model usually requires a large amount of labeled data, which may not always be practical. This problem can be addressed by resorting to semi-supervised learning approaches, which enable decision making using only a small number of labeled samples coupled with a large number of unlabeled samples. Thus, it is crucial to conduct a critical study on the use of semi-supervised learning for the purpose of fault diagnosis. Another issue of concern is fault diagnosis in non-stationary environments, where data streams evolve over time and, as a result, model-based and most data-driven models are impractical. In this work, this has been addressed by means of an adaptive data-driven diagnostic model.