7,419 research outputs found

    Informative sample generation using class aware generative adversarial networks for classification of chest Xrays

    Training robust deep learning (DL) systems for disease detection from medical images is challenging due to the limited number of images covering different disease types and severity levels. The problem is especially acute where there is severe class imbalance. We propose an active learning (AL) framework that uses a Bayesian neural network to select the most informative samples for training our model. The informative samples are then used within a novel class aware generative adversarial network (CAGAN) to generate realistic chest X-ray images for data augmentation by transferring characteristics from one class label to another. Experiments show our proposed AL framework achieves state-of-the-art performance using about 35% of the full dataset, thus saving significant time and effort over conventional methods.
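
    The abstract describes selecting the most informative unlabeled samples with a Bayesian neural network before augmenting them with CAGAN. As a rough illustration of that selection step only, the sketch below scores samples by predictive entropy under Monte Carlo dropout and keeps the most uncertain ones; it assumes PyTorch, and the function names, number of stochastic passes, and acquisition criterion are illustrative rather than taken from the paper.

    # Hypothetical sketch of uncertainty-based sample selection via MC dropout;
    # the paper's Bayesian network and acquisition function may differ.
    import torch
    import torch.nn.functional as F

    def mc_dropout_entropy(model, images, n_passes=20):
        """Score each image by predictive entropy under MC dropout."""
        model.train()  # keep dropout layers active at inference time
        with torch.no_grad():
            probs = torch.stack(
                [F.softmax(model(images), dim=1) for _ in range(n_passes)]
            ).mean(dim=0)  # average class probabilities over stochastic passes
        return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

    def select_informative(model, pool_images, budget=32):
        """Pick the budget most uncertain samples from the unlabeled pool."""
        scores = mc_dropout_entropy(model, pool_images)
        return scores.topk(budget).indices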

    Incremental learning of concept drift from imbalanced data

    Learning from data sampled from a nonstationary distribution has been shown to be a very challenging problem in machine learning, because the joint probability distribution between the data and classes evolves over time. Learners must therefore adapt their knowledge base, including their structure or parameters, to remain strong predictors. This phenomenon of learning from an evolving data source is akin to learning how to play a game while the rules of the game are being changed, and it is traditionally referred to as learning concept drift. Climate data, financial data, epidemiological data, and spam detection are examples of applications that give rise to concept drift problems. An additional challenge arises when the classes to be learned are not represented (approximately) equally in the training data, as most machine learning algorithms work well only when the class distributions are balanced. However, rare categories are common in real-world applications, leading to skewed or imbalanced datasets. Fraud detection, rare disease diagnosis, and anomaly detection are examples of applications that feature imbalanced datasets, where data from the minority category are severely underrepresented. Concept drift and class imbalance are traditionally addressed separately in machine learning, yet data streams can experience both phenomena. This work introduces Learn++.NIE (nonstationary and imbalanced environments) and Learn++.CDS (concept drift with SMOTE) as two new members of the Learn++ family of incremental learning algorithms that explicitly and simultaneously address both phenomena. The former addresses concept drift and class imbalance through modified bagging-based sampling and by replacing the class-independent error weighting mechanism, which normally favors the majority class, with a set of measures that emphasize good predictive accuracy on all classes. The latter integrates Learn++.NSE, an algorithm for concept drift, with the synthetic sampling method known as SMOTE to cope with class imbalance. This research also includes a thorough evaluation of Learn++.CDS and Learn++.NIE on several real and synthetic datasets and on several figures of merit, showing that both algorithms are able to learn in some of the most difficult learning environments.
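
    The latter algorithm relies on SMOTE to generate synthetic minority-class samples before the ensemble is trained. The fragment below is a compact sketch of that oversampling idea, assuming NumPy and scikit-learn; the choice of k, the amount of oversampling, and the way samples are fed to Learn++.NSE are simplifications rather than the exact procedure of Learn++.CDS.

    # Illustrative SMOTE-style oversampling: interpolate between a minority
    # sample and one of its k nearest minority-class neighbors.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote(minority, n_synthetic, k=5, seed=None):
        """Generate n_synthetic points along segments joining minority neighbors."""
        rng = np.random.default_rng(seed)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(minority)
        _, idx = nn.kneighbors(minority)  # idx[:, 0] is each point itself
        synthetic = np.empty((n_synthetic, minority.shape[1]))
        for i in range(n_synthetic):
            base = rng.integers(len(minority))
            neighbor = minority[rng.choice(idx[base, 1:])]
            synthetic[i] = minority[base] + rng.random() * (neighbor - minority[base])
        return synthetic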

    A Survey of Methods for Handling Disk Data Imbalance

    Class imbalance exists in many classification problems, and because classifiers are typically optimized for overall accuracy, imbalanced classes can lead to classification challenges in which the minority classes carry higher misclassification costs. The Backblaze dataset, a widely used hard-disk dataset, contains a small amount of failure data and a large amount of healthy-drive data, and therefore exhibits a serious class imbalance. This paper provides a comprehensive overview of research in the field of imbalanced data classification. The discussion is organized into three main aspects: data-level methods, algorithm-level methods, and hybrid methods. For each type of method, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. Additionally, the challenges of imbalanced data classification are discussed, along with strategies to address them, making it convenient for researchers to choose the appropriate method according to their needs.
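
    As a concrete illustration of the data-level and algorithm-level families such surveys cover, the sketch below implements plain random oversampling and inverse-frequency class weighting with NumPy; it is a generic example, not code from the survey or from the Backblaze experiments.

    # Two simple remedies for class imbalance: duplicate minority rows
    # (data-level) or reweight each class in the loss (algorithm-level).
    import numpy as np

    def random_oversample(X, y, seed=None):
        """Duplicate minority-class rows until every class matches the majority count."""
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        target = counts.max()
        groups = [np.where(y == c)[0] for c in classes]
        resampled = [np.concatenate([g, rng.choice(g, target - len(g), replace=True)])
                     for g in groups]
        idx = np.concatenate(resampled)
        return X[idx], y[idx]

    def inverse_frequency_weights(y):
        """Weight each class by 1 / frequency so rare classes count more in the loss."""
        classes, counts = np.unique(y, return_counts=True)
        return {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}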