A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios
Class imbalance and class overlap are two of the major problems in data mining and machine learning. Several studies have shown that these data complexities may affect the performance or behavior of artificial neural networks. Strategies proposed to face both challenges have so far been applied separately. In this paper, we introduce a hybrid method for handling class imbalance and class overlap simultaneously in multi-class learning problems. Experimental results on five remote sensing data sets show that the combined approach is a promising method.
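A hybrid of this kind can be sketched in plain Python. The abstract does not specify the paper's components, so the two stages below are illustrative stand-ins: Tomek-link cleaning for class overlap followed by random oversampling for class imbalance.

```python
import random
from collections import Counter

def _nearest(i, X):
    """Index of the nearest other point to X[i] (squared Euclidean)."""
    best, best_d = None, float("inf")
    for j, p in enumerate(X):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(X[i], p))
        if d < best_d:
            best, best_d = j, d
    return best

def clean_tomek_links(X, y):
    """Overlap handling: drop the majority-class member of each Tomek link."""
    counts = Counter(y)
    drop = set()
    for i in range(len(X)):
        j = _nearest(i, X)
        if j is not None and _nearest(j, X) == i and y[i] != y[j]:
            # Mutual nearest neighbours with different labels form a Tomek
            # link; remove the point from the larger class.
            drop.add(i if counts[y[i]] > counts[y[j]] else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

def random_oversample(X, y, seed=0):
    """Imbalance handling: duplicate samples until every class matches the majority."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    Xo, yo = list(X), list(y)
    for label, n in counts.items():
        idx = [k for k, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            k = rng.choice(idx)
            Xo.append(X[k])
            yo.append(label)
    return Xo, yo

def hybrid_resample(X, y):
    """Hybrid pipeline: clean overlap first, then re-balance the classes."""
    Xc, yc = clean_tomek_links(X, y)
    return random_oversample(Xc, yc)
```

The ordering (clean, then re-balance) matters: oversampling first would duplicate the very overlap points the cleaning step is meant to remove.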
Active Learning for One-Class Classification Using Two One-Class Classifiers
This paper introduces a novel, generic active learning method for one-class
classification. Active learning methods play an important role in reducing the
effort of manual labeling in machine learning. Although many active learning
approaches have been proposed in recent years, most of them are restricted to
binary or multi-class problems. One-class classifiers use samples from only
one class, the so-called target class, during training and hence require
special active learning strategies. The few strategies proposed for one-class
classification are either limited to specific one-class classifiers or depend
on particular assumptions about the dataset, such as imbalance. Our proposed
method is based on using two one-class classifiers, one for the desired target
class and one for the so-called outlier class. This makes it possible to
devise new query strategies, to use binary query strategies, and to define
simple stopping criteria. Based on the new method, two query strategies are
proposed. The provided experiments compare the proposed approach with known
strategies on various datasets and show improved results in almost all
situations.
Comment: EUSIPCO 201
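The two-classifier idea can be sketched with toy components. The abstract does not give the models or query strategies, so everything below is an assumption for illustration: a centroid-distance one-class scorer, and an uncertainty-style query rule that picks the pool sample where the two memberships disagree least.

```python
import math

class CentroidOneClass:
    """Toy one-class model: membership score = negative distance to the
    class centroid (a stand-in for a real one-class classifier)."""
    def fit(self, X):
        n = len(X)
        self.center = tuple(sum(col) / n for col in zip(*X))
        return self

    def score(self, x):
        return -math.dist(x, self.center)

def query_most_uncertain(pool, target_model, outlier_model):
    """Binary-style query strategy enabled by having two one-class models:
    ask for the label of the pool sample whose target and outlier
    memberships are closest (the most ambiguous sample)."""
    def margin(x):
        return abs(target_model.score(x) - outlier_model.score(x))
    return min(range(len(pool)), key=lambda i: margin(pool[i]))
```

With two scores per sample, any binary uncertainty-sampling heuristic becomes applicable, which is the point the abstract makes.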
Empowering One-vs-One Decomposition with Ensemble Learning for Multi-Class Imbalanced Data
Zhongliang Zhang was supported by the National Science Foundation of China (NSFC Proj. 61273204) and CSC Scholarship Program (CSC NO. 201406080059).
Bartosz Krawczyk was supported by the Polish National Science Center under the grant no. UMO-2015/19/B/ST6/01597.
Salvador Garcia and Francisco Herrera were partially supported by the Spanish Ministry of Education and Science under Project TIN2014-57251-P and the Andalusian Research Plan P10-TIC-6858, P11-TIC-7765.
Alejandro Rosales-Perez was supported by the CONACyT grant 329013.
Multi-class imbalance classification problems occur in many real-world applications, which suffer from quite different distributions of classes. Decomposition strategies are well-known techniques for addressing classification problems involving multiple classes. Among them, binary approaches using one-vs-one and one-vs-all have gained significant attention from the research community. They divide multi-class problems into several easier-to-solve two-class sub-problems. In this study we develop an exhaustive empirical analysis to explore the possibility of empowering the one-vs-one scheme for multi-class imbalance classification problems by applying binary ensemble learning approaches. We examine several state-of-the-art ensemble learning methods proposed for addressing imbalance problems to solve the pairwise tasks derived from the multi-class data set. Then an aggregation strategy is employed to combine the binary ensemble outputs and reconstruct the original multi-class task. We present a detailed experimental study of the proposed approach, supported by statistical analysis. The results indicate the high effectiveness of ensemble learning with the one-vs-one scheme in dealing with multi-class imbalance classification problems.
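The decompose-then-aggregate pipeline the study builds on can be sketched independently of the particular binary ensemble. Here a trivial nearest-centroid learner stands in for the imbalance-aware ensembles the study actually evaluates; the names and the majority-vote aggregation are illustrative choices, not the paper's exact setup.

```python
import math
from collections import Counter
from itertools import combinations

class NearestCentroidBinary:
    """Placeholder binary learner; the study plugs imbalance-aware
    ensembles into this slot instead."""
    def fit(self, X, y):
        self.centers = {}
        for label in set(y):
            pts = [x for x, lab in zip(X, y) if lab == label]
            self.centers[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
        return self

    def predict(self, x):
        return min(self.centers, key=lambda lab: math.dist(x, self.centers[lab]))

def ovo_fit(X, y, make_learner=NearestCentroidBinary):
    """One-vs-one decomposition: one binary sub-problem (and one learner)
    per unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        idx = [i for i, lab in enumerate(y) if lab in (a, b)]
        models[(a, b)] = make_learner().fit(
            [X[i] for i in idx], [y[i] for i in idx])
    return models

def ovo_predict(models, x):
    """Aggregation step: majority vote over all pairwise predictions."""
    votes = Counter(m.predict(x) for m in models.values())
    return votes.most_common(1)[0][0]
```

For k classes this trains k(k-1)/2 binary models; each pairwise sub-problem sees only two classes, which is what lets binary imbalance techniques be reused unchanged.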
A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation
Class imbalance (CI) in classification problems arises when the number of
observations belonging to one class is lower than that of the others. Ensemble learning
combines multiple models to obtain a robust model and has been prominently used
with data augmentation methods to address class imbalance problems. In the last
decade, a number of strategies have been added to enhance ensemble learning and
data augmentation methods, along with new methods such as generative
adversarial networks (GANs). A combination of these has been applied in many
studies, and the evaluation of different combinations would enable a better
understanding and guidance for different application domains. In this paper, we
present a computational study to evaluate data augmentation and ensemble
learning methods used to address prominent benchmark CI problems. We present a
general framework that evaluates 9 data augmentation and 9 ensemble learning
methods for CI problems. Our objective is to identify the most effective
combination for improving classification performance on imbalanced datasets.
The results indicate that combinations of data augmentation methods with
ensemble learning can significantly improve classification performance on
imbalanced datasets. We find that traditional data augmentation methods such as
the synthetic minority oversampling technique (SMOTE) and random oversampling
(ROS) are not only better in performance for selected CI problems, but also
computationally less expensive than GANs. Our study is vital for the
development of novel models for handling imbalanced datasets.
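SMOTE, the traditional augmentation method this study finds competitive with GANs, is simple enough to sketch in full: each synthetic point is a random interpolation between a minority sample and one of its k nearest minority neighbours. This is a minimal plain-Python version, not the evaluated implementation.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE: synthesize n_new points by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def neighbours(i):
        # Indices of the k nearest minority points to minority[i]
        # (squared Euclidean), skipping the point itself.
        order = sorted(
            range(len(minority)),
            key=lambda j: sum((a - b) ** 2
                              for a, b in zip(minority[i], minority[j])))
        return order[1:k + 1]

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours(i))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the minority region, unlike plain random oversampling, which only duplicates existing points.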
Predicting Customer Retention of an App-Based Business Using Supervised Machine Learning
Identification of retainable customers is essential for the functioning and growth of any business. Effective identification of retainable customers can help a business identify the reasons for retention and plan its marketing strategies accordingly. This research is aimed at developing a machine learning model that can precisely predict retainable customers from the total customer data of an e-learning business. Building predictive models that can efficiently classify imbalanced data is a major challenge in data mining and machine learning. Most machine learning algorithms deliver suboptimal performance when introduced to an imbalanced dataset. A variety of algorithm-level methods (cost-sensitive learning, one-class learning, ensemble methods) and data-level methods (sampling, feature selection) are widely used to address class imbalance in retention prediction problems. This research employs a quantitative and inductive approach to build a supervised machine learning model that addresses the class imbalance problem and efficiently predicts customer retention. Retention precision is used as the evaluation metric for this research. The research evaluates the performance of different sampling methods (random under-sampling, random over-sampling, SMOTE) on different single and ensemble machine learning models. The results show that random under-sampling used along with an XGBoost classifier yields the best precision in identifying the retention class. The best model developed in the research was also used to predict retainable customers from recent unknown customer data, and attained a retention precision of 57.5%.
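Random under-sampling, the winning data-level method here, is the simplest of the three samplers: keep every minority sample and drop majority samples at random until the classes match. A minimal sketch (not the study's implementation):

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Drop samples from the larger classes at random until every class
    has as many samples as the smallest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original ordering of retained samples
    return [X[i] for i in keep], [y[i] for i in keep]
```

The trade-off is that discarded majority samples carry information the model never sees, which is why under-sampling is usually compared against oversampling and SMOTE, as this research does.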
An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well-known strategies for dealing with binary
imbalanced data on 82 different real-life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with an RBF kernel,
random forests, and gradient boosting machines, and we measured the quality of
the resulting classifier using 6 different metrics (area under the curve,
accuracy, F-measure, G-mean, Matthews correlation coefficient, and balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier. For AUC and accuracy, class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, Underbagging.
Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations
Deep learning has proved in recent years to be a powerful tool for image
analysis and is now widely used to segment both 2D and 3D medical images.
Deep-learning segmentation frameworks rely not only on the choice of network
architecture but also on the choice of loss function. When the segmentation
process targets rare observations, a severe class imbalance is likely to occur
between candidate labels, thus resulting in sub-optimal performance. In order
to mitigate this issue, strategies such as the weighted cross-entropy function,
the sensitivity function or the Dice loss function, have been proposed. In this
work, we investigate the behavior of these loss functions and their sensitivity
to learning rate tuning in the presence of different rates of label imbalance
across 2D and 3D segmentation tasks. We also propose to use the class
re-balancing properties of the Generalized Dice overlap, a known metric for
segmentation assessment, as a robust and accurate deep-learning loss function
for unbalanced tasks.
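The Generalized Dice overlap itself has a standard form: per-class weights w_l = 1 / (Σ_i g_li)² shrink the contribution of frequent labels, which is the re-balancing property the abstract refers to. A plain-Python sketch over flattened predictions (a real framework would implement this on tensors):

```python
def generalized_dice_loss(probs, labels, eps=1e-6):
    """Generalized Dice loss over N pixels and L classes.
    probs:  N x L predicted class probabilities.
    labels: N x L one-hot ground truth.
    Per-class weights w_l = 1 / (sum_i g_li)^2 up-weight rare classes;
    eps guards against division by zero for absent classes."""
    n_classes = len(probs[0])
    num = den = 0.0
    for l in range(n_classes):
        g = sum(row[l] for row in labels)            # ground-truth volume
        w = 1.0 / (g * g + eps)                      # re-balancing weight
        inter = sum(p[l] * r[l] for p, r in zip(probs, labels))
        union = sum(p[l] + r[l] for p, r in zip(probs, labels))
        num += w * inter
        den += w * union
    return 1.0 - 2.0 * num / (den + eps)
```

A perfect prediction drives the loss to (approximately) 0 and a completely wrong one to 1; because the weights depend only on the ground truth, the gradient with respect to the predictions stays simple.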