Vote-boosting ensembles
Vote-boosting is a sequential ensemble learning method in which the
individual classifiers are built on different weighted versions of the training
data. To build a new classifier, the weight of each training instance is
determined in terms of the degree of disagreement among the current ensemble
predictions for that instance. For low class-label noise levels, especially
when simple base learners are used, the emphasis should be placed on instances for
which the disagreement rate is high. When more flexible classifiers are used
and as the noise level increases, the emphasis on these uncertain instances
should be reduced. In fact, at sufficiently high levels of class-label noise,
the focus should be on instances on which the ensemble classifiers agree. The
optimal type of emphasis can be automatically determined using
cross-validation. An extensive empirical analysis, using the beta distribution
as the emphasis function, shows that vote-boosting is an effective method for
generating ensembles that are both accurate and robust.
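As a rough illustration of this scheme, the sketch below implements a binary vote-boosting loop in Python, using the beta-distribution pdf of each instance's disagreement rate as the emphasis function. The decision-tree base learner, the weighted bootstrap resampling, and the shape parameters a and b are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal vote-boosting sketch (binary labels in {0, 1}), assuming
# emphasis via the beta pdf of each instance's disagreement rate.
# Base learner, resampling scheme and (a, b) are illustrative assumptions.
import numpy as np
from scipy.stats import beta
from sklearn.tree import DecisionTreeClassifier

def vote_boosting(X, y, n_estimators=51, a=2.0, b=2.0, random_state=0):
    rng = np.random.default_rng(random_state)
    n = len(X)
    weights = np.full(n, 1.0 / n)              # uniform emphasis at the start
    ensemble = []
    for _ in range(n_estimators):
        # Train the next member on a weighted bootstrap resample.
        idx = rng.choice(n, size=n, p=weights)
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        # Disagreement rate: fraction of members voting for class 1, folded
        # so that 0 means full agreement and 1 means an even split.
        votes = np.array([m.predict(X) for m in ensemble])         # (T, n)
        pos_frac = (votes == 1).mean(axis=0)
        disagreement = 2.0 * np.minimum(pos_frac, 1.0 - pos_frac)  # in [0, 1]
        # Emphasis: beta pdf over the disagreement rate, renormalized.
        weights = beta.pdf(np.clip(disagreement, 1e-6, 1.0 - 1e-6), a, b)
        weights = weights / weights.sum()
    return ensemble

def predict_majority(ensemble, X):
    votes = np.array([m.predict(X) for m in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

With a > b the beta pdf concentrates the emphasis on uncertain instances, the regime the abstract recommends at low noise levels with simple base learners; a < b shifts it toward instances on which the ensemble agrees, as suggested for high noise. Selecting between these shapes by cross-validation mirrors the automatic choice described above.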
Boosting ensembles with controlled emphasis intensity
Boosting ensembles have received much attention because of their high performance. However, they are also sensitive to adverse conditions, such as noisy environments or the presence of outliers. One way to counteract this degradation is to modify the form of the emphasis weighting applied to train each new learner. In this paper, we propose a general form for that emphasis function which includes not only an error-dependent term and a term dependent on the proximity to the classification boundary, but also a constant value that serves to control how much emphasis is applied. Two convex combinations are used to combine these terms, which makes it possible to control their relative influence. Experimental results support the effectiveness of this general form of boosting emphasis. This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM) and Macro-ADOBE (TEC2015-67719-P, MINECO).
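The following sketch is a guess at one plausible instantiation, not the paper's exact formulation: it assumes the classical real-boosting decomposition of the weight into an error-dependent term exp((o - y)^2) and a boundary-proximity term exp(-o^2), where o is the real-valued ensemble output and labels lie in {-1, +1}; alpha and beta_mix stand in for the two convex-combination parameters.

```python
# Hedged sketch of a general boosting emphasis function: alpha mixes the
# error and boundary terms, beta_mix mixes that result with a constant.
# The exp-based terms, `o`, `alpha` and `beta_mix` are assumptions for
# illustration, not the paper's exact definitions.
import numpy as np

def emphasis_weights(o, y, alpha=0.5, beta_mix=0.9):
    """o: real-valued ensemble outputs; y: labels in {-1, +1}."""
    error_term = np.exp((o - y) ** 2)   # grows for badly classified points
    boundary_term = np.exp(-o ** 2)     # peaks near the decision boundary
    mixed = alpha * error_term + (1.0 - alpha) * boundary_term
    w = beta_mix * mixed + (1.0 - beta_mix)   # constant damps the emphasis
    return w / w.sum()
```

Setting beta_mix near 0 approaches uniform weighting, so a single constant controls the overall emphasis intensity, while alpha trades attention to classification errors against attention to boundary proximity.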
Class imbalance ensemble learning based on the margin theory
The proportion of instances belonging to each class in a data set plays an important role in machine learning. However, real-world data often suffer from class imbalance, and dealing with multi-class tasks with different misclassification costs is harder than dealing with two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques for imbalanced data sets. Ensemble classifiers have been shown to be more effective than data sampling techniques alone at enhancing classification performance on imbalanced data. Moreover, the combination of ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning, and several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble-margin-based algorithm, which handles imbalanced classification by employing more low-margin examples, which are more informative than high-margin ones. This algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly, as in UnderBagging, our method focuses on constructing higher-quality balanced sets for each base classifier. To demonstrate the effectiveness of the proposed method in handling class-imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we compare the performance of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning. A sketch of the margin-guided undersampling idea follows.
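The sketch below is a minimal illustration, assuming the supervised ensemble margin (votes for the true class minus the most-voted other class, normalized to [-1, 1]) is estimated with a small bagged probe ensemble, and that each balanced subset samples every class down to the minority-class size with probabilities proportional to 1 - margin; all of these choices are illustrative assumptions rather than the authors' exact procedure.

```python
# Hedged sketch of margin-guided undersampling. The probe ensemble, the
# 1 - margin sampling probabilities and the per-class subset size are
# assumptions for illustration, not the authors' exact algorithm.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def supervised_margins(X, y, n_probe=25, random_state=0):
    """Supervised margin: (votes for the true class minus the most-voted
    other class) / ensemble size, estimated with a bagged probe ensemble."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    votes = []
    for _ in range(n_probe):
        idx = rng.choice(n, size=n)                       # bootstrap sample
        votes.append(DecisionTreeClassifier().fit(X[idx], y[idx]).predict(X))
    votes = np.array(votes)                               # (n_probe, n)
    margins = np.empty(n)
    for i in range(n):
        counts = Counter(votes[:, i])
        v_true = counts.get(y[i], 0)
        v_other = max((c for k, c in counts.items() if k != y[i]), default=0)
        margins[i] = (v_true - v_other) / n_probe         # in [-1, 1]
    return margins

def margin_underbagging(X, y, n_estimators=11, random_state=0):
    rng = np.random.default_rng(random_state)
    margins = supervised_margins(X, y, random_state=random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()                    # size of the smallest class
    ensemble = []
    for _ in range(n_estimators):
        idx = []
        for c in classes:
            members = np.where(y == c)[0]
            # Favor low-margin (more informative) examples of each class.
            p = (1.0 - margins[members]) + 1e-6
            idx.extend(rng.choice(members, size=n_min, replace=False,
                                  p=p / p.sum()))
        idx = np.asarray(idx)
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble
```

Because low-margin examples receive higher sampling probability, each balanced subset concentrates on informative points near the classification boundary instead of discarding majority-class instances at random, as UnderBagging does.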