Adaptive Ranking Based Constraint Handling for Explicitly Constrained Black-Box Optimization
A novel explicit constraint handling technique for the covariance matrix adaptation evolution strategy (CMA-ES) is proposed. The proposed constraint handling exhibits two invariance properties: invariance to arbitrary element-wise increasing transformations of the objective and constraint functions, and invariance to arbitrary affine transformations of the search space. The technique virtually transforms a constrained optimization problem into an unconstrained one by considering an adaptive weighted sum of the ranking of the objective function values and the ranking of the constraint violations, where each violation is measured by the Mahalanobis distance from the candidate solution to its projection onto the boundary of the constraints. Simulation results show that the CMA-ES with the proposed constraint handling exhibits affine invariance and performs similarly to the CMA-ES on unconstrained counterparts.

Comment: 9 pages
Empowering One-vs-One Decomposition with Ensemble Learning for Multi-Class Imbalanced Data
Zhongliang Zhang was supported by the National Science Foundation of China (NSFC Proj. 61273204) and CSC Scholarship Program (CSC NO. 201406080059).
Bartosz Krawczyk was supported by the Polish National Science Center under the grant no. UMO-2015/19/B/ST6/01597.
Salvador Garcia and Francisco Herrera were partially supported by the Spanish Ministry of Education and Science under Project TIN2014-57251-P and the Andalusian Research Plan P10-TIC-6858, P11-TIC-7765.
Alejandro Rosales-Perez was supported by the CONACyT grant 329013.

Multi-class imbalanced classification problems occur in many real-world applications, which suffer from markedly uneven class distributions. Decomposition strategies are well-known techniques for addressing classification problems involving multiple classes. Among them, binary approaches using one-vs-one and one-vs-all schemes have gained significant attention from the research community, as they divide a multi-class problem into several easier-to-solve two-class sub-problems. In this study we develop an exhaustive empirical analysis to explore the possibility of empowering the one-vs-one scheme for multi-class imbalanced classification problems by applying binary ensemble learning approaches. We examine several state-of-the-art ensemble learning methods proposed for addressing imbalance problems to solve the pairwise tasks derived from the multi-class data set. An aggregation strategy is then employed to combine the binary ensemble outputs and reconstruct the original multi-class task. We present a detailed experimental study of the proposed approach, supported by statistical analysis. The results indicate the high effectiveness of ensemble learning with the one-vs-one scheme in dealing with multi-class imbalanced classification problems.
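The decompose-then-aggregate pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy `CentroidClassifier` is a hypothetical stand-in for the imbalance-aware binary ensembles the study examines, and majority voting is only one of the possible aggregation strategies.

```python
import numpy as np
from itertools import combinations

class CentroidClassifier:
    """Toy binary base learner: predicts the class whose centroid is nearest.
    In the study's setting this would be an imbalance-aware binary ensemble."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def ovo_fit_predict(X_train, y_train, X_test):
    """One-vs-one decomposition: train one binary model per class pair,
    then aggregate by majority vote over the pairwise predictions."""
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for i, j in combinations(range(len(classes)), 2):
        # restrict training data to the two classes of this sub-problem
        mask = np.isin(y_train, [classes[i], classes[j]])
        clf = CentroidClassifier().fit(X_train[mask], y_train[mask])
        pred = clf.predict(X_test)
        votes[:, i] += (pred == classes[i])
        votes[:, j] += (pred == classes[j])
    return classes[votes.argmax(axis=1)]
```

For K classes this trains K(K-1)/2 binary models; each pairwise sub-problem sees only two classes, which is what makes binary imbalance-handling methods applicable to it.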
Class imbalance ensemble learning based on the margin theory
The proportion of instances belonging to each class in a data set plays an important role in machine learning. However, real-world data often suffer from class imbalance, and dealing with multi-class tasks with different misclassification costs per class is harder than dealing with two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques for imbalanced data sets. Ensemble classifiers have been shown to be more effective than data sampling techniques at enhancing classification performance on imbalanced data. Moreover, the combination of ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning, and several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble margin based algorithm, which handles imbalanced classification by preferentially employing low-margin examples, which are more informative than high-margin ones. This algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly as UnderBagging does, our method focuses on constructing higher-quality balanced sets for each base classifier. To demonstrate the effectiveness of the proposed method in handling class-imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we compare the performance of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.
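The two ingredients of the approach above, an ensemble margin and margin-guided undersampling, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the margin shown is the standard supervised definition (votes for the true class minus the maximum votes for any other class, normalized), and the paper's full algorithm builds one such balanced set per base classifier.

```python
import numpy as np

def supervised_margin(votes, y):
    """Supervised ensemble margin in [-1, 1]: (votes for the true class minus
    the max votes for any other class), divided by the total number of votes.
    `votes` has shape (n_samples, n_classes)."""
    total = votes.sum(axis=1)                      # number of base classifiers
    true_v = votes[np.arange(len(y)), y]
    others = votes.copy()
    others[np.arange(len(y)), y] = -1              # mask out the true class
    return (true_v - others.max(axis=1)) / total

def margin_undersample(X, y, margins, majority_class, n_keep):
    """Keep all minority samples; from the majority class keep only the
    n_keep lowest-margin (most informative) samples."""
    maj = np.where(y == majority_class)[0]
    keep_maj = maj[np.argsort(margins[maj])[:n_keep]]
    minority = np.where(y != majority_class)[0]
    idx = np.concatenate([minority, keep_maj])
    return X[idx], y[idx]
```

Low-margin majority examples sit near the decision boundary, so retaining them (rather than a random subset, as in UnderBagging) preserves the most discriminative information while still balancing the classes.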
An Oversampling Mechanism for Multimajority Datasets using SMOTE and Darwinian Particle Swarm Optimisation
Data skewness continues to be one of the leading factors that adversely impact the performance of machine learning algorithms. One approach to reducing this negative effect is to pre-process the dataset with data-level resampling strategies, which come in two forms: oversampling and undersampling. An oversampling strategy is proposed in this article for tackling multiclass imbalanced datasets. The proposed approach optimises the state-of-the-art oversampling technique SMOTE with the Darwinian Particle Swarm Optimisation technique. The proposed method, DOSMOTE, generates optimised synthetic samples for balancing the datasets, and the strategy is most effective on multimajority datasets. An experimental study is performed on representative multimajority datasets to measure the effectiveness of the proposed approach. The results show that the proposed method produces promising results compared to conventional oversampling strategies.
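For reference, the SMOTE step that DOSMOTE builds on interpolates between minority samples and their nearest minority neighbours. A minimal sketch under that assumption (the DPSO optimisation of sample placement, the article's contribution, is deliberately omitted; the function name is illustrative):

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch: each synthetic sample is a random interpolation
    between a minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng()
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours per sample
    out = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(len(X_min))            # random minority seed point
        j = nn[i, rng.integers(k)]              # one of its nearest neighbours
        out[s] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return out
```

Because every synthetic point lies on a segment between two existing minority points, the generated samples stay inside the minority region; DOSMOTE's DPSO step would then search for the most useful placements rather than drawing the interpolation factor uniformly.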
Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics
The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research.
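The permutation-based variable importance measure that the overview discusses can be illustrated independently of any particular RF implementation: shuffle one feature at a time and record the resulting drop in accuracy. The sketch below applies the idea to an arbitrary fitted `predict` function (in RF the drop is measured on each tree's out-of-bag samples; that detail is omitted here).

```python
import numpy as np

def permutation_importance(predict, X, y, rng=None):
    """Permutation VIM sketch: importance of feature j = accuracy of the fitted
    model minus its accuracy after feature j's column is randomly shuffled."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = np.mean(predict(X) == y)             # accuracy on intact data
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])    # destroy feature j's association
        imp[j] = base - np.mean(predict(Xp) == y)
    return imp
```

A feature the model never uses yields an importance near zero, while shuffling a decisive feature causes a large accuracy drop; this is also where the biases the paper warns about arise, e.g. inflated importance for correlated predictors.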
A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios
Class imbalance and class overlap are two of the major problems in data mining and machine learning. Several studies have shown that these data complexities may affect the performance or behavior of artificial neural networks. Strategies proposed to face these challenges have so far been applied separately. In this paper, we introduce a hybrid method for handling both class imbalance and class overlap simultaneously in multi-class learning problems. Experimental results on five remote sensing data sets show that the combined approach is a promising method.
Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods
This study uses stacked generalization, a two-step process for combining machine learning methods. In step one, the performance of the individual algorithms is improved by minimizing each algorithm's error rate to reduce its bias on the learning set; in step two, their results are fed into a meta (or super) learner, whose stacked blended output demonstrates improved performance, with the weakest algorithms learning better. The method is essentially an enhanced cross-validation strategy. Although the process demands substantial computational resources, the resulting performance metrics on resampled fraud data show that the increased system cost can be justified. A fundamental property of fraud data is that it is inherently not systematic, and the optimal resampling methodology has not yet been identified. Building a test harness that accounts for all permutations of algorithm-sample-set pairs ensures that the complex, intrinsic data structures are thoroughly tested. A comparative analysis on fraud data that applies stacked generalizations provides the insight needed to find the optimal mathematical formula for imbalanced fraud data sets.

Comment: 19 pages, 3 figures, 8 tables
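The two-step stacking scheme described above can be sketched generically. This is an illustrative sketch, not the study's code: the base learners here are toy nearest-centroid functions, whereas the study uses real algorithms on resampled fraud data; the cross-validation structure of the level-0 step is what makes stacking an "enhanced cross-validation strategy".

```python
import numpy as np

def centroid_fp(X_tr, y_tr, X_te):
    """Toy fit-and-predict base learner: nearest class centroid."""
    classes = np.unique(y_tr)
    cent = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_te[:, None, :] - cent[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def kfold_oof(fit_predict, X, y, k=5):
    """Out-of-fold predictions: every sample is predicted by a model that
    never saw it during training (the cross-validation core of stacking)."""
    idx = np.arange(len(X))
    oof = np.empty(len(X))
    for fold in range(k):
        test = idx[fold::k]
        train = np.setdiff1d(idx, test)
        oof[test] = fit_predict(X[train], y[train], X[test])
    return oof

def stack(base_learners, meta_fit_predict, X, y, X_new):
    """Step one: out-of-fold predictions of each base learner become the
    meta-level features. Step two: the meta learner blends them."""
    Z = np.column_stack([kfold_oof(bl, X, y) for bl in base_learners])
    Z_new = np.column_stack([bl(X, y, X_new) for bl in base_learners])
    return meta_fit_predict(Z, y, Z_new)
```

Training the meta learner on out-of-fold (rather than in-sample) base predictions is what prevents it from simply memorizing the strongest base learner's overfit outputs.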