Adaptive Ranking Based Constraint Handling for Explicitly Constrained Black-Box Optimization
A novel explicit constraint handling technique for the covariance matrix adaptation evolution strategy (CMA-ES) is proposed. The proposed constraint handling exhibits two invariance properties: invariance to arbitrary element-wise increasing transformations of the objective and constraint functions, and invariance to arbitrary affine transformations of the search space. The technique virtually transforms a constrained optimization problem into an unconstrained one by considering an adaptive weighted sum of the ranking of the objective function values and the ranking of the constraint violations, where each violation is measured by the Mahalanobis distance from the candidate solution to its projection onto the boundary of the constraints. Simulation results show that the CMA-ES with the proposed constraint handling exhibits affine invariance and performs similarly to the CMA-ES on unconstrained counterparts.

Comment: 9 pages
Empowering One-vs-One Decomposition with Ensemble Learning for Multi-Class Imbalanced Data
Zhongliang Zhang was supported by the National Science Foundation of China (NSFC Proj. 61273204) and CSC Scholarship Program (CSC NO. 201406080059).
Bartosz Krawczyk was supported by the Polish National Science Center under the grant no. UMO-2015/19/B/ST6/01597.
Salvador Garcia and Francisco Herrera were partially supported by the Spanish Ministry of Education and Science under Project TIN2014-57251-P and the Andalusian Research Plan P10-TIC-6858, P11-TIC-7765.
Alejandro Rosales-Perez was supported by the CONACyT grant 329013.

Multi-class imbalanced classification problems occur in many real-world applications, which suffer from markedly uneven class distributions. Decomposition strategies are well-known techniques for addressing classification problems involving multiple classes. Among them, binary approaches using one-vs-one and one-vs-all schemes have gained significant attention from the research community, as they divide a multi-class problem into several easier-to-solve two-class sub-problems. In this study we develop an exhaustive empirical analysis to explore the possibility of empowering the one-vs-one scheme for multi-class imbalanced classification problems by applying binary ensemble learning approaches. We examine several state-of-the-art ensemble learning methods proposed for addressing imbalance problems to solve the pairwise tasks derived from the multi-class data set. An aggregation strategy is then employed to combine the binary ensemble outputs and reconstruct the original multi-class task. We present a detailed experimental study of the proposed approach, supported by statistical analysis. The results indicate the high effectiveness of ensemble learning with the one-vs-one scheme in dealing with multi-class imbalanced classification problems.
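The decompose-then-aggregate pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy `CentroidClassifier` is a hypothetical stand-in for the imbalance-aware binary ensembles the study examines, and majority voting is only one of the possible aggregation strategies.

```python
import numpy as np
from itertools import combinations

class CentroidClassifier:
    """Toy binary base learner: predicts the class whose centroid is nearest.
    In the study's setting this would be an imbalance-aware binary ensemble."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def ovo_fit_predict(X_train, y_train, X_test):
    """One-vs-one decomposition: train one binary model per class pair,
    then aggregate by majority vote over the pairwise predictions."""
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for i, j in combinations(range(len(classes)), 2):
        # restrict training data to the two classes of this sub-problem
        mask = np.isin(y_train, [classes[i], classes[j]])
        clf = CentroidClassifier().fit(X_train[mask], y_train[mask])
        pred = clf.predict(X_test)
        votes[:, i] += (pred == classes[i])
        votes[:, j] += (pred == classes[j])
    return classes[votes.argmax(axis=1)]
```

For K classes this trains K(K-1)/2 binary models; each pairwise sub-problem sees only two classes, which is what makes binary imbalance-handling methods applicable to it.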
Class imbalance ensemble learning based on the margin theory
The proportion of instances belonging to each class in a data set plays an important role in machine learning. However, real-world data often suffer from class imbalance, and dealing with multi-class tasks with different misclassification costs per class is harder than dealing with two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques for imbalanced data sets. Ensemble classifiers have been shown to be more effective than data sampling techniques at enhancing classification performance on imbalanced data. Moreover, the combination of ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning, and several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble margin based algorithm, which handles imbalanced classification by preferentially employing low-margin examples, which are more informative than high-margin ones. This algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly as UnderBagging does, our method focuses on constructing higher-quality balanced sets for each base classifier. To demonstrate the effectiveness of the proposed method in handling class-imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we compare the performance of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.
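The two ingredients of the approach above, an ensemble margin and margin-guided undersampling, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the margin shown is the standard supervised definition (votes for the true class minus the maximum votes for any other class, normalized), and the paper's full algorithm builds one such balanced set per base classifier.

```python
import numpy as np

def supervised_margin(votes, y):
    """Supervised ensemble margin in [-1, 1]: (votes for the true class minus
    the max votes for any other class), divided by the total number of votes.
    `votes` has shape (n_samples, n_classes)."""
    total = votes.sum(axis=1)                      # number of base classifiers
    true_v = votes[np.arange(len(y)), y]
    others = votes.copy()
    others[np.arange(len(y)), y] = -1              # mask out the true class
    return (true_v - others.max(axis=1)) / total

def margin_undersample(X, y, margins, majority_class, n_keep):
    """Keep all minority samples; from the majority class keep only the
    n_keep lowest-margin (most informative) samples."""
    maj = np.where(y == majority_class)[0]
    keep_maj = maj[np.argsort(margins[maj])[:n_keep]]
    minority = np.where(y != majority_class)[0]
    idx = np.concatenate([minority, keep_maj])
    return X[idx], y[idx]
```

Low-margin majority examples sit near the decision boundary, so retaining them (rather than a random subset, as in UnderBagging) preserves the most discriminative information while still balancing the classes.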
An Oversampling Mechanism for Multimajority Datasets using SMOTE and Darwinian Particle Swarm Optimisation
Data skewness continues to be one of the leading factors that adversely impact the performance of machine learning algorithms. One approach to reducing this negative effect is to pre-process the dataset with data-level resampling strategies, which come in two forms: oversampling and undersampling. An oversampling strategy is proposed in this article for tackling multiclass imbalanced datasets. The proposed approach optimises the state-of-the-art oversampling technique SMOTE with the Darwinian Particle Swarm Optimisation technique. The proposed method, DOSMOTE, generates optimised synthetic samples for balancing the datasets, and the strategy is most effective on multimajority datasets. An experimental study is performed on representative multimajority datasets to measure the effectiveness of the proposed approach. The results show that the proposed method produces promising results compared to conventional oversampling strategies.
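For reference, the SMOTE step that DOSMOTE builds on interpolates between minority samples and their nearest minority neighbours. A minimal sketch under that assumption (the DPSO optimisation of sample placement, the article's contribution, is deliberately omitted; the function name is illustrative):

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch: each synthetic sample is a random interpolation
    between a minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng()
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours per sample
    out = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(len(X_min))            # random minority seed point
        j = nn[i, rng.integers(k)]              # one of its nearest neighbours
        out[s] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return out
```

Because every synthetic point lies on a segment between two existing minority points, the generated samples stay inside the minority region; DOSMOTE's DPSO step would then search for the most useful placements rather than drawing the interpolation factor uniformly.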
Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics
The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research.
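The permutation-based variable importance measure that the overview discusses can be illustrated independently of any particular RF implementation: shuffle one feature at a time and record the resulting drop in accuracy. The sketch below applies the idea to an arbitrary fitted `predict` function (in RF the drop is measured on each tree's out-of-bag samples; that detail is omitted here).

```python
import numpy as np

def permutation_importance(predict, X, y, rng=None):
    """Permutation VIM sketch: importance of feature j = accuracy of the fitted
    model minus its accuracy after feature j's column is randomly shuffled."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = np.mean(predict(X) == y)             # accuracy on intact data
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])    # destroy feature j's association
        imp[j] = base - np.mean(predict(Xp) == y)
    return imp
```

A feature the model never uses yields an importance near zero, while shuffling a decisive feature causes a large accuracy drop; this is also where the biases the paper warns about arise, e.g. inflated importance for correlated predictors.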
A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios
Class imbalance and class overlap are two of the major problems in data mining and machine learning. Several studies have shown that these data complexities may affect the performance or behavior of artificial neural networks. Strategies proposed to face these challenges have so far been applied separately. In this paper, we introduce a hybrid method for handling both class imbalance and class overlap simultaneously in multi-class learning problems. Experimental results on five remote sensing data sets show that the combined approach is a promising method.
Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods
This study uses stacked generalization, a two-step process for combining machine learning methods. In step one, the performance of the individual algorithms is improved by minimizing each algorithm's error rate to reduce its bias on the learning set; in step two, their results are fed into a meta (or super) learner, whose stacked blended output demonstrates improved performance, with the weakest algorithms learning better. The method is essentially an enhanced cross-validation strategy. Although the process demands substantial computational resources, the resulting performance metrics on resampled fraud data show that the increased system cost can be justified. A fundamental property of fraud data is that it is inherently not systematic, and the optimal resampling methodology has not yet been identified. Building a test harness that accounts for all permutations of algorithm-sample-set pairs ensures that the complex, intrinsic data structures are thoroughly tested. A comparative analysis on fraud data that applies stacked generalizations provides the insight needed to find the optimal mathematical formula for imbalanced fraud data sets.

Comment: 19 pages, 3 figures, 8 tables
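The two-step stacking scheme described above can be sketched generically. This is an illustrative sketch, not the study's code: the base learners here are toy nearest-centroid functions, whereas the study uses real algorithms on resampled fraud data; the cross-validation structure of the level-0 step is what makes stacking an "enhanced cross-validation strategy".

```python
import numpy as np

def centroid_fp(X_tr, y_tr, X_te):
    """Toy fit-and-predict base learner: nearest class centroid."""
    classes = np.unique(y_tr)
    cent = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_te[:, None, :] - cent[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def kfold_oof(fit_predict, X, y, k=5):
    """Out-of-fold predictions: every sample is predicted by a model that
    never saw it during training (the cross-validation core of stacking)."""
    idx = np.arange(len(X))
    oof = np.empty(len(X))
    for fold in range(k):
        test = idx[fold::k]
        train = np.setdiff1d(idx, test)
        oof[test] = fit_predict(X[train], y[train], X[test])
    return oof

def stack(base_learners, meta_fit_predict, X, y, X_new):
    """Step one: out-of-fold predictions of each base learner become the
    meta-level features. Step two: the meta learner blends them."""
    Z = np.column_stack([kfold_oof(bl, X, y) for bl in base_learners])
    Z_new = np.column_stack([bl(X, y, X_new) for bl in base_learners])
    return meta_fit_predict(Z, y, Z_new)
```

Training the meta learner on out-of-fold (rather than in-sample) base predictions is what prevents it from simply memorizing the strongest base learner's overfit outputs.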