1,834 research outputs found

    Empowering One-vs-One Decomposition with Ensemble Learning for Multi-Class Imbalanced Data

    Get PDF
    Zhongliang Zhang was supported by the National Science Foundation of China (NSFC Proj. 61273204) and CSC Scholarship Program (CSC NO. 201406080059). Bartosz Krawczyk was supported by the Polish National Science Center under the grant no. UMO-2015/19/B/ST6/01597. Salvador Garcia and Francisco Herrera were partially supported by the Spanish Ministry of Education and Science under Project TIN2014-57251-P and the Andalusian Research Plan P10-TIC-6858, P11-TIC-7765. Alejandro Rosales-Perez was supported by the CONACyT grant 329013.Multi-class imbalance classification problems occur in many real-world applications, which suffer from the quite different distribution of classes. Decomposition strategies are well-known techniques to address the classification problems involving multiple classes. Among them binary approaches using one-vs-one and one-vs-all has gained a significant attention from the research community. They allow to divide multi-class problems into several easier-to-solve two-class sub-problems. In this study we develop an exhaustive empirical analysis to explore the possibility of empowering the one-vs-one scheme for multi-class imbalance classification problems with applying binary ensemble learning approaches. We examine several state-of-the-art ensemble learning methods proposed for addressing the imbalance problems to solve the pairwise tasks derived from the multi-class data set. Then the aggregation strategy is employed to combine the binary ensemble outputs to reconstruct the original multi-class task. We present a detailed experimental study of the proposed approach, supported by the statistical analysis. The results indicate the high effectiveness of ensemble learning with one-vs-one scheme in dealing with the multi-class imbalance classification problems.National Natural Science Foundation of China (NSFC) 61273204CSC Scholarship Program (CSC) 201406080059Polish National Science Center UMO-2015/19/B/ST6/01597Spanish Government TIN2014-57251-PAndalusian Research Plan P10-TIC-6858 P11-TIC-7765Consejo Nacional de Ciencia y Tecnologia (CONACyT) 32901

    Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification

    Get PDF
    We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of adaboost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset

    Enhancing multi-class classification in FARC-HD fuzzy classifier: on the synergy between n-dimensional overlap functions and decomposition strategies

    Get PDF
    There are many real-world classification problems involving multiple classes, e.g., in bioinformatics, computer vision or medicine. These problems are generally more difficult than their binary counterparts. In this scenario, decomposition strategies usually improve the performance of classifiers. Hence, in this paper we aim to improve the behaviour of FARC-HD fuzzy classifier in multi-class classification problems using decomposition strategies, and more specifically One-vs-One (OVO) and One-vs-All (OVA) strategies. However, when these strategies are applied on FARC-HD a problem emerges due to the low confidence values provided by the fuzzy reasoning method. This undesirable condition comes from the application of the product t-norm when computing the matching and association degrees, obtaining low values, which are also dependent on the number of antecedents of the fuzzy rules. As a result, robust aggregation strategies in OVO such as the weighted voting obtain poor results with this fuzzy classifier. In order to solve these problems, we propose to adapt the inference system of FARC-HD replacing the product t-norm with overlap functions. To do so, we define n-dimensional overlap functions. The usage of these new functions allows one to obtain more adequate outputs from the base classifiers for the subsequent aggregation in OVO and OVA schemes. Furthermore, we propose a new aggregation strategy for OVO to deal with the problem of the weighted voting derived from the inappropriate confidences provided by FARC-HD for this aggregation method. The quality of our new approach is analyzed using twenty datasets and the conclusions are supported by a proper statistical analysis. In order to check the usefulness of our proposal, we carry out a comparison against some of the state-of-the-art fuzzy classifiers. Experimental results show the competitiveness of our method.This work was supported in part by the Spanish Ministry of Science and Technology under projects TIN2011-28488, TIN-2012-33856 and TIN-2013- 40765-P and the Andalusian Research Plan P10-TIC-6858 and P11-TIC-7765

    Investigating Randomised Sphere Covers in Supervised Learning

    Get PDF
    c©This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with the author and that no quotation from the thesis, nor any information derived therefrom, may be published without the author’s prior, written consent. In this thesis, we thoroughly investigate a simple Instance Based Learning (IBL) classifier known as Sphere Cover. We propose a simple Randomized Sphere Cover Classifier (αRSC) and use several datasets in order to evaluate the classification performance of the αRSC classifier. In addition, we analyse the generalization error of the proposed classifier using bias/variance decomposition. A Sphere Cover Classifier may be described from the compression scheme which stipulates data compression as the reason for high generalization performance. We investigate the compression capacity of αRSC using a sample compression bound. The Compression Scheme prompted us to search new compressibility methods for αRSC. As such, we used a Gaussian kernel to investigate further data compression

    Ensemble diversity for class imbalance learning

    Get PDF
    This thesis studies the diversity issue of classification ensembles for class imbalance learning problems. Class imbalance learning refers to learning from imbalanced data sets, in which some classes of examples (minority) are highly under-represented comparing to other classes (majority). The very skewed class distribution degrades the learning ability of many traditional machine learning methods, especially in the recognition of examples from the minority classes, which are often deemed to be more important and interesting. Although quite a few ensemble learning approaches have been proposed to handle the problem, no in-depth research exists to explain why and when they can be helpful. Our objectives are to understand how ensemble diversity affects the classification performance for a class imbalance problem according to single-class and overall performance measures, and to make best use of diversity to improve the performance. As the first stage, we study the relationship between ensemble diversity and generalization performance for class imbalance problems. We investigate mathematical links between single-class performance and ensemble diversity. It is found that how the single-class measures change along with diversity falls into six different situations. These findings are then verified in class imbalance scenarios through empirical studies. The impact of diversity on overall performance is also investigated empirically. Strong correlations between diversity and the performance measures are found. Diversity shows a positive impact on the recognition of the minority class and benefits the overall performance of ensembles in class imbalance learning. Our results help to understand if and why ensemble diversity can help to deal with class imbalance problems. Encouraged by the positive role of diversity in class imbalance learning, we then focus on a specific ensemble learning technique, the negative correlation learning (NCL) algorithm, which considers diversity explicitly when creating ensembles and has achieved great empirical success. We propose a new learning algorithm based on the idea of NCL, named AdaBoost.NC, for classification problems. An ``ambiguity" term decomposed from the 0-1 error function is introduced into the training framework of AdaBoost. It demonstrates superiority in both effectiveness and efficiency. Its good generalization performance is explained by theoretical and empirical evidences. It can be viewed as the first NCL algorithm specializing in classification problems. Most existing ensemble methods for class imbalance problems suffer from the problems of overfitting and over-generalization. To improve this situation, we address the class imbalance issue by making use of ensemble diversity. We investigate the generalization ability of NCL algorithms, including AdaBoost.NC, to tackle two-class imbalance problems. We find that NCL methods integrated with random oversampling are effective in recognizing minority class examples without losing the overall performance, especially the AdaBoost.NC tree ensemble. This is achieved by providing smoother and less overfitting classification boundaries for the minority class. The results here show the usefulness of diversity and open up a novel way to deal with class imbalance problems. Since the two-class imbalance is not the only scenario in real-world applications, multi-class imbalance problems deserve equal attention. To understand what problems multi-class can cause and how it affects the classification performance, we study the multi-class difficulty by analyzing the multi-minority and multi-majority cases respectively. Both lead to a significant performance reduction. The multi-majority case appears to be more harmful. The results reveal possible issues that a class imbalance learning technique could have when dealing with multi-class tasks. Following this part of analysis and the promising results of AdaBoost.NC on two-class imbalance problems, we apply AdaBoost.NC to a set of multi-class imbalance domains with the aim of solving them effectively and directly. Our method shows good generalization in minority classes and balances the performance across different classes well without using any class decomposition schemes. Finally, we conclude this thesis with how the study has contributed to class imbalance learning and ensemble learning, and propose several possible directions for future research that may improve and extend this work

    Dynamic selection of the best base classifier in one versus one

    Get PDF
    Class binarization strategies decompose the original multi-class problem into several binary sub-problems. One versus One (OVO) is one of the most popular class binarization techniques, which considers every pair of classes as a different sub-problem. Usually, the same classifier is applied to every sub-problem and then all the outputs are combined by some voting scheme. In this paper we present a novel idea where for each test instance we try to assign the best classifier in each sub-problem of OVO. To do so, we have used two simple Dynamic Classifier Selection (DCS) strategies that have not been yet used in this context. The two DCS strategies use K-NN to obtain the local region of the test-instance, and the classifier that performs the best for those instances in the local region, is selected to classify the new test instance. The difference between the two DCS strategies remains in the weight of the instance. In this paper we have also proposed a novel approach in those DCS strategies. We propose to use the K-Nearest Neighbor Equality (K-NNE) method to obtain the local accuracy. K-NNE is an extension of K-NN in which all the classes are treated independently: the K nearest neighbors belonging to each class are selected. In this way all the classes take part in the final decision. We have carried out an empirical study over several UCI databases, which shows the robustness of our proposal.The work described in this paper was partially conducted within the Basque Government Research Team Grant IT313-10 and the University of the Basque Country UPV/EHU. I. Mendialdua holds a Grant from Basque Government
    • …
    corecore