4,986 research outputs found

    Ensemble Learning for Free with Evolutionary Algorithms ?

    Evolutionary Learning proceeds by evolving a population of classifiers, from which it generally returns (with some notable exceptions) the single best-of-run classifier as the final result. Meanwhile, Ensemble Learning, one of the most efficient approaches in supervised Machine Learning over the last decade, proceeds by building a population of diverse classifiers. Ensemble Learning with Evolutionary Computation has thus received increasing attention. The Evolutionary Ensemble Learning (EEL) approach presented in this paper features two contributions. First, a new fitness function, inspired by co-evolution and enforcing classifier diversity, is presented. Second, a new selection criterion based on the classification margin is proposed. This criterion is used to extract the classifier ensemble either from the final population only (Off-line) or incrementally along evolution (On-line). Experiments on a set of benchmark problems show that Off-line outperforms single-hypothesis evolutionary learning and state-of-the-art Boosting, and generates smaller classifier ensembles.

    A Diversity-Accuracy Measure for Homogenous Ensemble Selection

    Several selection methods in the literature are essentially based on an evaluation function that determines whether a model M contributes positively to the performance of the whole ensemble. In this paper, we propose a method called DIversity and ACcuracy for Ensemble Selection (DIACES), which uses an evaluation function based on both diversity and accuracy. The method is applied to homogeneous ensembles composed of C4.5 decision trees and is based on a hill climbing strategy. This allows selecting ensembles with the best compromise between maximum diversity and minimum error rate. Comparative studies show that in most cases the proposed method generates reduced-size ensembles with better performance than the usual ensemble simplification methods.
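The abstract does not give the DIACES evaluation function, so the following is only a minimal sketch of the general idea: forward hill climbing over candidate models, scored by a mix of accuracy and diversity. The weighted-sum `score`, the `alpha` parameter, the pairwise-disagreement diversity measure, and all function names are assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations


def majority_vote(member_preds):
    """Combine per-model prediction lists by plurality vote (ties go to the first label seen)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_preds)]


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(member_preds):
    """Mean pairwise disagreement rate; 0.0 when the ensemble has a single model."""
    pairs = list(combinations(member_preds, 2))
    if not pairs:
        return 0.0
    n = len(member_preds[0])
    return sum(sum(a != b for a, b in zip(p, q)) / n for p, q in pairs) / len(pairs)


def score(member_preds, labels, alpha=0.5):
    """Hypothetical evaluation function: weighted sum of ensemble accuracy and diversity."""
    acc = accuracy(majority_vote(member_preds), labels)
    return alpha * acc + (1 - alpha) * disagreement(member_preds)


def hill_climb_select(all_preds, labels, alpha=0.5):
    """Greedily add the model that most improves the score; stop when no model improves it."""
    selected, best = [], float("-inf")
    remaining = list(range(len(all_preds)))
    while remaining:
        candidates = [
            (score([all_preds[i] for i in selected + [j]], labels, alpha), j)
            for j in remaining
        ]
        top_score, top_j = max(candidates)
        if top_score <= best:
            break
        selected.append(top_j)
        remaining.remove(top_j)
        best = top_score
    return selected, best


# Toy validation predictions: models 0 and 1 are identical, model 2 errs on different items.
labels = [0, 1, 0, 1]
all_preds = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
selected, best = hill_climb_select(all_preds, labels)
```

On this toy input the search keeps model 2 plus one of the two identical models and rejects the second copy, since a duplicate lowers diversity without improving accuracy.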

    Tree Genera Classification by Ensemble Classification of Small-Footprint Airborne LiDAR

    Tree genera information is useful in environmental applications such as forest management, forestry, urban planning, and the maintenance of utility transmission line infrastructure. The ability of small-footprint airborne LiDAR (Light Detection and Ranging) to acquire 3D information provides a promising way of studying vertical forest structures, adding a dimension of information beyond traditional 2D remote sensing data. However, the techniques for processing this type of data are relatively recent and have become an innovative research direction. The existing approach to processing LiDAR data for tree species classification involves calculating statistical attributes of the vertical point profile for individual trees. This method, however, does not explicitly utilize the geometric information of the tree form, such as the shape of the tree crown and geometric features that are derivable from inside the tree crown. Therefore, the first aim of this dissertation research is to derive geometric features from individual tree crowns and use these features for genera classification. The second goal is to improve classification results by combining the newly developed features with the conventional vertical point profile features through an ensemble classification system. The final goal is to design a classification system that copes with the situation where the number of classes in the validation data exceeds the number of classes in the training data. Twenty-four geometric features were initially derived, and six of them were selected for the classification of pine, poplar and maple. An average classification accuracy of 88.3% is achieved using this method. When the geometric features are combined with vertical profile features by the ensemble classification system, the average classification accuracy increases to 91.2%, whereas the individual performance of the geometric classifier and the vertical classifier is 88.0% and 88.8%, respectively, for the classification of pine, poplar and maple. Lastly, when samples that do not belong to pine, poplar and maple are added to the validation data, the classification accuracy drops to 72.8% when randomly selected samples are used for training. However, with a diversified sampling technique, the classification accuracy increases to 93.8%.

    A weighted multiple classifier framework based on random projection.

    In this paper, we propose a weighted multiple classifier framework based on random projections. Similar to the mechanism of other homogeneous ensemble methods, the base classifiers in our approach are obtained by a learning algorithm on different training sets generated by projecting the original up-space training set to lower dimensional down-spaces. We then apply a Least Square-based method to weigh the outputs of the base classifiers so that the contribution of each classifier to the final combined prediction is different. We choose Decision Tree as the learning algorithm in the proposed framework and conduct experiments on a number of real and synthetic datasets. The experimental results indicate that our framework is better than many of the benchmark algorithms, including three homogeneous ensemble methods (Bagging, RotBoost, and Random Subspace), several well-known algorithms (Decision Tree, Random Neural Network, Linear Discriminative Analysis, K Nearest Neighbor, L2-loss Linear Support Vector Machine, and Discriminative Restricted Boltzmann Machine), and random projection-based ensembles with fixed combining rules, with regard to both classification error rates and F1 scores.

    A literature survey of active machine learning in the context of natural language processing

    Active learning is a supervised machine learning technique in which the learner is in control of the data used for learning. The learner uses that control to ask an oracle, typically a human with extensive knowledge of the domain at hand, about the classes of the instances for which the model learned so far makes unreliable predictions. The active learning process takes as input a set of labeled examples, as well as a larger set of unlabeled examples, and produces a classifier and a relatively small set of newly labeled data. The overall goal is to create as good a classifier as possible without having to mark up and supply the learner with more data than necessary. The learning process aims to keep the human annotation effort to a minimum, only asking for advice where the training utility of the result of such a query is high. Active learning has been successfully applied to a number of natural language processing tasks, such as information extraction, named entity recognition, text categorization, part-of-speech tagging, parsing, and word sense disambiguation. This report is a literature survey of active learning from the perspective of natural language processing.
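One common way to pick the instances on which the model "makes unreliable predictions" is least-confidence sampling. The sketch below illustrates that single query step under stated assumptions: the model's class probabilities are already available as plain lists, and the function names, the `oracle` callable, and the toy data are hypothetical. The survey itself covers many other query strategies.

```python
def least_confident(probabilities, k):
    """Return indices of the k instances whose highest class probability is lowest."""
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:k]


def query_oracle(pool, probabilities, oracle, k=1):
    """Ask the oracle to label only the k least confidently predicted pool instances."""
    picked = least_confident(probabilities, k)
    return [(pool[i], oracle(pool[i])) for i in picked]


# Toy unlabeled pool with (assumed) two-class probabilities from the current model.
pool = ["a", "b", "c", "d"]
probabilities = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.5, 0.5]]

# Stand-in oracle; in practice this would be a human annotator.
newly_labeled = query_oracle(pool, probabilities, oracle=str.upper, k=2)
```

In a full loop, the newly labeled pairs would be added to the training set, the classifier retrained, and the probabilities recomputed before the next round of queries.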

    Investigating Randomised Sphere Covers in Supervised Learning

    In this thesis, we thoroughly investigate a simple Instance Based Learning (IBL) classifier known as Sphere Cover. We propose a simple Randomized Sphere Cover Classifier (αRSC) and use several datasets to evaluate its classification performance. In addition, we analyse the generalization error of the proposed classifier using bias/variance decomposition. A Sphere Cover Classifier may be described in terms of a compression scheme, which posits data compression as the reason for high generalization performance. We investigate the compression capacity of αRSC using a sample compression bound. The compression scheme prompted us to search for new compressibility methods for αRSC. To that end, we used a Gaussian kernel to investigate further data compression.

    Ensemble diversity for class imbalance learning

    This thesis studies the diversity issue of classification ensembles for class imbalance learning problems. Class imbalance learning refers to learning from imbalanced data sets, in which some classes of examples (minority) are highly under-represented compared to other classes (majority). The very skewed class distribution degrades the learning ability of many traditional machine learning methods, especially in the recognition of examples from the minority classes, which are often deemed to be more important and interesting. Although quite a few ensemble learning approaches have been proposed to handle the problem, no in-depth research exists to explain why and when they can be helpful. Our objectives are to understand how ensemble diversity affects the classification performance for a class imbalance problem according to single-class and overall performance measures, and to make the best use of diversity to improve the performance. As the first stage, we study the relationship between ensemble diversity and generalization performance for class imbalance problems. We investigate mathematical links between single-class performance and ensemble diversity. We find that the way the single-class measures change with diversity falls into six different situations. These findings are then verified in class imbalance scenarios through empirical studies. The impact of diversity on overall performance is also investigated empirically. Strong correlations between diversity and the performance measures are found. Diversity shows a positive impact on the recognition of the minority class and benefits the overall performance of ensembles in class imbalance learning. Our results help to understand if and why ensemble diversity can help to deal with class imbalance problems.
    Encouraged by the positive role of diversity in class imbalance learning, we then focus on a specific ensemble learning technique, the negative correlation learning (NCL) algorithm, which considers diversity explicitly when creating ensembles and has achieved great empirical success. We propose a new learning algorithm based on the idea of NCL, named AdaBoost.NC, for classification problems. An "ambiguity" term decomposed from the 0-1 error function is introduced into the training framework of AdaBoost. It demonstrates superiority in both effectiveness and efficiency. Its good generalization performance is explained by theoretical and empirical evidence. It can be viewed as the first NCL algorithm specializing in classification problems. Most existing ensemble methods for class imbalance problems suffer from the problems of overfitting and over-generalization. To improve this situation, we address the class imbalance issue by making use of ensemble diversity. We investigate the generalization ability of NCL algorithms, including AdaBoost.NC, to tackle two-class imbalance problems. We find that NCL methods integrated with random oversampling are effective in recognizing minority class examples without losing overall performance, especially the AdaBoost.NC tree ensemble. This is achieved by providing smoother and less overfitted classification boundaries for the minority class. The results here show the usefulness of diversity and open up a novel way to deal with class imbalance problems. Since two-class imbalance is not the only scenario in real-world applications, multi-class imbalance problems deserve equal attention. To understand what problems multiple classes can cause and how they affect the classification performance, we study the multi-class difficulty by analyzing the multi-minority and multi-majority cases respectively. Both lead to a significant performance reduction. The multi-majority case appears to be more harmful.
    The results reveal possible issues that a class imbalance learning technique could have when dealing with multi-class tasks. Following this analysis and the promising results of AdaBoost.NC on two-class imbalance problems, we apply AdaBoost.NC to a set of multi-class imbalance domains with the aim of solving them effectively and directly. Our method shows good generalization in minority classes and balances the performance across different classes well without using any class decomposition schemes. Finally, we conclude this thesis with how the study has contributed to class imbalance learning and ensemble learning, and propose several possible directions for future research that may improve and extend this work.
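The abstract mentions combining NCL methods with random oversampling. As a rough illustration of that resampling step only (not of AdaBoost.NC itself), the sketch below duplicates randomly chosen minority examples until every class matches the size of the largest class. The function name, the `seed` parameter, and the balancing target are assumptions; the thesis may use a different oversampling rate.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(indices)  # sample with replacement from the original examples
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Toy two-class data with a 4:1 imbalance.
X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
```

The original examples are kept unchanged at the front of the returned lists, so only the appended items are duplicates.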