
    Bagging ensemble selection for regression

    Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. It can be seen as an ensemble of ensembles, each built with the ensemble selection from libraries of models (ES) strategy. Previous experimental results on binary classification problems have shown that, with random trees as base classifiers, BES-OOB (the most successful variant of BES) is competitive with, and in many cases superior to, other ensemble learning strategies such as the original ES algorithm, stacking with linear regression, random forests, and boosting. Motivated by these promising classification results, this paper examines the predictive performance of the BES-OOB strategy on regression problems. Our results show that BES-OOB outperforms Stochastic Gradient Boosting and Bagging when regression trees are used as the base learners. Our results also suggest that the advantage of a diverse model library becomes clear when the library is relatively large. We also present encouraging results indicating that the non-negative least squares algorithm is a viable approach for pruning an ensemble of ensembles.
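
    The abstract does not spell out the ES step that BES builds on. As a rough illustration only, the sketch below shows greedy forward selection with replacement on a held-out "hillclimb" set, which is the general idea behind ensemble selection from libraries of models. The function names, the RMSE metric, and the fixed number of rounds are assumptions made for this example, not details taken from the paper. In a BES-OOB setting, the hillclimb set would presumably be the out-of-bag sample of each bootstrap bag.

```python
import math


def rmse(pred, target):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def ensemble_selection(library_preds, target, rounds=10):
    """Greedy forward selection with replacement (illustrative sketch).

    library_preds: one list of hillclimb-set predictions per library model.
    target: true values on the hillclimb set.
    Returns how many times each model was added to the ensemble.
    """
    counts = [0] * len(library_preds)
    total = [0.0] * len(target)  # running sum of the selected models' predictions
    size = 0
    for _ in range(rounds):
        best_idx, best_err = None, float("inf")
        for i, preds in enumerate(library_preds):
            # Error of the averaged ensemble if model i were added once more.
            candidate = [(s + p) / (size + 1) for s, p in zip(total, preds)]
            err = rmse(candidate, target)
            if err < best_err:
                best_idx, best_err = i, err
        counts[best_idx] += 1
        total = [s + p for s, p in zip(total, library_preds[best_idx])]
        size += 1
    return counts


# Hypothetical toy data: two models err in opposite directions, one is poor.
target = [1.0, 2.0, 3.0]
library = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [5.0, 5.0, 5.0]]
selected = ensemble_selection(library, target, rounds=2)
```

    With this toy library, two rounds pick the two complementary models once each, because their average matches the target exactly.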

    Classification of oncologic data with genetic programming

    Discovering models that explain the hidden relationship between genetic material and tumor pathologies is one of the most important open challenges in biology and medicine. Given the large amount of data made available by the DNA Microarray technique, Machine Learning is becoming a popular tool for this kind of investigation. In the last few years, we have been particularly involved in the study of Genetic Programming for mining large sets of biomedical data. In this paper, we present a comparison between four variants of Genetic Programming for the classification of two different oncologic datasets: the first contains data from healthy colon tissues and colon tissues affected by cancer; the second contains data from patients affected by two kinds of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia). We report experimental results obtained using two different fitness criteria: the receiver operating characteristic and the percentage of correctly classified instances. These results, and their comparison with those obtained by three nonevolutionary Machine Learning methods (Support Vector Machines, MultiBoosting, and Random Forests) on the same data, seem to hint that Genetic Programming is a promising technique for this kind of classification.
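
    The two fitness criteria named above can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say how the receiver operating characteristic is turned into a single fitness value, so the area under the ROC curve (computed as a pairwise ranking statistic) is assumed here, and the function names and the 0.5 decision threshold are hypothetical.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of instances classified correctly at a fixed threshold (assumed 0.5)."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


def roc_auc(scores, labels):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive instance gets the higher score; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for four instances (1 = tumor, 0 = healthy).
scores = [0.9, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0]
acc = accuracy(scores, labels)  # 0.5
auc = roc_auc(scores, labels)   # 0.75
```

    The two criteria can disagree, as in this toy case: accuracy depends on the chosen threshold, while the ranking-based measure only depends on how the scores order the instances.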

    Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Article discussing research on classifying genes to the correct gene ontology slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Decision trees and multi-level ensemble classifiers for neurological diagnostics

    Cardiac autonomic neuropathy (CAN) is a well-known complication of diabetes that leads to impaired regulation of blood pressure and heart rate and increases the risk of cardiac-associated mortality in diabetes patients. The neurological diagnostics of CAN progression is an important problem that is being actively investigated. This paper uses data collected as part of a large and unique Diabetes Screening Complications Research Initiative (DiScRi) in Australia, with data from numerous tests related to diabetes, to classify CAN progression. The paper is devoted to recent experimental investigations of the effectiveness of decision trees, ensemble classifiers, and multi-level ensemble classifiers for neurological diagnostics of CAN. We present the results of experiments comparing the effectiveness of the ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart decision tree classifiers. Our results show that SimpleCart was the most effective for the DiScRi data set in classifying CAN. We also investigated and compared the effectiveness of AdaBoost, Bagging, MultiBoost, Stacking, Decorate, Dagging, and Grading, based on Ripple Down Rules, as examples of ensemble classifiers. Further, we investigated the effectiveness of these ensemble methods as a function of the base classifiers, and determined that Random Forest performed best as a base classifier, and that AdaBoost, Bagging, and Decorate achieved the best outcomes as meta-classifiers in this setting. Finally, we investigated which of the best-performing meta-classifiers could enhance performance further within a multi-level classification paradigm. Experimental results show that the multi-level paradigm performed best when Bagging and Decorate were combined in the construction of a multi-level ensemble classifier.

    Empirical investigation of decision tree ensembles for monitoring cardiac complications of diabetes

    Cardiac complications of diabetes require continuous monitoring since they may lead to increased morbidity or sudden death of patients. In order to monitor clinical complications of diabetes using wearable sensors, a small set of features has to be identified and effective algorithms for processing them need to be investigated. This article focuses on detecting and monitoring cardiac autonomic neuropathy (CAN) in diabetes patients. The authors investigate and compare the effectiveness of classifiers based on the following decision trees: ADTree, J48, NBTree, RandomTree, REPTree, and SimpleCart. The authors perform a thorough study comparing these decision trees as well as several decision tree ensembles created by applying the following ensemble methods: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost, Stacking, and two multi-level combinations of AdaBoost and MultiBoost with Bagging, for the processing of data from diabetes patients for pervasive health monitoring of CAN. This paper concentrates on the particular task of applying decision tree ensembles to the detection and monitoring of cardiac autonomic neuropathy using these features. Experimental outcomes presented here show that the authors' application of decision tree ensembles to the detection and monitoring of CAN in diabetes patients achieved better performance than the results obtained previously in the literature.

    Local 3D shape analysis for facial expression recognition (Analyse locale de la forme 3D pour la reconnaissance d'expressions faciales)

    In this paper we propose a novel approach for identity-independent 3D facial expression recognition. Our approach is based on shape analysis of local patches extracted from a 3D facial shape model. A Riemannian framework is applied to compute geodesic distances between corresponding patches belonging to different faces of the BU-3DFE database and conveying different expressions. Quantitative measures of similarity are obtained and then used as inputs to several multiclass classification methods. Using Multiboosting and Support Vector Machine (SVM) classifiers, we achieved average recognition rates for the six basic expressions of 98.81% and 97.75%, respectively.