6,452 research outputs found

A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

Author: Pandey Gaurav
Whalen Sean
Publication date: 19/09/2013

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Minin
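To make the idea of ensemble selection concrete, here is a minimal sketch of one common variant: greedy forward selection with replacement over averaged predictions. The abstract does not say which variant the authors use, so the selection rule, the accuracy metric, the 0.5 threshold, and the classifier names and probabilities below are all illustrative assumptions rather than the paper's method or data.

```python
def accuracy(probs, labels):
    """Fraction of examples where the thresholded probability matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def average(pred_lists):
    """Element-wise mean of several prediction vectors (simple aggregation)."""
    n = len(pred_lists)
    return [sum(col) / n for col in zip(*pred_lists)]


def greedy_ensemble_selection(base_preds, labels, max_rounds=10):
    """Repeatedly add (with replacement) the base classifier that most improves
    validation accuracy of the averaged ensemble; stop when nothing helps."""
    chosen = []
    best_score = float("-inf")
    for _ in range(max_rounds):
        round_best_name, round_best_score = None, best_score
        for name, preds in base_preds.items():
            candidate = [base_preds[c] for c in chosen] + [preds]
            score = accuracy(average(candidate), labels)
            if score > round_best_score:
                round_best_name, round_best_score = name, score
        if round_best_name is None:
            break  # no candidate strictly improves the ensemble
        chosen.append(round_best_name)
        best_score = round_best_score
    return chosen, best_score


# Hypothetical validation-set probabilities from three base classifiers.
labels = [1, 0, 1, 1, 0, 0]
base_preds = {
    "clf_a": [0.9, 0.6, 0.4, 0.8, 0.3, 0.2],
    "clf_b": [0.4, 0.2, 0.9, 0.7, 0.6, 0.1],
    "clf_c": [0.6, 0.4, 0.6, 0.3, 0.4, 0.7],
}
chosen, score = greedy_ensemble_selection(base_preds, labels)
print(chosen, score)
```

In this toy data each base classifier is right on only four of six examples, but two of them make different mistakes, so their average does better than either alone. That is a small-scale instance of the diversity versus performance balance the abstract refers to.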

arXiv.org e-Print Archive

Crossref

Marginal and simultaneous predictive classification using stratified graphical models

Author: Corander Jukka
Nyman Henrik
Pensar Johan
Xiong Jie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/01/2014

An inductive probabilistic classification rule must generally obey the principles of Bayesian predictive inference, such that all observed and unobserved stochastic quantities are jointly modeled and the parameter uncertainty is fully acknowledged through the posterior predictive distribution. Several such rules have been recently considered and their asymptotic behavior has been characterized under the assumption that the observed features or variables used for building a classifier are conditionally independent given a simultaneous labeling of both the training samples and those from an unknown origin. Here we extend the theoretical results to predictive classifiers acknowledging feature dependencies either through graphical models or sparser alternatives defined as stratified graphical models. We also show through experimentation with both synthetic and real data that the predictive classifiers based on stratified graphical models consistently achieve the best accuracy compared with the predictive classifiers based on either conditionally independent features or on ordinary graphical models. Comment: 18 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

Prediction of progression in idiopathic pulmonary fibrosis using CT scans at baseline: A quantum particle swarm optimization - Random forest approach

Author: Brown Matthew S.
Goldin Jonathan G.
Kim Grace Hyun J.
Shi Yu
Wong Weng Kee
Publication venue: eScholarship, University of California
Publication date: 19/08/2019

Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease characterized by an unpredictable progressive decline in lung function. The natural history of IPF is unknown and the prediction of disease progression at the time of diagnosis is notoriously difficult. High resolution computed tomography (HRCT) has been used for the diagnosis of IPF, but not generally for monitoring purposes. The objective of this work is to develop a novel predictive model for the radiological progression pattern at voxel-wise level using only baseline HRCT scans. Mainly, there are two challenges: (a) obtaining a data set of features for regions of interest (ROIs) on baseline HRCT scans and their follow-up status; and (b) simultaneously selecting important features from high-dimensional space and optimizing the prediction performance. We resolved the first challenge by implementing a study design and having an expert radiologist contour ROIs at baseline scans, depending on their progression status in follow-up visits. For the second challenge, we integrated the feature selection with prediction by developing an algorithm using a wrapper method that combines quantum particle swarm optimization to select a small number of features with random forest to classify early patterns of progression. We applied our proposed algorithm to analyze anonymized HRCT images from 50 IPF subjects from a multi-center clinical trial. We showed that it yields a parsimonious model with 81.8% sensitivity, 82.2% specificity and an overall accuracy rate of 82.1% at the ROI level. These results are superior to other popular feature selection and classification methods, in that our method produces higher accuracy in prediction of progression and more balanced sensitivity and specificity with a smaller number of selected features. Our work is the first approach to show that it is possible to use only baseline HRCT scans to predict progressive ROIs at 6-month to 1-year follow-ups using artificial intelligence.

eScholarship - University of California

Optimal sensor placement for classifier-based leak localization in drinking water networks

Author: Blesa Izquierdo Joaquim
Fernández Canti Rosa M.
Puig Cayuela Vicenç
Soldevila Coma Adrià
Tornil Sin Sebastián
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This paper presents a sensor placement method for classifier-based leak localization in Water Distribution Networks. The proposed approach consists of applying a Genetic Algorithm to decide which sensors are used by a classifier (based on the k-Nearest Neighbor approach). The sensors are placed optimally, maximizing the accuracy of the leak localization. The results are illustrated by application to the Hanoi District Metered Area and are compared with those obtained by the Exhaustive Search Algorithm. A comparison with the results of a previous optimal sensor placement method is provided as well. Postprint (author's final draft
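The pairing of a genetic algorithm with a nearest-neighbor classifier can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the residual signatures, noise level, fixed number of selected sensors, 1-nearest-neighbor rule, leave-one-out fitness, and the crossover and mutation operators are all hypothetical choices made for the example. The exhaustive search at the end mirrors the kind of baseline comparison the abstract mentions.

```python
import itertools
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: each leak location has a residual signature over 6
# candidate sensors; samples are noisy copies of those signatures.
N_SENSORS, N_SELECT = 6, 2
signatures = {
    "node_1": [1.0, 0.0, 0.5, 0.5, 0.2, 0.5],
    "node_2": [1.0, 1.0, 0.5, 0.5, 0.8, 0.5],
    "node_3": [0.0, 1.0, 0.5, 0.5, 0.5, 0.5],
}
samples = [
    ([v + rng.gauss(0, 0.1) for v in sig], leak)
    for leak, sig in signatures.items()
    for _ in range(8)
]


def loo_accuracy(sensors):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that only
    sees the chosen sensor columns (used as the GA fitness)."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        nearest = min(
            (s for j, s in enumerate(samples) if j != i),
            key=lambda s: sum((x[k] - s[0][k]) ** 2 for k in sensors),
        )
        hits += nearest[1] == label
    return hits / len(samples)


def mutate(sensors):
    """Swap one selected sensor for a currently unselected one."""
    kept = rng.sample(sensors, len(sensors) - 1)
    unused = [s for s in range(N_SENSORS) if s not in sensors]
    return tuple(sorted(kept + [rng.choice(unused)]))


def crossover(a, b):
    """Child draws its sensors from the union of both parents."""
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), N_SELECT)))


def genetic_search(pop_size=6, generations=15):
    population = [
        tuple(sorted(rng.sample(range(N_SENSORS), N_SELECT)))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=loo_accuracy, reverse=True)
        parents = population[: pop_size // 2]  # elitism: keep the best half
        children = [
            mutate(crossover(*rng.sample(parents, 2)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=loo_accuracy)


best_ga = genetic_search()
best_exhaustive = max(
    itertools.combinations(range(N_SENSORS), N_SELECT), key=loo_accuracy
)
print(best_ga, loo_accuracy(best_ga))
print(best_exhaustive, loo_accuracy(best_exhaustive))
```

With only 15 possible sensor pairs the exhaustive search is trivial here; a genetic search becomes relevant when the number of candidate subsets is too large to enumerate.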

UPCommons. Portal del coneixement obert de la UPC