A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across 4 genomics datasets and find the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance.
Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Minin
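To make the contrast between simple aggregation and ensemble selection concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the classifier names, the validation probabilities, and the use of accuracy as the selection metric are all assumptions made for illustration. The selection routine is a generic greedy forward selection with replacement, which keeps adding whichever base classifier most improves the averaged ensemble on validation data.

```python
from typing import Dict, List


def accuracy(probs: List[float], labels: List[int]) -> float:
    """Fraction of examples whose thresholded probability matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def average(members: List[List[float]]) -> List[float]:
    """Simple aggregation: unweighted mean of the members' probabilities."""
    return [sum(column) / len(column) for column in zip(*members)]


def greedy_selection(preds: Dict[str, List[float]], labels: List[int],
                     rounds: int = 5) -> List[str]:
    """Greedy forward selection with replacement on validation accuracy."""
    chosen: List[str] = []
    best_score = -1.0
    for _ in range(rounds):
        # Score every candidate as if it were added to the current ensemble.
        scored = [
            (accuracy(average([preds[n] for n in chosen + [name]]), labels), name)
            for name in sorted(preds)
        ]
        score, name = max(scored, key=lambda pair: pair[0])
        if score <= best_score:  # stop once no candidate improves the ensemble
            break
        chosen.append(name)
        best_score = score
    return chosen


# Hypothetical validation-set probabilities from three base classifiers.
labels = [1, 0, 1, 1, 0, 0]
preds = {
    "svm": [0.9, 0.6, 0.4, 0.9, 0.2, 0.1],
    "tree": [0.6, 0.1, 0.7, 0.3, 0.4, 0.6],
    "nb": [0.7, 0.3, 0.6, 0.4, 0.7, 0.2],
}

all_members_score = accuracy(average(list(preds.values())), labels)
selected = greedy_selection(preds, labels)
selected_score = accuracy(average([preds[n] for n in selected]), labels)
print("simple aggregation:", all_members_score)
print("ensemble selection:", selected, selected_score)
```

Because the first round picks the best single classifier and later rounds only accept strict improvements, the selected ensemble can never score below the best base classifier on the validation data, although it may still overfit that data.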
Marginal and simultaneous predictive classification using stratified graphical models
An inductive probabilistic classification rule must generally obey the
principles of Bayesian predictive inference, such that all observed and
unobserved stochastic quantities are jointly modeled and the parameter
uncertainty is fully acknowledged through the posterior predictive
distribution. Several such rules have been recently considered and their
asymptotic behavior has been characterized under the assumption that the
observed features or variables used for building a classifier are conditionally
independent given a simultaneous labeling of both the training samples and
those from an unknown origin. Here we extend the theoretical results to
predictive classifiers acknowledging feature dependencies either through
graphical models or sparser alternatives defined as stratified graphical
models. We also show through experimentation with both synthetic and real data
that the predictive classifiers based on stratified graphical models are
consistently more accurate than the predictive classifiers based on
either conditionally independent features or ordinary graphical models.
Comment: 18 pages, 5 figure
Prediction of progression in idiopathic pulmonary fibrosis using CT scans at baseline: A quantum particle swarm optimization - Random forest approach
Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease characterized by an unpredictable progressive decline in lung function. Natural history of IPF is unknown and the prediction of disease progression at the time of diagnosis is notoriously difficult. High resolution computed tomography (HRCT) has been used for the diagnosis of IPF, but not generally for monitoring purpose. The objective of this work is to develop a novel predictive model for the radiological progression pattern at voxel-wise level using only baseline HRCT scans. Mainly, there are two challenges: (a) obtaining a data set of features for region of interest (ROI) on baseline HRCT scans and their follow-up status; and (b) simultaneously selecting important features from high-dimensional space, and optimizing the prediction performance. We resolved the first challenge by implementing a study design and having an expert radiologist contour ROIs at baseline scans, depending on its progression status in follow-up visits. For the second challenge, we integrated the feature selection with prediction by developing an algorithm using a wrapper method that combines quantum particle swarm optimization to select a small number of features with random forest to classify early patterns of progression. We applied our proposed algorithm to analyze anonymized HRCT images from 50 IPF subjects from a multi-center clinical trial. We showed that it yields a parsimonious model with 81.8% sensitivity, 82.2% specificity and an overall accuracy rate of 82.1% at the ROI level. These results are superior to other popular feature selections and classification methods, in that our method produces higher accuracy in prediction of progression and more balanced sensitivity and specificity with a smaller number of selected features. Our work is the first approach to show that it is possible to use only baseline HRCT scans to predict progressive ROIs at 6 months to 1 year follow-ups using artificial intelligence
Optimal sensor placement for classifier-based leak localization in drinking water networks
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
This paper presents a sensor placement method for classifier-based leak localization in Water Distribution Networks. The proposed approach consists in applying a Genetic Algorithm to decide the sensors to be used by a classifier (based on the k-Nearest Neighbor approach). The sensors are placed in an optimal way maximizing the accuracy of the leak localization. The results are illustrated by means of the application to the Hanoi District Metered Area and they are compared to the ones obtained by the Exhaustive Search Algorithm. A comparison with the results of a previous optimal sensor placement method is provided as well.
Postprint (author's final draft
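The idea of letting a genetic algorithm choose which sensors feed a k-Nearest Neighbor classifier can be sketched in plain Python as follows. Everything here is an assumption for illustration rather than the paper's design: the synthetic readings, the leave-one-out accuracy used as fitness, the tuple encoding of a sensor subset, and the crossover, mutation, and truncation-selection operators. An exhaustive search over all subsets is included as the reference point that the abstract mentions.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (readings at every candidate sensor, leak label)


def make_toy_data(rng: random.Random, n_sensors: int = 6, n_leaks: int = 4,
                  per_leak: int = 5) -> List[Sample]:
    """Synthetic readings: each leak label has its own signature plus noise."""
    signatures = [[rng.uniform(0, 1) for _ in range(n_sensors)]
                  for _ in range(n_leaks)]
    return [([s + rng.gauss(0, 0.15) for s in signatures[leak]], leak)
            for leak in range(n_leaks) for _ in range(per_leak)]


def knn_accuracy(data: List[Sample], sensors: Sequence[int], k: int = 3) -> float:
    """Leave-one-out k-NN accuracy using only the readings at `sensors`."""
    correct = 0
    for i, (x, label) in enumerate(data):
        neighbours = sorted(
            (sum((x[s] - y[s]) ** 2 for s in sensors), other)
            for j, (y, other) in enumerate(data) if j != i
        )
        votes = [lab for _, lab in neighbours[:k]]
        predicted = max(sorted(set(votes)), key=votes.count)  # majority vote
        correct += predicted == label
    return correct / len(data)


def genetic_search(data: List[Sample], n_sensors: int, n_pick: int,
                   rng: random.Random, pop_size: int = 12,
                   generations: int = 15) -> Tuple[int, ...]:
    """Evolve sorted tuples of `n_pick` distinct sensor indices."""
    def fitness(ind: Tuple[int, ...]) -> float:
        return knn_accuracy(data, ind)

    def crossover(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
        pool = sorted(set(a) | set(b))  # child draws sensors from both parents
        return tuple(sorted(rng.sample(pool, n_pick)))

    def mutate(ind: Tuple[int, ...]) -> Tuple[int, ...]:
        unused = [s for s in range(n_sensors) if s not in ind]
        if unused and rng.random() < 0.3:
            kept = list(ind)
            kept[rng.randrange(n_pick)] = rng.choice(unused)
            return tuple(sorted(kept))
        return ind

    population = [tuple(sorted(rng.sample(range(n_sensors), n_pick)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children  # the fittest half always survives
    return max(population, key=fitness)


def exhaustive_search(data: List[Sample], n_sensors: int,
                      n_pick: int) -> Tuple[int, ...]:
    """Reference answer: try every subset of the requested size."""
    return max(itertools.combinations(range(n_sensors), n_pick),
               key=lambda subset: knn_accuracy(data, subset))


rng = random.Random(0)
data = make_toy_data(rng)
ga_best = genetic_search(data, n_sensors=6, n_pick=2, rng=rng)
ex_best = exhaustive_search(data, n_sensors=6, n_pick=2)
print("GA:", ga_best, knn_accuracy(data, ga_best))
print("exhaustive:", ex_best, knn_accuracy(data, ex_best))
```

On a problem this small the exhaustive search is cheap, so it serves only as a check; the genetic search becomes worthwhile when the number of candidate subsets is too large to enumerate.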