1,936 research outputs found
Predicting gene function using hierarchical multi-label decision tree ensembles
<p>Abstract</p> <p>Background</p> <p><it>S. cerevisiae</it>, <it>A. thaliana </it>and <it>M. musculus </it>are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability.</p> <p>Results</p> <p>We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use.</p> <p>Conclusions</p> <p>Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.</p
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to
enhance tree-based ensemble methods in the context of multi-label
classification. We show how learning time complexity can be reduced without
affecting computational complexity and accuracy of predictions. We also show
that random output space projections may be used in order to reach different
bias-variance tradeoffs, over a broad panel of benchmark problems, and that
this may lead to improved accuracy while reducing significantly the
computational burden of the learning stage
Ensemble deep learning: A review
Ensemble learning combines several individual models to obtain better
generalization performance. Currently, deep learning models with multilayer
processing architecture is showing better performance as compared to the
shallow or traditional classification models. Deep ensemble learning models
combine the advantages of both the deep learning models as well as the ensemble
learning such that the final model has better generalization performance. This
paper reviews the state-of-art deep ensemble models and hence serves as an
extensive summary for the researchers. The ensemble models are broadly
categorised into ensemble models like bagging, boosting and stacking, negative
correlation based deep ensemble models, explicit/implicit ensembles,
homogeneous /heterogeneous ensemble, decision fusion strategies, unsupervised,
semi-supervised, reinforcement learning and online/incremental, multilabel
based deep ensemble models. Application of deep ensemble models in different
domains is also briefly discussed. Finally, we conclude this paper with some
future recommendations and research directions
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods
Background The prediction of human gene–abnormal phenotype associations is a
fundamental step toward the discovery of novel genes associated with human
disorders, especially when no genes are known to be associated with a specific
disease. In this context the Human Phenotype Ontology (HPO) provides a
standard categorization of the abnormalities associated with human diseases.
While the problem of the prediction of gene–disease associations has been
widely investigated, the related problem of gene–phenotypic feature (i.e., HPO
term) associations has been largely overlooked, even if for most human genes
no HPO term associations are known and despite the increasing application of
the HPO to relevant medical problems. Moreover most of the methods proposed in
literature are not able to capture the hierarchical relationships between HPO
terms, thus resulting in inconsistent and relatively inaccurate predictions.
Results We present two hierarchical ensemble methods that we formally prove to
provide biologically consistent predictions according to the hierarchical
structure of the HPO. The modular structure of the proposed methods, that
consists in a “flat” learning first step and a hierarchical combination of the
predictions in the second step, allows the predictions of virtually any flat
learning method to be enhanced. The experimental results show that
hierarchical ensemble methods are able to predict novel associations between
genes and abnormal phenotypes with results that are competitive with state-of-
the-art algorithms and with a significant reduction of the computational
complexity. Conclusions Hierarchical ensembles are efficient computational
methods that guarantee biologically meaningful predictions that obey the true
path rule, and can be used as a tool to improve and make consistent the HPO
terms predictions starting from virtually any flat learning method. The
implementation of the proposed methods is available as an R package from the
CRAN repository
Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction
BACKGROUND: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverages on this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlines most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. RESULTS: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. CONCLUSIONS: Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/ descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions
Coherent Hierarchical Multi-Label Classification Networks
Hierarchical multi-label classification (HMC) is a challenging classification
task extending standard multi-label classification problems by imposing a
hierarchy constraint on the classes. In this paper, we propose C-HMCNN(h), a
novel approach for HMC problems, which, given a network h for the underlying
multi-label classification problem, exploits the hierarchy information in order
to produce predictions coherent with the constraint and improve performance. We
conduct an extensive experimental analysis showing the superior performance of
C-HMCNN(h) when compared to state-of-the-art models.Comment: Neural Information Processing Systems 202
Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification
Semi-supervised learning (SSL) is a common approach to learning predictive
models using not only labeled examples, but also unlabeled examples. While SSL
for the simple tasks of classification and regression has received a lot of
attention from the research community, this is not properly investigated for
complex prediction tasks with structurally dependent variables. This is the
case of multi-label classification and hierarchical multi-label classification
tasks, which may require additional information, possibly coming from the
underlying distribution in the descriptive space provided by unlabeled
examples, to better face the challenging task of predicting simultaneously
multiple class labels.
In this paper, we investigate this aspect and propose a (hierarchical)
multi-label classification method based on semi-supervised learning of
predictive clustering trees. We also extend the method towards ensemble
learning and propose a method based on the random forest approach. Extensive
experimental evaluation conducted on 23 datasets shows significant advantages
of the proposed method and its extension with respect to their supervised
counterparts. Moreover, the method preserves interpretability and reduces the
time complexity of classical tree-based models
An Extensive Empirical Comparison of Probabilistic Hierarchical Classifiers in Datasets of Ageing-Related Genes
This study comprehensively evaluates the performance of 5 types of probabilistic hierarchical classification methods used for predicting Gene Ontology (GO) terms related to ageing. Of those tested, a new hybrid of a Local Hierarchical Classifier (LHC) and the Predictive Clustering Tree algorithm (LHC-PCT) had the best predictive accuracy results. We also tested the impact of two types of variations in most hierarchical classification algorithms, namely: (a) changing the base algorithm (we tested Naive Bayes and Support Vector Machines), and the impact of (b) using or not the Correlation based Feature Selection (CFS) algorithm in a pre-processing step. In total, we evaluated the predictive performance of 17 variations of hierarchical classifiers across 15 datasets of ageing and longevityrelated genes. We conclude that the LHC-PCT algorithm ranks better across several tests (7 out of 12). In addition, we interpreted the models generated by the PCT algorithm to show how hierarchical classification algorithms can be used to extract biological insights out of the ageing-related datasets that we compiled
- …