Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction
The genome-wide hierarchical classification of gene functions, using biomolecular data from high-throughput biotechnologies, is one of the central topics in bioinformatics and functional genomics. In this paper we present a multilabel hierarchical algorithm inspired by the “true path rule” that governs both the Gene Ontology and the Functional Catalogue (FunCat). In particular, we propose an enhanced version of the True Path Rule (TPR) algorithm that lets us control the flow of information between the classifiers of the hierarchical ensemble and thereby tune the precision/recall characteristics of the overall hierarchical classification system. Results with the model organism S. cerevisiae show that the proposed method significantly improves on the basic version of the TPR algorithm, as well as on the Hierarchical Top-down and Flat ensembles.
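The abstract does not spell out the update rule, so the following is only a minimal sketch of how a weighted true-path-rule ensemble could combine per-class classifier outputs on a tree-shaped hierarchy. The function name `weighted_tpr`, the weight `w`, the threshold `t`, the averaging over positively predicted children, and the top-down capping step are all assumptions made for illustration, not the authors' published formulation.

```python
from typing import Dict, List


def weighted_tpr(
    children: Dict[str, List[str]],
    flat_prob: Dict[str, float],
    root: str,
    w: float = 0.5,
    t: float = 0.5,
) -> Dict[str, float]:
    """Combine flat per-class probabilities on a tree (illustrative sketch).

    w is the (assumed) weight given to a node's own classifier; 1 - w is the
    weight given to the average score of its positively predicted children.
    t is the (assumed) threshold that defines a positive prediction.
    """
    consensus: Dict[str, float] = {}

    def bottom_up(node: str) -> None:
        kids = children.get(node, [])
        for kid in kids:
            bottom_up(kid)
        # Only children predicted positive send information upward.
        positive = [consensus[k] for k in kids if consensus[k] > t]
        if positive:
            consensus[node] = (
                w * flat_prob[node] + (1.0 - w) * sum(positive) / len(positive)
            )
        else:
            consensus[node] = flat_prob[node]

    def top_down(node: str, cap: float) -> None:
        # A class can never score higher than its parent, so a negative
        # decision at a node is inherited by all of its descendants.
        consensus[node] = min(consensus[node], cap)
        for kid in children.get(node, []):
            top_down(kid, consensus[node])

    bottom_up(root)
    top_down(root, 1.0)
    return consensus


# Hypothetical four-class hierarchy with made-up classifier outputs.
tree = {"root": ["a", "b"], "a": ["a1"]}
probs = {"root": 0.4, "a": 0.6, "b": 0.2, "a1": 0.9}
scores = weighted_tpr(tree, probs, "root", w=0.5, t=0.5)
```

In this sketch, setting `w = 1` ignores the children entirely, while smaller values let confident descendants pull an ancestor's score up, which is one way a single parameter could trade precision against recall.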
Resolving structural variability in network models and the brain
Large-scale white matter pathways crisscrossing the cortex create a complex
pattern of connectivity that underlies human cognitive function. Generative
mechanisms for this architecture have been difficult to identify in part
because little is known about mechanistic drivers of structured networks. Here
we contrast network properties derived from diffusion spectrum imaging data of
the human brain with 13 synthetic network models chosen to probe the roles of
physical network embedding and temporal network growth. We characterize both
the empirical and synthetic networks using familiar diagnostics presented in
statistical form, as scatter plots and distributions, to reveal the full range
of variability of each measure across scales in the network. We focus on the
degree distribution, degree assortativity, hierarchy, topological Rentian
scaling, and topological fractal scaling, in addition to several summary
statistics, including the mean clustering coefficient, shortest path length,
and network diameter. The models are investigated in a progressive, branching
sequence, aimed at capturing different elements thought to be important in the
brain, and range from simple random and regular networks, to models that
incorporate specific growth rules and constraints. We find that synthetic
models that constrain the network nodes to be embedded in anatomical brain
regions tend to produce distributions that are similar to those extracted from
the brain. We also find that network models hardcoded to display one network
property do not in general also display a second, suggesting that multiple
neurobiological mechanisms might be at play in the development of human brain
network architecture. Together, the network models that we develop and employ
provide a potentially useful starting point for the statistical inference of
brain network structure from neuroimaging data.
Comment: 24 pages, 11 figures, 1 table, supplementary material
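Three of the summary statistics named in this abstract (mean clustering coefficient, shortest path length, and network diameter) can be computed with a few lines of standard-library Python. The sketch below assumes a small, connected, undirected, unweighted graph stored as an adjacency dictionary; the helper names and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def bfs_distances(adj: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def summary_statistics(adj: Graph) -> Dict[str, float]:
    """Mean clustering, mean shortest path length, and diameter.

    Assumes the graph is connected; otherwise the path-based values only
    cover pairs of nodes that can reach each other.
    """
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # convention: no triangles possible
            continue
        nb = list(nbrs)
        # Count edges that exist among the neighbours of u.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        coeffs.append(2.0 * links / (k * (k - 1)))

    all_dist = [
        d
        for u in adj
        for v, d in bfs_distances(adj, u).items()
        if v != u
    ]
    return {
        "mean_clustering": sum(coeffs) / len(coeffs),
        "mean_path_length": sum(all_dist) / len(all_dist),
        "diameter": max(all_dist),
    }


# Toy graph: a triangle (0, 1, 2) with a tail node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
stats = summary_statistics(toy)
```

Collapsing each measure to a single number is exactly what the abstract contrasts with its statistical presentation; keeping the per-node `coeffs` list instead of its mean would give the distribution rather than the summary.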
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods
Background: The prediction of human gene–abnormal phenotype associations is a
fundamental step toward the discovery of novel genes associated with human
disorders, especially when no genes are known to be associated with a specific
disease. In this context the Human Phenotype Ontology (HPO) provides a
standard categorization of the abnormalities associated with human diseases.
While the problem of the prediction of gene–disease associations has been
widely investigated, the related problem of gene–phenotypic feature (i.e., HPO
term) associations has been largely overlooked, even though for most human genes
no HPO term associations are known and despite the increasing application of
the HPO to relevant medical problems. Moreover, most of the methods proposed in
the literature are not able to capture the hierarchical relationships between HPO
terms, thus resulting in inconsistent and relatively inaccurate predictions.
Results: We present two hierarchical ensemble methods that we formally prove to
provide biologically consistent predictions according to the hierarchical
structure of the HPO. The modular structure of the proposed methods, which
consists of a “flat” learning first step and a hierarchical combination of the
predictions in the second step, allows the predictions of virtually any flat
learning method to be enhanced. The experimental results show that
hierarchical ensemble methods are able to predict novel associations between
genes and abnormal phenotypes with results that are competitive with
state-of-the-art algorithms and with a significant reduction of the computational
complexity. Conclusions: Hierarchical ensembles are efficient computational
methods that guarantee biologically meaningful predictions that obey the true
path rule, and can be used as a tool to improve HPO term predictions and make
them consistent, starting from virtually any flat learning method. The
implementation of the proposed methods is available as an R package from the
CRAN repository.
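The abstract does not describe the two methods in detail, so the sketch below shows only one generic way a second, hierarchical step could make flat scores consistent on a directed acyclic graph: each term's score is capped by the smallest score among its parents. The function name `enforce_true_path_rule`, the term labels, and the score values are hypothetical, and this is not the API of the R package mentioned above.

```python
from typing import Dict, List


def enforce_true_path_rule(
    parents: Dict[str, List[str]],
    flat_scores: Dict[str, float],
) -> Dict[str, float]:
    """Cap every term's score by the scores of all its ancestors.

    parents maps a term to its direct parents in the DAG; terms missing
    from the mapping are treated as roots. The recursion is adequate for
    small illustrative hierarchies.
    """
    consistent: Dict[str, float] = {}

    def resolve(term: str) -> float:
        if term not in consistent:
            score = flat_scores[term]
            for parent in parents.get(term, []):
                # A term may not be more likely than any of its parents.
                score = min(score, resolve(parent))
            consistent[term] = score
        return consistent[term]

    for term in flat_scores:
        resolve(term)
    return consistent


# Hypothetical diamond-shaped hierarchy: D has two parents, B and C.
dag_parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
flat = {"A": 0.9, "B": 0.5, "C": 0.7, "D": 0.8}
corrected = enforce_true_path_rule(dag_parents, flat)
```

Because the correction only reads the flat scores, any flat learner can supply them, which mirrors the modular two-step design the abstract describes.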
Comparison and validation of community structures in complex networks
The issue of partitioning a network into communities has attracted a great
deal of attention recently. Most authors seem to equate this issue with the one
of finding the maximum value of the modularity, as defined by Newman. Since the
problem formulated this way is NP-hard, most effort has gone into the
construction of search algorithms, and less into the question of other measures
of community structures, similarities between various partitionings and the
validation with respect to external information. Here we concentrate on a class
of computer-generated networks and on three well-studied real networks which
constitute a benchmark for network studies: the karate club, the US college
football teams, and a gene network of yeast. We utilize some standard ways of
clustering data (originally not designed for finding community structures in
networks) and show that these classical methods sometimes outperform the newer
ones. We discuss various measures of the strength of the modular structure, and
show their features and drawbacks by example. Further, we compare different
partitions by applying some graph-theoretic concepts of distance, which
indicate that one of the quality measures of the degree of modularity
corresponds quite well with the distance from the true partition. Finally, we
introduce a way to validate the partitionings with respect to external data
when the nodes are classified but the network structure is unknown. This is
possible here because we know everything about the computer-generated networks, as
well as the historical answer to how the karate club and the football teams are
partitioned in reality. The partitioning of the gene network is validated by
use of the Gene Ontology database, where we show that a community in general
corresponds to a biological process.
Comment: To appear in Physica A; 25 page
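For reference, the modularity that the abstract attributes to Newman can be evaluated for a given partition as Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of the nodes in c. The sketch below assumes a simple undirected, unweighted graph without self-loops; the function name and the example graph are hypothetical, and the code only scores a partition rather than searching for the best one.

```python
from typing import Dict, Hashable, List, Tuple


def modularity(
    edges: List[Tuple[Hashable, Hashable]],
    community: Dict[Hashable, Hashable],
) -> float:
    """Newman modularity Q of a partition of a simple undirected graph."""
    m = len(edges)
    intra: Dict[Hashable, int] = {}    # edges inside each community
    deg_sum: Dict[Hashable, int] = {}  # total degree of each community

    for u, v in edges:
        cu, cv = community[u], community[v]
        # Each edge contributes one unit of degree to each endpoint.
        deg_sum[cu] = deg_sum.get(cu, 0) + 1
        deg_sum[cv] = deg_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1

    return sum(
        intra.get(c, 0) / m - (deg_sum[c] / (2.0 * m)) ** 2
        for c in deg_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
bridge_graph = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(bridge_graph, two_groups)
q_single = modularity(bridge_graph, one_group)
```

Putting every node in one community always gives Q = 0 under this definition, so a positive value for the two-triangle split indicates more internal edges than expected by chance.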