12,610 research outputs found
Classifying pairs with trees for supervised biological network inference
Networks are ubiquitous in biology and computational approaches have been
largely investigated for their inference. In particular, supervised machine
learning methods can be used to complete a partially known network by
integrating various measurements. Two main supervised frameworks have been
proposed: the local approach, which trains a separate model for each network
node, and the global approach, which trains a single model over pairs of nodes.
Here, we systematically investigate, theoretically and empirically, the
exploitation of tree-based ensemble methods in the context of these two
approaches for biological network inference. We first formalize the problem of
network inference as classification of pairs, unifying in the process
homogeneous and bipartite graphs and discussing two main sampling schemes. We
then present the global and the local approaches, extending the later for the
prediction of interactions between two unseen network nodes, and discuss their
specializations to tree-based ensemble methods, highlighting their
interpretability and drawing links with clustering techniques. Extensive
computational experiments are carried out with these methods on various
biological networks that clearly highlight that these methods are competitive
with existing methods.Comment: 22 page
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to
enhance tree-based ensemble methods in the context of multi-label
classification. We show how learning time complexity can be reduced without
affecting computational complexity and accuracy of predictions. We also show
that random output space projections may be used in order to reach different
bias-variance tradeoffs, over a broad panel of benchmark problems, and that
this may lead to improved accuracy while reducing significantly the
computational burden of the learning stage
- …