EM for phylogenetic topology reconstruction on non-homogeneous data
Background: The reconstruction of the phylogenetic tree topology of four taxa
remains one of the main challenges in phylogenetics. Its
difficulties lie in considering evolutionary models that are not too restrictive, and
in correctly dealing with the long-branch attraction problem. The correct
reconstruction of 4-taxon trees is crucial for making quartet-based methods
work and being able to recover large phylogenies.
Results: In this paper we consider an expectation-maximization method for
maximizing the likelihood of (time nonhomogeneous) evolutionary Markov models
on trees. We study its success on reconstructing 4-taxon topologies and its
performance as input method in quartet-based phylogenetic reconstruction
methods such as QFIT and QuartetSuite. Our results show that the method
proposed here outperforms neighbor-joining and the usual (time-homogeneous
continuous-time) maximum likelihood methods on 4-leaved trees with
among-lineage instantaneous rate heterogeneity, and performs similarly to usual
continuous-time maximum-likelihood when data satisfies the assumptions of both
methods.
Conclusions: The method presented in this paper is well suited for
reconstructing the topology of any number of taxa via quartet-based methods and
is highly accurate, especially regarding largely divergent trees and time
nonhomogeneous data.
Comment: 1 main file: 6 Figures and 2 Tables. 1 Additional file with 2 Figures
and 2 Tables. To appear in "BMC Evolutionary Biology"
Markov Network Structure Learning via Ensemble-of-Forests Models
Real world systems typically feature a variety of different dependency types
and topologies that complicate model selection for probabilistic graphical
models. We introduce the ensemble-of-forests model, a generalization of the
ensemble-of-trees model. Our model enables structure learning of Markov random
fields (MRF) with multiple connected components and arbitrary potentials. We
present two approximate inference techniques for this model and demonstrate
their performance on synthetic data. Our results suggest that the
ensemble-of-forests approach can accurately recover sparse, possibly
disconnected MRF topologies, even in the presence of non-Gaussian dependencies
and/or low sample size. We applied the ensemble-of-forests model to learn the
structure of perturbed signaling networks of immune cells and found that these
frequently exhibit non-Gaussian dependencies with disconnected MRF topologies.
In summary, we expect that the ensemble-of-forests model will enable MRF
structure learning in other high dimensional real world settings that are
governed by non-trivial dependencies.
Comment: 13 pages, 6 figures
Parallel Implementation of Efficient Search Schemes for the Inference of Cancer Progression Models
The emergence and development of cancer is a consequence of the accumulation
over time of genomic mutations involving a specific set of genes, which
provides the cancer clones with a functional selective advantage. In this work,
we model the order of accumulation of such mutations during the progression,
which eventually leads to the disease, by means of probabilistic graphical
models, i.e., Bayesian Networks (BNs). We investigate how to perform the task
of learning the structure of such BNs, according to experimental evidence,
adopting a global optimization meta-heuristic. In particular, in this work we
rely on Genetic Algorithms, and to strongly reduce the execution time of the
inference -- which can also involve multiple repetitions to collect
statistically significant assessments of the data -- we distribute the
calculations using both multi-threading and a multi-node architecture. The
results show that our approach is characterized by good accuracy and
specificity; we also demonstrate its feasibility, thanks to an 84x reduction of
the overall execution time with respect to a traditional sequential
implementation.
Pair-copula constructions of multiple dependence
Building on the work of Bedford, Cooke and Joe, we show how multivariate data, which exhibit complex patterns of dependence in the tails, can be modelled using a cascade of pair-copulae, acting on two variables at a time. We use the pair-copula decomposition of a general multivariate distribution and propose a method to perform inference. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Pair-copula decomposed models also represent a very flexible way to construct higher-dimensional copulae. We apply the methodology to a financial data set. Our approach represents the first step towards the development of an unsupervised algorithm that explores the space of possible pair-copula models and that can also be applied to huge data sets automatically.