Necessary and sufficient conditions for consistent root reconstruction in Markov models on trees
We establish necessary and sufficient conditions for consistent root
reconstruction in continuous-time Markov models with countable state space on
bounded-height trees. Here a root state estimator is said to be consistent if
the probability that it returns the true root state converges to 1 as the
number of leaves tends to infinity. We also derive quantitative bounds on the
error of reconstruction. Our results answer a question of Gascuel and Steel and
have implications for ancestral sequence reconstruction in a classical
evolutionary model of nucleotide insertion and deletion.
Comment: 30 pages, 3 figures, title of reference [FR] is updated
Inferring ancestral sequences in taxon-rich phylogenies
Statistical consistency in phylogenetics has traditionally referred to the
accuracy of estimating phylogenetic parameters for a fixed number of species as
we increase the number of characters. However, because sequences are often of
fixed length (e.g. for a gene) while we are often able to sample more taxa, it is
useful to consider a dual type of statistical consistency where we increase the
number of species, rather than characters. This raises some basic questions:
what can we learn about the evolutionary process as we increase the number of
species? In particular, does having more species allow us to infer the
ancestral state of characters accurately? This question is particularly
relevant when sequence site evolution varies in a complex way from character to
character, as well as for reconstructing ancestral sequences. In this paper, we
assemble a collection of results to analyse various approaches for inferring
ancestral information with increasing accuracy as the number of taxa increases.
Comment: 32 pages, 5 figures, 1 table
Latent tree models
Latent tree models are graphical models defined on trees, in which only a
subset of variables is observed. They were first discussed by Judea Pearl as
tree-decomposable distributions to generalise star-decomposable distributions
such as the latent class model. Latent tree models, or their submodels, are
widely used in phylogenetic analysis, network tomography, computer vision,
causal modeling, and data clustering. They also contain other well-known
classes of models like hidden Markov models, the Brownian motion tree model, the
Ising model on a tree, and many popular models used in phylogenetics. This
article offers a concise introduction to the theory of latent tree models. We
emphasise the role of tree metrics in the structural description of this model
class, in designing learning algorithms, and in understanding fundamental
limits of what can be learned and when.
Learning loopy graphical models with latent variables: Efficient methods and guarantees
The problem of structure estimation in graphical models with latent variables
is considered. We characterize conditions for tractable graph estimation and
develop efficient methods with provable guarantees. We consider models where
the underlying Markov graph is locally tree-like, and the model is in the
regime of correlation decay. For the special case of the Ising model, the
number of samples required for structural consistency of our method scales
as $\Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p)$, where $p$ is the
number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is
the depth (i.e., distance from a hidden node to the nearest observed nodes),
and $\eta$ is a parameter which depends on the bounds on node and edge
potentials in the Ising model. Necessary conditions for structural consistency
under any algorithm are derived and our method nearly matches the lower bound
on sample requirements. Further, the proposed method is practical to implement
and provides flexibility to control the number of latent variables and the
cycle lengths in the output graph.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1070 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
On the convergence of the maximum likelihood estimator for the transition rate under a 2-state symmetric model
Maximum likelihood estimators are used extensively to estimate unknown
parameters of stochastic trait evolution models on phylogenetic trees. Although
the MLE has been proven to converge to the true value in the independent-sample
case, we cannot appeal to this result because trait values of different species
are correlated due to shared evolutionary history. In this paper, we consider a
2-state symmetric model for a single binary trait and investigate the
theoretical properties of the MLE for the transition rate in the large-tree
limit. Here, the large-tree limit is a theoretical scenario where the number of
taxa increases to infinity and we can observe the trait values for all species.
Specifically, we prove that the MLE converges to the true value under some
regularity conditions. These conditions ensure that the tree shape is not too
irregular, and hold for many practical scenarios such as trees with bounded
edges, trees generated from the Yule (pure birth) process, and trees generated
from the coalescent point process. Our result also provides an upper bound for
the distance between the MLE and the true value.
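
As a rough illustration of the 2-state symmetric model mentioned in the abstract above, the sketch below uses the standard fact that, for a symmetric two-state continuous-time Markov chain with transition rate mu, the two ends of a branch of length t differ with probability (1 - exp(-2 mu t)) / 2. It covers only the independent-sample case that the abstract contrasts with, not the correlated tree setting the paper analyses. The function names, the equal branch length, and the sample size are hypothetical choices made for this example and are not taken from the paper.

```python
import math
import random


def change_probability(mu: float, t: float) -> float:
    """P(end state != start state) for a symmetric 2-state chain with rate mu."""
    return 0.5 * (1.0 - math.exp(-2.0 * mu * t))


def simulate_changes(mu: float, t: float, n: int, rng: random.Random) -> int:
    """Count how many of n independent branches of length t end in a different state."""
    p = change_probability(mu, t)
    return sum(1 for _ in range(n) if rng.random() < p)


def mle_rate(changes: int, n: int, t: float) -> float:
    """MLE of mu from n independent branches of equal length t (an assumption).

    Inverts p = (1 - exp(-2 mu t)) / 2 at p_hat = changes / n.
    Returns math.inf when p_hat >= 1/2, where no finite maximiser exists.
    """
    p_hat = changes / n
    if p_hat >= 0.5:
        return math.inf
    return -math.log(1.0 - 2.0 * p_hat) / (2.0 * t)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    true_mu, t, n = 0.3, 1.0, 100_000  # illustrative values only
    k = simulate_changes(true_mu, t, n, rng)
    print("estimated rate:", mle_rate(k, n, t))
```

With independent branches the estimate settles near the true rate as n grows. The abstract's point is that this classical argument does not transfer directly to species on a tree, whose trait values are correlated through shared ancestry.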