On the inference of large phylogenies with long branches: How long is too long?
Recent work has highlighted deep connections between sequence-length
requirements for high-probability phylogeny reconstruction and the related
problem of the estimation of ancestral sequences. In [Daskalakis et al.'09],
building on the work of [Mossel'04], a tight sequence-length requirement was
obtained for the CFN model. In particular the required sequence length for
high-probability reconstruction was shown to undergo a sharp transition (from
$O(\log n)$ to $\mathrm{poly}(n)$, where $n$ is the number of leaves) at the
"critical" branch length \critmlq (if it exists) of the ancestral
reconstruction problem.
Here we consider the GTR model. For this model, recent results of [Roch'09]
show that the tree can be accurately reconstructed with sequences of length
$O(\log n)$ when the branch lengths are below \critksq, known as the
Kesten-Stigum (KS) bound. Although for the CFN model \critmlq = \critksq, it
is known that for the more general GTR models one has \critmlq \geq \critksq
with a strict inequality in many cases. Here, we show that this phenomenon also
holds for phylogenetic reconstruction by exhibiting a family of symmetric
models and a phylogenetic reconstruction algorithm which recovers the tree
from $O(\log n)$-length sequences for some branch lengths in the range
(\critksq,\critmlq). Second, we prove that phylogenetic reconstruction under
GTR models requires a polynomial sequence length for branch lengths above
\critmlq.
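As a rough numerical illustration of the Kesten-Stigum bound mentioned in this abstract, the sketch below evaluates the standard condition $b\,\theta^2 > 1$ for a symmetric two-state (CFN-type) channel on a complete $b$-ary tree, where $\theta = 1 - 2p$ and $p$ is the per-edge flip probability. The restriction to a complete binary tree, the parametrization by $p$ rather than by branch length, and the function names are assumptions made for illustration; they are not taken from the paper.

```python
import math


def ks_flip_threshold(b=2):
    """Per-edge flip probability p at which b * (1 - 2p)**2 equals 1.

    Assumes a symmetric two-state channel on a complete b-ary tree
    (illustrative setting, not the paper's general GTR model).
    """
    theta_star = 1.0 / math.sqrt(b)  # critical value of theta = 1 - 2p
    return (1.0 - theta_star) / 2.0


def below_ks(p, b=2):
    """True if flip probability p satisfies the KS condition b * theta**2 > 1."""
    theta = 1.0 - 2.0 * p
    return b * theta ** 2 > 1.0


if __name__ == "__main__":
    print(ks_flip_threshold())          # critical p for a binary tree
    print(below_ks(0.1), below_ks(0.2))
```

For a binary tree the threshold works out to $p = (1 - 1/\sqrt{2})/2$, so a flip probability of 0.1 per edge lies below the bound and 0.2 lies above it.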
Global Alignment of Molecular Sequences via Ancestral State Reconstruction
Molecular phylogenetic techniques do not generally account for such common
evolutionary events as site insertions and deletions (known as indels). Instead
tree building algorithms and ancestral state inference procedures typically
rely on substitution-only models of sequence evolution. In practice these
methods are extended beyond this simplified setting with the use of heuristics
that produce global alignments of the input sequences--an important problem
which has no rigorous model-based solution. In this paper we consider a new
version of the multiple sequence alignment problem in the context of stochastic
indel models. More precisely, we introduce the following trace reconstruction
problem on a tree (TRPT): a binary sequence is broadcast through a tree
channel where we allow substitutions, deletions, and insertions; we seek to
reconstruct the original sequence from the sequences received at the leaves of
the tree. We give a recursive procedure for this problem with strong
reconstruction guarantees at low mutation rates, providing also an alignment of
the sequences at the leaves of the tree. The TRPT problem without indels has
been studied in previous work (Mossel 2004, Daskalakis et al. 2006) as a
bootstrapping step towards obtaining optimal phylogenetic reconstruction
methods. The present work sets up a framework for extending these works to
evolutionary models with indels.
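The tree channel described in this abstract can be simulated with a short sketch. The code below only generates the leaf sequences; it does not implement the paper's recursive reconstruction procedure. The per-site independent mutation scheme, the complete binary tree, and the names `mutate` and `broadcast` are assumptions chosen for illustration, since the abstract does not specify the indel model in detail.

```python
import random


def mutate(seq, p_sub, p_del, p_ins, rng):
    """Pass a binary sequence through one edge of the tree channel.

    Hypothetical per-site scheme: before each site a random bit is
    inserted with probability p_ins; the site is deleted with probability
    p_del; a surviving site is flipped with probability p_sub.
    """
    out = []
    for bit in seq:
        if rng.random() < p_ins:
            out.append(rng.randint(0, 1))  # inserted random bit
        if rng.random() < p_del:
            continue                       # site deleted
        if rng.random() < p_sub:
            bit ^= 1                       # substitution (bit flip)
        out.append(bit)
    return out


def broadcast(root_seq, depth, p_sub, p_del, p_ins, rng):
    """Broadcast root_seq down a complete binary tree; return leaf sequences."""
    level = [list(root_seq)]
    for _ in range(depth):
        # Each sequence at the current level produces two mutated children.
        level = [mutate(s, p_sub, p_del, p_ins, rng)
                 for s in level for _child in range(2)]
    return level


if __name__ == "__main__":
    rng = random.Random(0)
    root = [rng.randint(0, 1) for _ in range(30)]
    leaves = broadcast(root, depth=3, p_sub=0.02, p_del=0.01, p_ins=0.01, rng=rng)
    for leaf in leaves:
        print("".join(map(str, leaf)))
```

Because of insertions and deletions the leaf sequences generally have different lengths, which is why reconstructing the root also requires aligning the leaves.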
Reconstruction of ancestral protein sequences and its applications
BACKGROUND: Modern-day proteins were selected during long evolutionary history as descendants of ancient life forms. In silico reconstruction of such ancestral protein sequences facilitates our understanding of evolutionary processes, protein classification and biological function. Additionally, reconstructed ancestral protein sequences could serve to fill in sequence space thus aiding remote homology inference.
RESULTS: We developed ANCESCON, a package for distance-based phylogenetic inference and reconstruction of ancestral protein sequences that takes into account the observed variation of evolutionary rates between positions that more precisely describes the evolution of protein families. To improve the accuracy of evolutionary distance estimation and ancestral sequence reconstruction, two approaches are proposed to estimate position-specific evolutionary rates. Comparisons show that at large evolutionary distances our method gives more accurate ancestral sequence reconstruction than PAML, PHYLIP and PAUP*. We apply the reconstructed ancestral sequences to homology inference and functional site prediction. We show that the usage of hypothetical ancestors together with the present day sequences improves profile-based sequence similarity searches; and that ancestral sequence reconstruction methods can be used to predict positions with functional specificity.
CONCLUSIONS: As a computational tool to reconstruct ancestral protein sequences from a given multiple sequence alignment, ANCESCON shows high accuracy in tests and helps detection of remote homologs and prediction of functional sites. ANCESCON is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from
Necessary and sufficient conditions for consistent root reconstruction in Markov models on trees
We establish necessary and sufficient conditions for consistent root
reconstruction in continuous-time Markov models with countable state space on
bounded-height trees. Here a root state estimator is said to be consistent if
the probability that it returns the true root state converges to 1 as the
number of leaves tends to infinity. We also derive quantitative bounds on the
error of reconstruction. Our results answer a question of Gascuel and Steel and
have implications for ancestral sequence reconstruction in a classical
evolutionary model of nucleotide insertion and deletion.
Comment: 30 pages, 3 figures, title of reference [FR] is update