Global Alignment of Molecular Sequences via Ancestral State Reconstruction
Molecular phylogenetic techniques do not generally account for such common
evolutionary events as site insertions and deletions (known as indels). Instead,
tree-building algorithms and ancestral state inference procedures typically
rely on substitution-only models of sequence evolution. In practice, these
methods are extended beyond this simplified setting with the use of heuristics
that produce global alignments of the input sequences--an important problem
which has no rigorous model-based solution. In this paper we consider a new
version of the multiple sequence alignment problem in the context of stochastic
indel models. More precisely, we introduce the following trace reconstruction
problem on a tree (TRPT): a binary sequence is broadcast through a tree
channel where we allow substitutions, deletions, and insertions; we seek to
reconstruct the original sequence from the sequences received at the leaves of
the tree. We give a recursive procedure for this problem with strong
reconstruction guarantees at low mutation rates, providing also an alignment of
the sequences at the leaves of the tree. The TRPT problem without indels has
been studied in previous work (Mossel 2004, Daskalakis et al. 2006) as a
bootstrapping step towards obtaining optimal phylogenetic reconstruction
methods. The present work sets up a framework for extending these works to
evolutionary models with indels.
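To make the TRPT setup concrete, here is a minimal simulation sketch of the tree channel described in the abstract. It is not the paper's reconstruction procedure. The function names, the complete binary tree shape, and the per-site independent mutation probabilities `p_sub`, `p_del` and `p_ins` are all assumptions made for illustration.

```python
import random

def mutate(seq, p_sub, p_del, p_ins, rng):
    """Pass a binary sequence through one edge of the channel (assumed model)."""
    out = []
    for bit in seq:
        if rng.random() < p_ins:
            out.append(rng.randint(0, 1))  # insert a random bit before this site
        if rng.random() < p_del:
            continue                        # the site is deleted on this edge
        if rng.random() < p_sub:
            bit ^= 1                        # substitution flips the bit
        out.append(bit)
    return out

def broadcast(root_seq, depth, p_sub, p_del, p_ins, rng):
    """Return the sequences received at the leaves of a complete binary tree."""
    level = [list(root_seq)]
    for _step in range(depth):
        next_level = []
        for parent in level:
            # Each parent sends an independently mutated copy to two children.
            next_level.append(mutate(parent, p_sub, p_del, p_ins, rng))
            next_level.append(mutate(parent, p_sub, p_del, p_ins, rng))
        level = next_level
    return level

if __name__ == "__main__":
    rng = random.Random(42)
    root = [rng.randint(0, 1) for _ in range(20)]
    for leaf in broadcast(root, depth=2, p_sub=0.05, p_del=0.05, p_ins=0.05, rng=rng):
        print("".join(map(str, leaf)))
```

Because indels change sequence lengths, the leaf sequences are generally not site-aligned, which is why reconstructing the root also requires an alignment.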
Reconstructing an ancestral genotype of two hexachlorocyclohexane-degrading Sphingobium species using metagenomic sequence data.
Over the last 60 years, the use of hexachlorocyclohexane (HCH) as a pesticide has resulted in the production of >4 million tons of HCH waste, which has been dumped in open sinks across the globe. Here, the combination of the genomes of two genetic subspecies (Sphingobium japonicum UT26 and Sphingobium indicum B90A; isolated from two discrete geographical locations, Japan and India, respectively) capable of degrading HCH, with metagenomic data from an HCH dumpsite (~450 mg HCH per g soil), enabled the reconstruction and validation of the last-common ancestor (LCA) genotype. Mapping the LCA genotype (3128 genes) to the subspecies genomes demonstrated that >20% of the genes in each subspecies were absent in the LCA. This includes two enzymes from the 'upper' HCH degradation pathway, suggesting that the ancestor was unable to degrade HCH isomers, but descendants acquired lin genes by transposon-mediated lateral gene transfer. In addition, anthranilate and homogentisate degradation traits were found to be strain (selectively retained only by UT26) and environment (absent in the LCA and subspecies, but prevalent in the metagenome) specific, respectively. One draft secondary chromosome, two near complete plasmids and eight complete lin transposons were assembled from the metagenomic DNA. Collectively, these results reinforce the elastic nature of the genus Sphingobium, and describe the evolutionary acquisition mechanism of a xenobiotic degradation phenotype in response to environmental pollution. This also demonstrates for the first time the use of metagenomic data in ancestral genotype reconstruction, highlighting its potential to provide significant insight into the development of such phenotypes.
The genome of the medieval Black Death agent (extended abstract)
The genome of a 650-year-old Yersinia pestis bacterium, responsible for the
medieval Black Death, was recently sequenced and assembled into 2,105 contigs
from the main chromosome. According to the point mutation record, the medieval
bacterium could be an ancestor of most extant Yersinia pestis strains, which
opens the way to reconstructing the organization of these contigs using a
comparative approach. We show that recent computational paleogenomics methods,
aiming at reconstructing the organization of ancestral genomes from the
comparison of extant genomes, can be used to correct, order and complete the
contig set of the Black Death agent genome, providing a full chromosome
sequence, at the nucleotide scale, of this ancient bacterium. This sequence
suggests that a burst of mobile element insertions predated the Black Death,
leading to exceptional genome plasticity and an increased rearrangement rate.
Comment: Extended abstract of a talk presented at the conference JOBIM 2013,
https://colloque.inra.fr/jobim2013_eng/. Full paper submitte
ARPIP: Ancestral sequence Reconstruction with insertions and deletions under the Poisson Indel Process.
Modern phylogenetic methods allow inference of ancestral molecular sequences given an alignment and phylogeny relating present day sequences. This provides insight into the evolutionary history of molecules, helping to understand gene function and to study biological processes such as adaptation and convergent evolution across a variety of applications. Here we propose a dynamic programming algorithm for fast joint likelihood-based reconstruction of ancestral sequences under the Poisson Indel Process (PIP). Unlike previous approaches, our method, named ARPIP, enables the reconstruction with insertions and deletions based on an explicit indel model. Consequently, inferred indel events have an explicit biological interpretation. Likelihood computation is achieved in linear time with respect to the number of sequences. Our method consists of two steps, namely finding the most probable indel points and reconstructing ancestral sequences. First, we find the most likely indel points and prune the phylogeny to reflect the insertion and deletion events per site. Second, we infer the ancestral states on the pruned subtree in a manner similar to FastML. We applied ARPIP on simulated datasets and on real data from the Betacoronavirus genus. ARPIP reconstructs both the indel events and substitutions with a high degree of accuracy. Our method fares well when compared to established state-of-the-art methods such as FastML and PAML. Moreover, the method can be extended to explore both optimal and suboptimal reconstructions, include rate heterogeneity through time and more. We believe it will expand the range of novel applications of ancestral sequence reconstruction.
Probabilistic Graphical Model Representation in Phylogenetics
Recent years have seen a rapid expansion of the model space explored in
statistical phylogenetics, emphasizing the need for new approaches to
statistical model representation and software development. Clear communication
and representation of the chosen model is crucial for: (1) reproducibility of
an analysis, (2) model development and (3) software design. Moreover, a
unified, clear and understandable framework for model representation lowers the
barrier for beginners and non-specialists to grasp complex phylogenetic models,
including their assumptions and parameter/variable dependencies.
Graphical modeling is a unifying framework that has gained in popularity in
the statistical literature in recent years. The core idea is to break complex
models into conditionally independent distributions. The strength lies in the
comprehensibility, flexibility, and adaptability of this formalism, and the
large body of computational work based on it. Graphical models are well-suited
to teach statistical models, to facilitate communication among phylogeneticists
and in the development of generic software for simulation and statistical
inference.
Here, we provide an introduction to graphical models for phylogeneticists and
extend the standard graphical model representation to the realm of
phylogenetics. We introduce a new graphical model component, tree plates, to
capture the changing structure of the subgraph corresponding to a phylogenetic
tree. We describe a range of phylogenetic models using the graphical model
framework and introduce modules to simplify the representation of standard
components in large and complex models. Phylogenetic model graphs can be
readily used in simulation, maximum likelihood inference, and Bayesian
inference using, for example, Metropolis-Hastings or Gibbs sampling of the
posterior distribution.
Recoverability of Ancestral Recombination Graph Topologies
Recombination is a powerful evolutionary process that shapes the genetic
diversity observed in the populations of many species. Reconstructing
genealogies in the presence of recombination from sequencing data is a very
challenging problem, as this relies on mutations having occurred on the correct
lineages in order to detect the recombination and resolve the placement of
edges in the local trees. We investigate the probability of recovering the true
topology of ancestral recombination graphs (ARGs) under the coalescent with
recombination and gene conversion. We explore how sample size and mutation rate
affect the inherent uncertainty in reconstructed ARGs; this sheds light on the
theoretical limitations of ARG reconstruction methods. We illustrate our
results using estimates of evolutionary rates for several biological organisms;
in particular, we find that for parameter values that are realistic for
SARS-CoV-2, the probability of reconstructing genealogies that are close to the
truth is low.