Accurate reconstruction of insertion-deletion histories by statistical phylogenetics
The Multiple Sequence Alignment (MSA) is a computational abstraction that
represents a partial summary either of indel history, or of structural
similarity. Taking the former view (indel history), it is possible to use
formal automata theory to generalize the phylogenetic likelihood framework for
finite substitution models (Dayhoff's probability matrices and Felsenstein's
pruning algorithm) to arbitrary-length sequences. In this paper, we report
results of a simulation-based benchmark of several methods for reconstruction
of indel history. The methods tested include a relatively new algorithm for
statistical marginalization of MSAs that sums over a stochastically sampled
ensemble of the most probable evolutionary histories. For mammalian
evolutionary parameters on several different trees, the single most likely
history sampled by our algorithm appears less biased than histories
reconstructed by other MSA methods. The algorithm can also be used for
alignment-free inference, where the MSA is explicitly summed out of the
analysis. As an illustration of our method, we discuss reconstruction of the
evolutionary histories of human protein-coding genes.
Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434
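Felsenstein's pruning algorithm, cited above as part of the likelihood framework for finite substitution models, can be sketched for a single alignment column on a fixed tree. This is a minimal illustration, assuming the Jukes-Cantor (JC69) model with branch lengths in expected substitutions per site. The model choice, the tree encoding, and the function names are hypothetical and are not taken from the paper, and the automata-based extension to indels is not shown.

```python
import math
from typing import Dict, List, Tuple, Union

BASES = "ACGT"

# Hypothetical encoding: a leaf is its observed base ("A"); an internal node
# is a list of (subtree, branch_length) pairs.
Tree = Union[str, List[Tuple["Tree", float]]]


def jc69_prob(x: str, y: str, t: float) -> float:
    """JC69 probability that base x becomes base y along a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def partials(tree: Tree) -> Dict[str, float]:
    """Pruning step: L[x] = P(leaf bases below this node | node has base x)."""
    if isinstance(tree, str):
        # A leaf is fully determined by its observed base.
        return {x: 1.0 if x == tree else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in tree:
        child_l = partials(child)
        # Sum over the unobserved child state y, then multiply across children.
        for x in BASES:
            result[x] *= sum(jc69_prob(x, y, t) * child_l[y] for y in BASES)
    return result


def site_likelihood(tree: Tree) -> float:
    """Weight root partials by the uniform JC69 equilibrium frequencies (1/4)."""
    return sum(0.25 * value for value in partials(tree).values())
```

For example, `site_likelihood([("A", 0.1), ([("A", 0.2), ("G", 0.2)], 0.05)])` gives the probability of observing the column A, A, G at the three leaves. Each node is visited once, so the cost is linear in the number of branches rather than exponential in the number of internal nodes.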
A Unifying Model of Genome Evolution Under Parsimony
We present a data structure called a history graph that offers a practical
basis for the analysis of genome evolution. It conceptually simplifies the
study of parsimonious evolutionary histories by representing both substitutions
and double cut and join (DCJ) rearrangements in the presence of duplications.
The problem of constructing parsimonious history graphs thus subsumes related
maximum parsimony problems in the fields of phylogenetic reconstruction and
genome rearrangement. We show that tractable functions can be used to define
upper and lower bounds on the minimum number of substitutions and DCJ
rearrangements needed to explain any history graph. These bounds become tight
for a special type of unambiguous history graph called an ancestral variation
graph (AVG), whose combinatorial structure constrains the number of operations
required. We finally demonstrate that for a given history graph, a finite set
of AVGs describes all parsimonious interpretations of that graph, and this set
can be explored with a few sampling moves.
Comment: 52 pages, 24 figures
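The history-graph bounds themselves are beyond a short example. As a much simpler point of reference, a standard result in the DCJ literature is that for two genomes with the same genes, no duplications, and only circular chromosomes, the minimum number of DCJ operations is N - C, where N is the number of genes and C is the number of cycles in their adjacency graph. The sketch below computes only that special case. The genome encoding and function names are hypothetical, and it does not handle duplications, linear chromosomes, or substitutions, which the history graph is designed to cover.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a genome is a list of circular chromosomes, each a list
# of signed gene IDs (e.g. [1, -2, 3]); every gene appears exactly once.
Extremity = Tuple[int, str]  # (gene, "t") for the tail, (gene, "h") for the head


def adjacencies(genome: List[List[int]]) -> List[Tuple[Extremity, Extremity]]:
    """Pairs of gene extremities that touch in circular chromosomes."""
    adj = []
    for chrom in genome:
        ends: List[Extremity] = []  # extremities in reading order
        for g in chrom:
            gene = abs(g)
            ends += [(gene, "t"), (gene, "h")] if g > 0 else [(gene, "h"), (gene, "t")]
        # The right end of each gene touches the left end of the next;
        # the last gene wraps around to the first because chromosomes are circular.
        for i in range(len(chrom)):
            adj.append((ends[2 * i + 1], ends[(2 * i + 2) % len(ends)]))
    return adj


def dcj_distance_circular(a: List[List[int]], b: List[List[int]]) -> int:
    """DCJ distance N - C for duplication-free, all-circular genomes with equal gene sets."""
    parent: Dict[Extremity, Extremity] = {}

    def find(x: Extremity) -> Extremity:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each extremity lies in exactly one adjacency of each genome, so the
    # connected components of extremities are exactly the adjacency-graph cycles.
    for u, v in adjacencies(a) + adjacencies(b):
        parent[find(u)] = find(v)
    cycles = len({find(x) for x in list(parent)})
    genes = sum(len(chrom) for chrom in a)
    return genes - cycles
```

For example, `dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]])` is 1, corresponding to a single inversion of gene 2.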
Evaluation of methods for detecting conversion events in gene clusters
Background: Gene clusters are genetically important, but their analysis poses significant computational challenges. One of the major reasons for these difficulties is gene conversion among the duplicated regions of the cluster, which can obscure their true relationships. Many computational methods for detecting gene conversion events have been released, but, owing to a lack of accurate evaluation methods, their performance has not been assessed well enough to support wide deployment in evolutionary history studies.
Results: We designed a new method that simulates gene cluster evolution, including large-scale duplication, deletion, and conversion events as well as small mutations. We used the simulated data to evaluate several programs for detecting gene conversion events.
Conclusions: Our evaluation identifies strengths and weaknesses of several methods for detecting gene conversion, which can contribute to more accurate analysis of gene cluster evolution.
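The simulation-based evaluation described above relies on knowing the true events. A toy forward simulator illustrates the idea. This is a hypothetical sketch, not the authors' simulator: it treats the cluster as gap-free, equal-length paralogous copies, models small mutations only as point substitutions, and logs every event so the log can serve as the ground truth when scoring a conversion detector. The function names and event encoding are assumptions.

```python
import random
from typing import List, Tuple

BASES = "ACGT"


def convert(donor: str, recipient: str, start: int, end: int) -> str:
    """Non-reciprocal gene conversion: recipient[start:end] is replaced by donor[start:end]."""
    return recipient[:start] + donor[start:end] + recipient[end:]


def simulate_cluster(ancestor: str, n_events: int, seed: int = 0) -> Tuple[List[str], List[tuple]]:
    """Toy simulation of a gene cluster (hypothetical; substitutions only, no indels).

    Returns the final copies and an event log. Copy indices in the log refer to
    positions in the list of copies at the time each event happened."""
    rng = random.Random(seed)
    copies = [ancestor, ancestor]  # start from a single duplication
    log: List[tuple] = []
    length = len(ancestor)
    for _ in range(n_events):
        event = rng.choice(["substitute", "duplicate", "delete", "convert"])
        if event == "duplicate":
            i = rng.randrange(len(copies))
            copies.append(copies[i])
            log.append(("duplicate", i, len(copies) - 1))
        elif event == "delete" and len(copies) > 2:
            # Deletions that would leave fewer than two copies are skipped.
            i = rng.randrange(len(copies))
            copies.pop(i)
            log.append(("delete", i))
        elif event == "substitute":
            i, pos = rng.randrange(len(copies)), rng.randrange(length)
            new = rng.choice([b for b in BASES if b != copies[i][pos]])
            copies[i] = copies[i][:pos] + new + copies[i][pos + 1:]
            log.append(("substitute", i, pos, new))
        elif event == "convert":
            donor, recipient = rng.sample(range(len(copies)), 2)
            start = rng.randrange(length)
            end = rng.randrange(start + 1, length + 1)  # non-empty tract
            copies[recipient] = convert(copies[donor], copies[recipient], start, end)
            log.append(("convert", donor, recipient, start, end))
    return copies, log
```

Because the generator is seeded, a given seed reproduces the same copies and log, so a detector's predicted conversion tracts can be compared against the logged `("convert", ...)` entries.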
AAV ancestral reconstruction library enables selection of broadly infectious viral variants
Adeno-associated virus (AAV) vectors have achieved clinical efficacy in treating several diseases. Enhanced vectors are required to extend these landmark successes to other indications, however, and protein engineering approaches may provide the necessary vector improvements to address such unmet medical needs. To generate new capsid variants with potentially enhanced infectious properties, and to gain insight into AAV’s evolutionary history, we computationally designed and experimentally constructed a putative ancestral AAV library. Combinatorial variations at 32 amino acid sites were introduced to account for uncertainty in the ancestral residue identities at those sites. We then analyzed the evolutionary flexibility of these residues, the majority of which have not been previously studied, by subjecting the library to iterative selection on a representative cell line panel. The resulting variants exhibited transduction efficiencies comparable to the most efficient extant serotypes, and in general ancestral libraries were broadly infectious across the cell line panel, indicating that they favored promiscuity over specificity. Interestingly, putative ancestral AAVs were more thermostable than modern serotypes and did not utilize sialic acids, galactose, or heparan sulfate proteoglycans for cellular entry. Finally, variants mediated 19- to 31-fold higher gene expression in muscle compared to AAV1, a clinically utilized serotype for muscle delivery, highlighting their promise for gene therapy.