
    Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

    The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.
    Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434

    A Unifying Model of Genome Evolution Under Parsimony

    We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and this set can be explored with a few sampling moves.
    Comment: 52 pages, 24 figures

    Evaluation of methods for detecting conversion events in gene clusters

    Background: Gene clusters are genetically important, but their analysis poses significant computational challenges. One of the major reasons for these difficulties is gene conversion among the duplicated regions of the cluster, which can obscure their true relationships. Many computational methods for detecting gene conversion events have been released, but their performance has not been assessed for wide deployment in evolutionary history studies due to a lack of accurate evaluation methods. Results: We designed a new method that simulates gene cluster evolution, including large-scale events of duplication, deletion, and conversion as well as small mutations. We used this simulation data to evaluate several different programs for detecting gene conversion events. Conclusions: Our evaluation identifies strengths and weaknesses of several methods for detecting gene conversion, which can contribute to more accurate analysis of gene cluster evolution.

    AAV ancestral reconstruction library enables selection of broadly infectious viral variants

    Adeno-associated virus (AAV) vectors have achieved clinical efficacy in treating several diseases. Enhanced vectors are required to extend these landmark successes to other indications, however, and protein engineering approaches may provide the necessary vector improvements to address such unmet medical needs. To generate new capsid variants with potentially enhanced infectious properties, and to gain insights into AAV’s evolutionary history, we computationally designed and experimentally constructed a putative ancestral AAV library. Combinatorial variations at 32 amino acid sites were introduced to account for uncertainty in their identities. We then analyzed the evolutionary flexibility of these residues, the majority of which have not been previously studied, by subjecting the library to iterative selection on a representative cell line panel. The resulting variants exhibited transduction efficiencies comparable to the most efficient extant serotypes, and in general ancestral libraries were broadly infectious across the cell line panel, indicating that they favored promiscuity over specificity. Interestingly, putative ancestral AAVs were more thermostable than modern serotypes and did not utilize sialic acids, galactose, or heparan sulfate proteoglycans for cellular entry. Finally, variants mediated 19- to 31-fold higher gene expression in muscle compared to AAV1, a clinically utilized serotype for muscle delivery, highlighting their promise for gene therapy.