
    Estimating selection pressures on HIV-1 using phylogenetic likelihood models

    Human immunodeficiency virus (HIV-1) can evolve rapidly under selection pressures exerted by HIV-specific immune responses and antiviral agents, and under the need to establish infection in different compartments of the body. Statistical models applied to HIV-1 sequence data can help to elucidate the nature of these selection pressures through comparisons of non-synonymous (amino acid changing) and synonymous (amino acid preserving) substitution rates. These models also need to take into account the non-independence of sequences due to their shared evolutionary history. We review how we have developed these methods and applied them to characterize the evolution of HIV-1 in vivo. To illustrate our methods, we present an analysis of compartment-specific evolution of HIV-1 env in blood and cerebrospinal fluid and of site-to-site variation in the gag gene of subtype C HIV-1.
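
    The comparison of non-synonymous and synonymous changes mentioned in the abstract above can be illustrated with a deliberately naive count. The sketch below is not the phylogenetic likelihood method the authors describe: it ignores shared evolutionary history, multiple hits, and the number of available synonymous and non-synonymous sites. The function name `classify_differences` and the example sequences are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_differences(seq_a, seq_b):
    """Count (synonymous, non-synonymous) codon differences.

    Only codon pairs that differ at exactly one nucleotide are counted;
    pairs with several differences or with unrecognised codons (gaps,
    ambiguity codes) are skipped. This is a crude illustration, not an
    estimate of substitution rates.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")

    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical aligned fragments: AAA/AAG both encode K, TTT (F) vs CTT (L).
syn, nonsyn = classify_differences("ATGAAATTT", "ATGAAGCTT")
print(syn, nonsyn)
```

    A likelihood-based analysis would instead fit a codon substitution model on a tree, which is what accounts for the non-independence of sequences noted in the abstract.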

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

    Phylogenetic mixtures: Concentration of measure in the large-tree limit

    The reconstruction of phylogenies from DNA or protein sequences is a major task of computational evolutionary biology. Common phenomena, notably variations in mutation rates across genomes and incongruences between gene lineage histories, often make it necessary to model molecular data as originating from a mixture of phylogenies. Such mixed models play an increasingly important role in practice. Using concentration of measure techniques, we show that mixtures of large trees are typically identifiable. We also derive sequence-length requirements for high-probability reconstruction. Comment: Published at http://dx.doi.org/10.1214/11-AAP837 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
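
    To make the notion of a mixture concrete: under a mixture model, the probability of a site pattern is the weighted sum of its probabilities under each component. The toy sketch below assumes a two-taxon "tree" (a single branch) and the Jukes-Cantor (JC69) substitution model, neither of which comes from the paper; the function names and the weights and branch lengths in the usage lines are hypothetical.

```python
import math


def jc69_pattern_prob(same, t):
    """Probability of one specific two-taxon site pattern under JC69.

    `t` is the branch length in expected substitutions per site.
    `same` selects a pattern such as (A, A) versus one such as (A, C).
    The root base is drawn from the uniform stationary distribution (1/4).
    """
    decay = math.exp(-4.0 * t / 3.0)
    transition = 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay
    return 0.25 * transition


def mixture_pattern_prob(same, weights, branch_lengths):
    """Pattern probability under a mixture: sum_i w_i * P(pattern | t_i)."""
    if len(weights) != len(branch_lengths):
        raise ValueError("one weight is needed per mixture component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(
        w * jc69_pattern_prob(same, t)
        for w, t in zip(weights, branch_lengths)
    )


# Hypothetical two-class mixture: 70% slow sites, 30% fast sites.
weights = [0.7, 0.3]
branch_lengths = [0.05, 0.8]
p_same = mixture_pattern_prob(True, weights, branch_lengths)
p_diff = mixture_pattern_prob(False, weights, branch_lengths)
# 4 matching patterns and 12 mismatching patterns exhaust all 16 cases.
print(4 * p_same + 12 * p_diff)
```

    Identifiability, in this setting, asks whether the component trees and weights can be recovered from such pattern probabilities alone; the toy example only shows how the mixture distribution is formed.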

    Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages

    One reason why classical phylogenetic reconstruction methods fail to correctly infer the underlying topology is that they assume oversimplified models. In this paper we propose a topology reconstruction method consistent with the most general Markov model of nucleotide substitution, which can also deal with data coming from mixtures on the same topology. It is based on an idea of Eriksson on using phylogenetic invariants and provides a system of weights that can be used as input to quartet-based methods. We study its performance on real data and on a wide range of simulated 4-taxon data (both time-homogeneous and nonhomogeneous, with or without among-site rate heterogeneity, and with different branch length settings). We compare it to the classical methods of neighbor-joining (with paralinear distance), maximum likelihood (with different underlying models), and maximum parsimony. Our results show that this method is accurate and robust, performs similarly to ML when the data satisfy the assumptions of both methods, and outperforms all methods when these are based on inappropriate substitution models or when both long and short branches are present. If alignments are long enough, it also outperforms other methods when some of its assumptions are violated. Comment: 32 pages; 9 figures

    Molecular phylogeny of brachiopods and phoronids based on nuclear-encoded small subunit ribosomal RNA gene sequences

    Brachiopod and phoronid phylogeny is inferred from SSU rDNA sequences of 28 articulate and nine inarticulate brachiopods, three phoronids, two ectoprocts and various outgroups, using gene trees reconstructed by weighted parsimony, distance and maximum likelihood methods. Of these sequences, 33 from brachiopods, two from phoronids and one each from an ectoproct and a priapulan are newly determined. The brachiopod sequences belong to 31 different genera and thus survey about 10% of extant genus-level diversity. Sequences determined in different laboratories and those from closely related taxa agree well, but evidence is presented suggesting that one published phoronid sequence (GenBank accession UO12648) is a brachiopod-phoronid chimaera, and this sequence is excluded from the analyses. The chiton, Acanthopleura, is identified as the phenetically proximal outgroup; other selected outgroups were chosen to allow comparison with recent, non-molecular analyses of brachiopod phylogeny. The different outgroups and methods of phylogenetic reconstruction lead to similar results, with differences mainly in the resolution of weakly supported ancient and recent nodes, including the divergence of inarticulate brachiopod sub-phyla, the position of the rhynchonellids in relation to long- and short-looped articulate brachiopod clades and the relationships of some articulate brachiopod genera and species. Attention is drawn to the problem presented by nodes that are strongly supported by non-molecular evidence but receive only low bootstrap resampling support. Overall, the gene trees agree with morphology-based brachiopod taxonomy, but novel relationships are tentatively suggested for thecideidine and megathyrid brachiopods. Articulate brachiopods are found to be monophyletic in all reconstructions, but monophyly of inarticulate brachiopods and the possible inclusion of phoronids in the inarticulate brachiopod clade are less strongly established. 
    Phoronids are clearly excluded from a sister-group relationship with articulate brachiopods, this proposed relationship being due to the rejected, chimaeric sequence (GenBank UO12648). Lineage relative rate tests show no heterogeneity of evolutionary rate among articulate brachiopod sequences, but indicate that inarticulate brachiopod plus phoronid sequences evolve somewhat more slowly. Both brachiopods and phoronids evolve slowly by comparison with other invertebrates. A number of palaeontologically dated times of earliest appearance are used to make upper and lower estimates of the global rate of brachiopod SSU rDNA evolution, and these estimates are used to infer the likely divergence times of other nodes in the gene tree. There is reasonable agreement between most inferred molecular and palaeontological ages. The estimated rates of SSU rDNA sequence evolution suggest that the last common ancestor of brachiopods, chitons and other protostome invertebrates (Lophotrochozoa and Ecdysozoa) lived deep in Precambrian time. Results of this first DNA-based, taxonomically representative analysis of brachiopod phylogeny are in broad agreement with current morphology-based classification and systematics and are largely consistent with the hypothesis that brachiopod shell ontogeny and morphology are a good guide to phylogeny.

    The identifiability of tree topology for phylogenetic models, including covarion and mixture models

    For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants. Comment: 20 pages, 1 figure