8,662 research outputs found

    Classification of molecular sequence data using Bayesian phylogenetic mixture models

    Rate variation among the sites of a molecular sequence is commonly found in applications of phylogenetic inference. Several approaches exist to account for this feature but they do not usually enable the investigator to pinpoint the sites that evolve under one or another rate of evolution in a straightforward manner. The focus is on Bayesian phylogenetic mixture models, augmented with allocation variables, as tools for site classification and quantification of classification uncertainty. The method does not rely on prior knowledge of site membership to classes or even the number of classes. Furthermore, it does not require correlated sites to be next to one another in the sequence alignment, unlike some phylogenetic hidden Markov or change-point models. In the approach presented, model selection on the number and type of mixture components is conducted ahead of both model estimation and site classification; the steppingstone sampler (SS) is used to select amongst competing mixture models. Example applications to simulated data and mitochondrial DNA of primates illustrate site classification via ‘augmented’ Bayesian phylogenetic mixtures. In both examples, all mixtures outperform commonly-used models of among-site rate variation and models that do not account for rate heterogeneity. The examples further demonstrate how site classification is readily available from the analysis output. The method is directly relevant to the choice of partitions in Bayesian phylogenetics, and its application may lead to the discovery of structure not otherwise recognised in a molecular sequence alignment. Computational aspects of Bayesian phylogenetic model estimation are discussed, including the use of simple Markov chain Monte Carlo (MCMC) moves that mix efficiently without tempering the chains.
    The contribution to the field of Bayesian phylogenetics is in (1) the use of mixture models augmented with allocation variables as tools for site classification and quantification of classification uncertainty, (2) the successful application of SS for selection of phylogenetic mixtures, and (3) the development of novel MCMC aspects of relevance to Bayesian phylogenetic models, whether mixtures or not.
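
The role of allocation variables in site classification can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mixture weights and the per-site log-likelihoods under each class are already known for one fixed set of parameter values, and the names `allocation_posteriors`, `classify`, `site_loglik` and `weights` are hypothetical. In a full MCMC analysis these membership probabilities would be averaged over posterior samples rather than computed once.

```python
import math

def allocation_posteriors(site_loglik, weights):
    """Return P(site i belongs to class k) for fixed parameter values.

    site_loglik[i][k] is the log-likelihood of site i under mixture
    class k (hypothetical input); weights[k] is the mixture weight.
    """
    posteriors = []
    for row in site_loglik:
        # log(weight_k) + log-likelihood_k, shifted by the maximum
        # before exponentiating to avoid numerical underflow.
        log_terms = [math.log(w) + ll for w, ll in zip(weights, row)]
        shift = max(log_terms)
        unnorm = [math.exp(t - shift) for t in log_terms]
        total = sum(unnorm)
        posteriors.append([u / total for u in unnorm])
    return posteriors

def classify(posteriors):
    """Assign each site to its most probable class."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]

if __name__ == "__main__":
    # Two sites, two rate classes, equal weights (made-up numbers).
    post = allocation_posteriors([[-1.0, -3.0], [-5.0, -2.0]], [0.5, 0.5])
    print(post, classify(post))
```

The probabilities themselves, not only the hard assignment, carry the classification uncertainty that the abstract refers to.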

    Shotgun Mitogenomics Provides a Reference Phylogenetic Framework and Timescale for Living Xenarthrans

    Xenarthra (armadillos, sloths, and anteaters) constitutes one of the four major clades of placental mammals. Despite their phylogenetic distinctiveness in mammals, a reference phylogeny is still lacking for the 31 described species. Here we used Illumina shotgun sequencing to assemble 33 new complete mitochondrial genomes, establishing Xenarthra as the first major placental clade to be fully sequenced at the species level for mitogenomes. The resulting data set allowed the reconstruction of a robust phylogenetic framework and timescale that are consistent with previous studies conducted at the genus level using nuclear genes. Incorporating the full species diversity of extant xenarthrans points to a number of inconsistencies in xenarthran systematics and species definition. We propose to split armadillos into two distinct families, Dasypodidae (dasypodines) and Chlamyphoridae (euphractines, chlamyphorines, and tolypeutines), to better reflect their ancient divergence, estimated around 42 million years ago. Species delimitation within long-nosed armadillos (genus Dasypus) appeared more complex than anticipated, with the discovery of a divergent lineage in French Guiana. Diversification analyses showed Xenarthra to be an ancient clade with a constant diversification rate through time with a species turnover driven by high but constant extinction. We also detected a significant negative correlation between speciation rate and past temperature fluctuations with an increase in speciation rate corresponding to the general cooling observed during the last 15 million years. Biogeographic reconstructions identified the tropical rainforest biome of Amazonia and the Guianan shield as the cradle of xenarthran evolutionary history with subsequent dispersions into more open and dry habitats.
    Affiliation: Gibb, Gillian C. Universite de Montpellier, France; Massey University, New Zealand
    Affiliation: Condamine, Fabien L. University of Gothenburg, Sweden; Universite de Montpellier, France; University of Alberta, Canada
    Affiliation: Kuch, Melanie. McMaster University, Canada
    Affiliation: Enk, Jacob. McMaster University, Canada
    Affiliation: Moraes Barros, Nadia. Universidade Do Porto, Portugal; Universidade de Sao Paulo, Brazil
    Affiliation: Superina, Mariella. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo, Argentina
    Affiliation: Poinar, Hendrik N. McMaster University, Canada
    Affiliation: Delsuc, Frederic. Universite de Montpellier, France

    A stable backbone for the fungi

    Fungi are abundant in the biosphere. They have fascinated mankind as far back as written history goes and have considerably influenced our culture. In biotechnology, cell biology, genetics, and life sciences in general, fungi constitute relevant model organisms. Once the phylogenetic relationships of fungi are stably resolved, individual results from fungal research can be combined into a holistic picture of biology. However, and despite recent progress, the backbone of the fungal phylogeny is not yet fully resolved. In particular, the early evolutionary history of fungi and the order or below-order relationships within the ascomycetes remain uncertain. Here we present the first phylogenomic study for a eukaryotic kingdom that merges all publicly available fungal genomes and expressed sequence tags (EST) to build a data set comprising 128 genes and 146 taxa. The resulting tree provides a stable phylogenetic backbone for the fungi. Moreover, we present the first formal supertree based on 161 fungal taxa and 128 gene trees. The combined evidence from the trees supports the deep-level stability of the fungal groups towards a comprehensive natural system of the fungi. It indicates that the classification of the fungi, especially their alliance with the Microsporidia, requires careful revision. Our analysis is also an inventory of present day sequence information for the fungi. It provides insights into which phylogenetic conclusions can and which cannot be drawn from the current data and may serve as a guide to direct further sequencing initiatives. Together with a comprehensive animal phylogeny, we provide the second of three pillars to understand the evolution of the multicellular eukaryotic kingdoms, fungi, metazoa, and plants, in the past 1.6 billion years.

    Bayesian modeling of recombination events in bacterial populations

    Background: We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection capable of probabilistic characterization of uncertainty have limited applicability in practice as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and estimating the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at URL http://web.abo.fi/fak/mnf//mate/jc/software/brat.html

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.
    Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
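
For readers unfamiliar with the EM algorithm named in this abstract, the following is a minimal sketch on a toy problem that does not come from the article: two coins with unknown head probabilities, where each batch of `n` tosses was made with one unobserved coin chosen with equal probability. The function name `em_two_coins` and all numbers are illustrative assumptions. The E-step computes each coin's responsibility for a batch, and the M-step re-estimates each head probability from the responsibility-weighted counts.

```python
def em_two_coins(heads, n, theta_a, theta_b, iters=50):
    """EM for two coins with hidden batch labels (illustrative toy model).

    heads[i] is the number of heads in batch i of n tosses; theta_a and
    theta_b are starting guesses for the two head probabilities.
    """
    for _ in range(iters):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            # E-step: likelihood of this batch under each coin. The
            # binomial coefficient cancels in the ratio, so it is omitted.
            like_a = theta_a ** h * (1.0 - theta_a) ** (n - h)
            like_b = theta_b ** h * (1.0 - theta_b) ** (n - h)
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            # Accumulate expected head and tail counts for each coin.
            heads_a += resp_a * h
            tails_a += resp_a * (n - h)
            heads_b += resp_b * h
            tails_b += resp_b * (n - h)
        # M-step: maximum-likelihood update from the expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b

if __name__ == "__main__":
    # Five batches of ten tosses each (made-up data).
    print(em_two_coins([5, 9, 8, 4, 7], 10, 0.6, 0.5))
```

The same alternation of expected assignments and weighted re-estimation underlies the motif-discovery and mixture applications the abstract lists, with more elaborate likelihoods in place of the coin model.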