
    PhyLIS: A Simple GNU/Linux Distribution for Phylogenetics and Phyloinformatics

    PhyLIS is a free GNU/Linux distribution that is designed to provide a simple, standardized platform for phylogenetic and phyloinformatic analysis. The operating system incorporates most commonly used phylogenetic software, which has been pre-compiled and pre-configured, allowing for straightforward application of phylogenetic methods and development of phyloinformatic pipelines in a stable Linux environment. The software is distributed as a live CD and can be installed directly or run from the CD without making changes to the computer. PhyLIS is available for free at http://www.eve.ucdavis.edu/rcthomson/phylis/

    The behavior of metropolis-coupled Markov chains when sampling rugged phylogenetic distributions

    © The Author(s) 2018. Bayesian phylogenetic inference relies on the use of Markov chain Monte Carlo (MCMC) to provide numerical approximations of high-dimensional integrals and estimate posterior probabilities. However, MCMC performs poorly when posteriors are very rugged (i.e., regions of high posterior density are separated by regions of low posterior density). One technique that has become popular for improving numerical estimates from MCMC when distributions are rugged is Metropolis coupling (MC3). In MC3, additional chains are employed to sample flattened transformations of the posterior and improve mixing. Here, we highlight several underappreciated behaviors of MC3. Notably, estimated posterior probabilities may be incorrect but appear to converge when individual chains do not mix well, despite different chains sampling trees from all relevant areas in tree space. Counterintuitively, such behavior can be more difficult to diagnose with increased numbers of chains. We illustrate these surprising behaviors of MC3 using a simple, non-phylogenetic example and phylogenetic examples involving both constrained and unconstrained analyses. To detect and mitigate the effects of these behaviors, we recommend increasing the number of independent analyses and varying the temperature of the hottest chain in current versions of Bayesian phylogenetic software. Convergence diagnostics based on the behavior of the hottest chain may also help detect these behaviors and could form a useful addition to future software releases.
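
The Metropolis coupling idea described in the abstract above can be illustrated with a minimal sketch in Python. This is not the authors' example or any phylogenetic package's implementation: the bimodal toy target, the incremental heating formula `beta_i = 1 / (1 + delta * i)`, the adjacent-pair swap rule, and all names and default values (`log_target`, `heats`, `swap_log_ratio`, `mc3`, `delta`, `step`) are assumptions chosen for illustration. Heated chains sample the target raised to a power `beta < 1`, which flattens it, and occasional state swaps let the cold chain cross low-density regions.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of a hypothetical rugged toy target:
    an equal mixture of unit-variance normals centred at -4 and +4."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def heats(n_chains, delta):
    """Assumed incremental heating scheme: beta_i = 1 / (1 + delta * i).
    Chain 0 has beta = 1 and is the cold chain."""
    return [1.0 / (1.0 + delta * i) for i in range(n_chains)]


def swap_log_ratio(x_i, x_j, beta_i, beta_j):
    """Log Metropolis ratio for exchanging the states of chains i and j."""
    return (beta_i - beta_j) * (log_target(x_j) - log_target(x_i))


def mc3(n_iter, n_chains=4, delta=0.5, step=1.0, seed=1):
    """Run Metropolis-coupled MCMC and return the cold chain's samples."""
    rng = random.Random(seed)
    betas = heats(n_chains, delta)
    states = [-4.0] * n_chains  # every chain starts in the left mode
    cold_samples = []
    for _ in range(n_iter):
        # Within-chain Metropolis update; chain i targets target(x) ** beta_i.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            log_r = beta * (log_target(proposal) - log_target(states[i]))
            if rng.random() < math.exp(min(0.0, log_r)):
                states[i] = proposal
        # Propose swapping the states of one randomly chosen adjacent pair.
        if n_chains > 1:
            i = rng.randrange(n_chains - 1)
            j = i + 1
            log_s = swap_log_ratio(states[i], states[j], betas[i], betas[j])
            if rng.random() < math.exp(min(0.0, log_s)):
                states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

One way to probe the behaviors the abstract warns about is to call `mc3` with several different `seed` values and several values of `delta`, then compare the fraction of cold-chain samples above zero across runs; for this symmetric target that fraction should approach 0.5 when mixing is adequate.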

    On the Need for New Measures of Phylogenomic Support

    The scale of data sets used to infer phylogenies has grown dramatically in the last decades, providing researchers with an enormous amount of information with which to draw inferences about evolutionary history. However, standard approaches to assessing confidence in those inferences (e.g., nonparametric bootstrap proportions [BP] and Bayesian posterior probabilities [PPs]) are still deeply influenced by statistical procedures and frameworks that were developed when information was much more limited. These approaches largely quantify uncertainty caused by limited amounts of data, which is often vanishingly small with modern, genome-scale sequence data sets. As a consequence, today's phylogenomic studies routinely report near-complete confidence in their inferences, even when different studies reach strongly conflicting conclusions and the sites and loci in a single data set contain much more heterogeneity than our methods assume or can accommodate. Therefore, we argue that BPs and marginal PPs of bipartitions have outlived their utility as the primary means of measuring phylogenetic support for modern phylogenomic data sets with large numbers of sites relative to the number of taxa. Continuing to rely on these measures will hinder progress towards understanding remaining sources of uncertainty in the most challenging portions of the Tree of Life. Instead, we encourage researchers to examine the ideas and methods presented in this special issue of Systematic Biology and to explore the area further in their own work. The papers in this special issue outline strategies for assessing confidence and uncertainty in phylogenomic data sets that move beyond stochastic error due to limited data and offer promise for more productive dialogue about the challenges that we face in reaching our shared goal of understanding the history of life on Earth. [Big data; gene tree variation; genomic era; statistical bias.]

    Coral Growth Related to Resuspension of Bottom Sediments

    The determination of coral skeletal growth, from both a biological and a geological standpoint, has long presented difficulties in ease and accuracy of measurement. As a consequence, factors which limit growth are only beginning to be investigated. Radiometric methods using natural radioactive series nuclides have recently been used to determine the growth rate of certain corals. In addition, X radiographs coupled with 90Sr-induced autoradiographs have demonstrated that at least some hermatypic corals record annual growth bands in their skeletons. Once annual banding is confirmed, measurement of band widths allows precise determination of yearly growth increments. It is thus possible to compare inter- and intraspecific coral growth rates from different areas with pertinent environmental variables and to educe possible correlations. We describe here an investigation of the effects of resuspension of bottom sediments on the growth rate, as determined by a 228Ra technique and X radiography, of Montastrea annularis in different parts of Discovery Bay, Jamaica. Such a study is unique both because the annual nature of growth bands is established for the most important reef-forming coral in the Caribbean and because radiometric and radiographic techniques are extended to measure growth rates as a function of environmental parameters.

    Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from emydid turtles

    Background: Phylogenies often contain both well-supported and poorly supported nodes. Determining how much additional data might be required to eventually recover most or all nodes with high support is an important pragmatic goal, and simulations have been used to examine this question. Most simulations have been based on few empirical loci, and suggest that well supported phylogenies can be determined with a very modest amount of data. Here we report the results of an empirical phylogenetic analysis of all 10 genera and 25 of 48 species of the new world pond turtles (family Emydidae) based on one mitochondrial (1070 base pairs) and seven nuclear loci (5961 base pairs), and a more biologically realistic simulation analysis incorporating variation among gene trees, aimed at determining how much more data might be necessary to recover weakly-supported nodes with strong support. Results: Our mitochondrial-based phylogeny was well resolved, and congruent with some previous mitochondrial results. For example, all genera, and all species except Pseudemys concinna, P. peninsularis, and Terrapene carolina were monophyletic with strong support from at least one analytical method. The Emydinae was recovered as monophyletic, but the Deirochelyinae was not. Based on nuclear data, all genera were monophyletic with strong support except Trachemys, and all species except Graptemys pseudogeographica, P. concinna, T. carolina, and T. coahuila were monophyletic, generally with strong support. However, the branches subtending most genera were relatively short, and intergeneric relationships within subfamilies were mostly unsupported. Our simulations showed that relatively high bootstrap support values (i.e. ≥ 70) for all nodes were reached in all datasets, but an increase in data did not necessarily equate to an increase in support values. However, simulations based on a single empirical locus reached higher overall levels of support with less data than did the simulations that were based on all seven empirical nuclear loci, and symmetric tree distances were much lower for single versus multiple gene simulation analyses. Conclusion: Our empirical results provide new insights into the phylogenetics of the Emydidae, but the short branches recovered deep in the tree also indicate the need for additional work on this clade to recover all intergeneric relationships with confidence and to delimit species for some problematic groups. Our simulation results suggest that moderate (in the few-to-tens of kb range) amounts of data are necessary to recover most emydid relationships with high support values. They also suggest that previous simulations that do not incorporate among-gene tree topological variance probably underestimate the amount of data needed to recover well supported phylogenies.

    GLUE: a flexible software system for virus sequence data

    Background: Virus genome sequences, generated in ever-higher volumes, can provide new scientific insights and inform our responses to epidemics and outbreaks. To facilitate interpretation, such data must be organised and processed within scalable computing resources that encapsulate virology expertise. GLUE (Genes Linked by Underlying Evolution) is a data-centric bioinformatics environment for building such resources. The GLUE core data schema organises sequence data along evolutionary lines, capturing not only nucleotide data but associated items such as alignments, genotype definitions, genome annotations and motifs. Its flexible design emphasises applicability to different viruses and to diverse needs within research, clinical or public health contexts. Results: HCV-GLUE is a case study GLUE resource for hepatitis C virus (HCV). It includes an interactive public web application providing sequence analysis in the form of a maximum-likelihood-based genotyping method, antiviral resistance detection and graphical sequence visualisation. HCV sequence data from GenBank is categorised and stored in a large-scale sequence alignment which is accessible via web-based queries. Whereas this web resource provides a range of basic functionality, the underlying GLUE project can also be downloaded and extended by bioinformaticians addressing more advanced questions. Conclusion: GLUE can be used to rapidly develop virus sequence data resources with public health, research and clinical applications. This streamlined approach, with its focus on reuse, will help realise the full value of virus sequence data

    Hepatitis C and the absence of genomic data in low-income countries: a barrier on the road to elimination?

    Following the development of highly effective direct acting antiviral (DAA) compounds for the treatment of the hepatitis C virus (HCV), WHO has set out plans for disease eradication by 2030. Many barriers must be surmounted before this can be achieved, including buy-in from governments and policy makers, reduced drug costs, and improved infrastructure for the pathway from diagnosis to treatment. A comprehensive set of guidelines was produced by WHO in 2014, updated in 2016, and they are due to be revised later this year