76 research outputs found

    Geographies and Genealogies: Phylogeographic Simulation and Bayesian Approaches to Statistical Phylogeographic Model Selection

    A wide class of biogeographic or phylogeographic studies predicts the simultaneous divergence of co-distributed taxa. Typically, a geological event, or a climate-related change in geography, is hypothesized to have structured a broad range of biota, many components of which may only be distantly related to each other. Direct assessment of these predictions is precluded in many studies by the lack or paucity of appropriate fossils for calibration when estimating divergence times in a phylogenetic context. However, even without direct divergence time estimation of all the relevant splits, there might be sufficient information in the data to estimate the probability that these groups diverged simultaneously if the datasets are treated in a parallel, coordinated, and integrated fashion, rather than independently. This study investigates the statistical framework and methods used to address this issue. Most current statistical phylogeographic methods rely on the coalescent as an underlying model. While the coalescent is robust to a range of violations of some of its assumptions, such as the Wright-Fisher demographic model, and, moreover, has been elaborated or extended to allow the relaxing of some of its other assumptions, little has been done to assess and quantify how violations of these assumptions affect phylogeographic analysis in general, and phylogeographic model selection in particular. One of the major problems in evaluating how phylogeographic methods behave when the assumptions of the coalescent are violated is the lack of a rich or flexible non-coalescent-based, spatially explicit simulation engine.
The first chapter of my dissertation is thus focused on developing such a simulator: a forward-time, agent-based, spatially explicit simulation program that generates genealogies for multiple loci evolving in populations of multiple sexual diploid species on a spatio-temporally environmentally heterogeneous landscape. The second chapter of the dissertation assesses the performance of an Approximate Bayesian Computation approach to model selection for testing simultaneous divergence times. It profiles the performance of this approach under a variety of conditions, ranging from ones in which its model assumptions are completely met, to ones in which they are selectively violated in varying degrees. While there currently are no full- or exact-likelihood methods that address this question, under the special controlled circumstances of the study it was possible to adapt an existing program to provide some indication of how a full-likelihood method may work in contrast. The third chapter of this work presents a program that simultaneously estimates the divergence time between sister populations of multiple species in parallel. This program uses a Bayesian statistical framework to analyze data from multiple genetic loci, integrating over uncertainty in gene trees, divergence times, and demographic parameters. If limited to two species, the program allows for reversible-jump MCMC to sample from models of different dimensionality with respect to the divergence time, so as to explicitly estimate the posterior probability of simultaneous divergence vs. non-simultaneous divergence.

    Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al

    Establishing that a set of population-splitting events occurred at the same time can be a potentially persuasive argument that a common process affected the populations. Oaks et al. (2013) assessed the ability of an approximate-Bayesian method (msBayes) to estimate such a pattern of simultaneous divergence across taxa, to which Hickerson et al. (2014) responded. Both papers agree the method is sensitive to prior assumptions and often erroneously supports shared divergences; the papers differ about the explanation and solution. Oaks et al. (2013) suggested the method's behavior is caused by the strong weight of uniform priors on divergence times leading to smaller marginal likelihoods of models with more divergence-time parameters (Hypothesis 1); they proposed alternative priors to avoid strongly weighted posteriors. Hickerson et al. (2014) suggested numerical approximation error causes msBayes analyses to be biased toward models of clustered divergences (Hypothesis 2); they proposed using narrow, empirical uniform priors. Here, we demonstrate that the approach of Hickerson et al. (2014) does not mitigate the method's tendency to erroneously support models of clustered divergences, and often excludes the true parameter values. Our results also show that the tendency of msBayes analyses to support models of shared divergences is primarily due to Hypothesis 1. This series of papers demonstrates that if our prior assumptions place too much weight in unlikely regions of parameter space such that the exact posterior supports the wrong model of evolutionary history, no amount of computation can rescue our inference.
Fortunately, more flexible distributions that accommodate prior uncertainty about parameters without placing excessive weight in vast regions of parameter space with low likelihood increase the method's robustness and power to detect temporal variation in divergences. Comment: 24 pages, 4 figures, 1 table, 14 pages of supporting information with 10 supporting figures
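
The reasoning behind Hypothesis 1 can be illustrated with a toy calculation. The sketch below is not msBayes and is not taken from either paper; it assumes a single parameter with a Gaussian likelihood (the values `observed` and `sigma` are arbitrary choices for illustration) and a Uniform(0, `prior_upper`) prior, and it computes the marginal likelihood by simple numerical integration. Because the prior density is 1/`prior_upper`, widening the prior over regions where the likelihood is negligible shrinks the marginal likelihood roughly in proportion.

```python
import math


def marginal_likelihood(prior_upper, observed=1.0, sigma=0.5, steps=20000):
    """Average a Gaussian likelihood over a Uniform(0, prior_upper) prior.

    All parameter values are hypothetical and chosen only for illustration.
    """
    width = prior_upper / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    integral = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width  # midpoint of the i-th integration slice
        likelihood = math.exp(-0.5 * ((observed - theta) / sigma) ** 2) / norm
        integral += likelihood * width
    # The uniform prior density is 1 / prior_upper across the whole interval.
    return integral / prior_upper


narrow = marginal_likelihood(prior_upper=5.0)
wide = marginal_likelihood(prior_upper=50.0)
print(f"narrow prior: {narrow:.5f}, wide prior: {wide:.5f}")
```

In this toy setting, a tenfold wider prior lowers the marginal likelihood by about a factor of ten, even though the data are identical. A model with several such parameters pays this penalty once per parameter, which is the sense in which broad uniform priors can favor models with fewer divergence-time parameters.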

    Incorporating the speciation process into species delimitation

    The “multispecies” coalescent (MSC) model that underlies many genomic species-delimitation approaches is problematic because it does not distinguish between genetic structure associated with species versus that of populations within species. Consequently, as both the genomic and spatial resolution of data increase, a proliferation of artifactual species results as within-species population lineages, detected due to restrictions in gene flow, are identified as distinct species. The toll of this extends beyond systematic studies, getting magnified across the many disciplines that rely upon an accurate framework of identified species. Here we present the first of a new class of approaches that addresses this issue by incorporating an extended speciation process for species delimitation. We model the formation of population lineages and their subsequent development into independent species as separate processes and provide a way to incorporate current understanding of the species boundaries in the system through specification of species identities of a subset of population lineages. As a result, species boundaries and within-species lineage boundaries can be discriminated across the entire system, and species identities can be assigned to the remaining lineages of unknown affinities with quantified probabilities. In addition to the identification of species units in nature, the primary goal of species delimitation, the incorporation of a speciation model also allows us insights into the links between population and species-level processes. By explicitly accounting for restrictions in gene flow not only between, but also within, species, we also address the limits of genetic data for delimiting species. Specifically, while genetic data alone is not sufficient for accurate delimitation, when considered in conjunction with other information we are able to not only learn about species boundaries, but also about the tempo of the speciation process itself.

    Models of microbiome evolution incorporating host and microbial selection

    BACKGROUND: Numerous empirical studies suggest that hosts and microbes exert reciprocal selective effects on their ecological partners. Nonetheless, we still lack an explicit framework to model the dynamics of both hosts and microbes under selection. In a previous study, we developed an agent-based forward-time computational framework to simulate the neutral evolution of host-associated microbial communities in a constant-sized, unstructured population of hosts. These neutral models allowed offspring to sample microbes randomly from parents and/or from the environment. Additionally, the environmental pool of available microbes was constituted by fixed and persistent microbial OTUs and by contributions from host individuals in the preceding generation. METHODS: In this paper, we extend our neutral models to allow selection to operate on both hosts and microbes. We do this by constructing a phenome for each microbial OTU consisting of a sample of traits that influence host and microbial fitnesses independently. Microbial traits can influence the fitness of hosts ("host selection") and the fitness of microbes ("trait-mediated microbial selection"). Additionally, the fitness effects of traits on microbes can be modified by their hosts ("host-mediated microbial selection"). We simulate the effects of these three types of selection, individually or in combination, on microbiome diversities and the fitnesses of hosts and microbes over several thousand generations of hosts. RESULTS: We show that microbiome diversity is strongly influenced by selection acting on microbes. Selection acting on hosts only influences microbiome diversity when there is near-complete direct or indirect parental contribution to the microbiomes of offspring. Unsurprisingly, microbial fitness increases under microbial selection. 
Interestingly, when host selection operates, host fitness only increases under two conditions: (1) when there is a strong parental contribution to microbial communities or (2) in the absence of a strong parental contribution, when host-mediated selection acts on microbes concomitantly. CONCLUSIONS: We present a computational framework that integrates different selective processes acting on the evolution of microbiomes. Our framework demonstrates that selection acting on microbes can have a strong effect on microbial diversities and fitnesses, whereas selection on hosts can have weaker outcomes. This research was supported by funds to QZ and AR from Duke University.

    Confirmation of Hemidactylus brookii Gray, 1845 from Borneo

    Hemidactylus brookii was described by Gray (1845) from collections made by British naval officer, Captain Edward Belcher (1799–1877) and the Earl of Derby, presumably the 14th Earl, Edward Geoffrey Smith Stanley (1799–1868). Collection locations are recorded as “Borneo” and “Australia” respectively. The three syntypes (BMNH 1947.3.6.47–49) have the type locality apparently restricted independently to “Borneo” by Smith, 1935: 89 and Pope, 1935: 460. The species is named for Sir James Brooke (1803–1868), who was known as the “First Rajah of Sarawak” (1842–1867). He was formerly Governor of Sarawak from 1841, and Governor of Labuan and Consul-General to the Sultan of Brunei from 184

    Conservation Status of the Amphibians of Malaysia and Singapore

    Malaysia and Singapore are two independent countries in southeastern Asia (Fig. 1), situated north of the equator and enjoying a mostly tropical climate (with a hint of seasonality in the north). One part of the former (Peninsular Malaysia) stretches from the southern border of Thailand southward to the narrow Johor Strait that separates it from the island state of Singapore. A second, insular, part of Malaysia lies across the South China Sea on the northern coast of Borneo (see below). Malaysia has a total land area of 328,657 km², far exceeding the land area of Singapore (700 km²). The term ‘Malay Peninsula’ here will refer to Peninsular Malaysia and Singapore, and ‘Malaysia’ to the Malaysian Federation, comprising Peninsular Malaysia, Sabah, Sarawak, and Labuan. The geological history of the landmass has been described by Hutchison and Tan (2009); also see Wong (2011) for a recent synthesis of the biological (including palaeontological) and physical characteristics of the southeastern Asian region as a whole.

    NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

    In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
R.A.V. received support from the CIPRES project (NSF #EF-03314953 to W.P.M.), the FP7 Marie Curie Programme (Call FP7-PEOPLE-IEF-2008—Proposal No. 237046) and, for the NeXML implementation in TreeBASE, the pPOD project (NSF IIS 0629846); P.E.M. and J.S. received support from CIPRES (NSF #EF-0331495, #EF-0715370); M.T.H. was supported by NSF (DEB-ATOL-0732920); X.X. received support from NSERC (Canada) Discovery and RTI grants; W.P.M. received support from an NSERC (Canada) Discovery grant; J.C. received support from a Google Summer of Code 2007 grant; A.P. received support from a Google Summer of Code 2010 grant.

    Phylotastic! Making Tree-of-Life Knowledge Accessible, Reusable and Convenient

    Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. Results: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (www.phylotastic.org), and a server image. 
Conclusions: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment. NESCent (the National Evolutionary Synthesis Center), NSF EF-0905606; iPlant Collaborative (NSF) DBI-0735191; Biodiversity Synthesis Center (BioSync) of the Encyclopedia of Life; Computer Science

    Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient

    Background: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great “Tree of Life” (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user’s needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. Results: With the aim of building such a “phylotastic” system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image.
Conclusions: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment. http://deepblue.lib.umich.edu/bitstream/2027.42/112888/1/12859_2013_Article_5897.pd