21 research outputs found

    FossilSim: An R package for simulating fossil occurrence data under mechanistic models of preservation and recovery

    1. Key features of the fossil record that present challenges for integrating palaeontological and phylogenetic datasets include (i) non‐uniform fossil recovery, (ii) stratigraphic age uncertainty and (iii) inconsistencies in the definition of species origination and taxonomy. 2. We present an R package, FossilSim, that can be used to simulate and visualise fossil data for phylogenetic analysis under a range of flexible models. The package includes interval‐, environment‐ and lineage‐dependent models of fossil recovery that can be combined with models of stratigraphic age uncertainty and species evolution. 3. The package input and output can be used in combination with the wide range of existing phylogenetic and palaeontological R packages. We also provide functions for converting between FossilSim and paleotree objects. 4. Simulated datasets provide enormous potential to assess the performance of phylogenetic methods and to explore the impact of using fossil occurrence databases on parameter estimation in macroevolution. ISSN: 2041-210X; ISSN: 2041-209
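
As a rough illustration of what an interval-dependent recovery model means, the sketch below treats fossil recovery along a single lineage as a Poisson process whose rate is constant within each stratigraphic interval. It is written in Python for illustration only and is not FossilSim's API; the function name, arguments, interval boundaries and rates are all hypothetical.

```python
import random


def simulate_fossil_ages(origin, extinction, bounds, rates, rng):
    """Sample fossil ages for one lineage (illustrative sketch, not FossilSim).

    origin, extinction: ages (time before present) at which the lineage
        appears and disappears, with origin > extinction.
    bounds: increasing interval boundaries as ages, e.g. [0, 10, 20, 30].
    rates: one recovery rate per interval, len(bounds) - 1 values.
    rng: a random.Random instance, passed in for reproducibility.
    """
    ages = []
    for young, old, rate in zip(bounds[:-1], bounds[1:], rates):
        # Portion of this interval during which the lineage was alive.
        lo, hi = max(young, extinction), min(old, origin)
        if rate <= 0 or lo >= hi:
            continue
        # Poisson process: exponential waiting times between fossils.
        t = lo + rng.expovariate(rate)
        while t < hi:
            ages.append(t)
            t += rng.expovariate(rate)
    return sorted(ages)


rng = random.Random(1)
fossils = simulate_fossil_ages(
    origin=25.0,
    extinction=5.0,
    bounds=[0.0, 10.0, 20.0, 30.0],  # hypothetical interval boundaries
    rates=[2.0, 0.0, 0.5],           # hypothetical per-interval rates
    rng=rng,
)
print(fossils)
```

With a rate of zero in the middle interval, no fossils are recovered between ages 10 and 20, which is the kind of non-uniform recovery the abstract refers to.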

    BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis

    Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data in a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.

    Complex birth-death models for Bayesian phylodynamic inferences

    Phylogenetic trees show the evolutionary relationships between individuals, populations or species and are generally built from genetic sequences. Phylodynamic inference focuses on reconstructing the underlying evolutionary processes from a phylogenetic tree, and can infer biologically meaningful parameters such as the rate of transmission of a pathogen or the rate of extinction of certain species. Its applications thus range from tracking the spread of epidemics to evaluating the impact of environmental conditions on the diversification process. Birth-death models are one of the main categories of models used for phylodynamic inference. This thesis presents work carried out on two important types of birth-death models, the multi-state model for structured populations and the fossilized birth-death process. Chapter 1 presents an overview of Bayesian phylodynamic inference and its applications as well as birth-death models. In Chapter 2, I introduce a new multi-state birth-death (MSBD) model which can be used to study variations in birth and death rates across a phylogenetic tree. I show that this model can reliably infer these rates on both simulated and empirical datasets. Chapter 3 shows an application of the MSBD model to the detection of transmission clusters in HIV transmission networks, for which I show that it performs better than existing cutpoint-based methods. Chapter 4 presents an R package for simulating fossil and taxonomy datasets, which can be used to test and validate existing or future birth-death models integrating fossils. An application of this package is shown in Chapter 5, where I compare several different methods of handling fossil age uncertainty and evaluate their impact on the accuracy of the estimates. In particular, I show that commonly used methods of simplifying the data by disregarding the age uncertainty lead to strong biases in the resulting inference.
    In Chapter 6, I present a series of workshops and an online knowledge repository I have contributed to, which are designed to help users of Bayesian phylodynamic inference via the software BEAST2 make the best choices for their own datasets. Indeed, as more complex models are developed, communication between users and developers is increasingly crucial. Finally, in Chapter 7, I discuss the methods developed in this thesis and suggest directions for future research.

    A Multitype Birth-Death Model for Bayesian Inference of Lineage-Specific Birth and Death Rates

    Heterogeneous populations can lead to important differences in birth and death rates across a phylogeny. Taking this heterogeneity into account is necessary to obtain accurate estimates of the underlying population dynamics. We present a new multi-type birth-death model (MTBD) that can estimate lineage-specific birth and death rates. This corresponds to estimating lineage-dependent speciation and extinction rates for species phylogenies, and lineage-dependent transmission and recovery rates for pathogen transmission trees. In contrast with previous models, we do not presume to know the trait driving the rate differences, nor do we prohibit the same rates from appearing in different parts of the phylogeny. Using simulated datasets, we show that the MTBD model can reliably infer the presence of multiple evolutionary regimes, their positions in the tree, and the birth and death rates associated with each. We also present a re-analysis of two empirical datasets and compare the results obtained by MTBD and by the existing software BAMM. We compare two implementations of the model, one exact and one approximate (assuming that no rate changes occur in the extinct parts of the tree), and show that the approximation only slightly affects results. The MTBD model is implemented as a package in the Bayesian inference software BEAST 2, and allows joint inference of the phylogeny and the model parameters. Files contained in this dataset: Supplement.pdf: Supplementary methods and results; code_files.zip: R scripts used to simulate, process and analyze the data and make the plots, and XML configuration files used to run BEAST2; data_files.zip: Simulated datasets and summary of the results.

    Detection of HIV transmission clusters from phylogenetic trees using a multi-state birth–death model

    HIV patients form clusters in HIV transmission networks. Accurate identification of these transmission clusters is essential to effectively target public health interventions. One reason for clustering is that the underlying contact network contains many local communities. We present a new maximum-likelihood method for identifying transmission clusters caused by community structure, based on phylogenetic trees. The method employs a multi-state birth–death (MSBD) model which detects changes in transmission rate, which are interpreted as the introduction of the epidemic into a new susceptible community, i.e. the formation of a new cluster. We show that the MSBD method is able to reliably infer the clusters and the transmission parameters from a pathogen phylogeny based on our simulations. In contrast to existing cutpoint-based methods for cluster identification, our method does not require that clusters be monophyletic nor is it dependent on the selection of a difficult-to-interpret cutpoint parameter. We present an application of our method to data from the Swiss HIV Cohort Study. The method is available as an easy-to-use R package. ISSN: 1742-5689; ISSN: 1742-566

    Robust Phylodynamic Analysis of Genetic Sequencing Data from Structured Populations

    The multi-type birth–death model with sampling is a phylodynamic model which enables the quantification of past population dynamics in structured populations based on phylogenetic trees. The BEAST 2 package bdmm implements an algorithm for numerically computing the probability density of a phylogenetic tree given the population dynamic parameters under this model. In the initial release of bdmm, analyses were computationally limited to trees consisting of up to approximately 250 genetic samples. We implemented important algorithmic changes to bdmm which dramatically increased the number of genetic samples that could be analyzed and which improved the numerical robustness and efficiency of the calculations. Including more samples led to improved precision of parameter estimates, particularly for structured models with a high number of inferred parameters. Furthermore, we report on several model extensions to bdmm, inspired by properties common to empirical datasets. We applied this improved algorithm to two partly overlapping datasets of Influenza A virus HA sequences sampled around the world, one with 500 samples and the other with only 175, for comparison. We report and compare the global migration patterns and seasonal dynamics inferred from each dataset. In this way, we show the information that is gained by analyzing the bigger dataset, which became possible with the presented algorithmic changes to bdmm. In summary, bdmm allows for robust, faster, and more general phylodynamic inference on larger datasets. ISSN: 1999-491

    Supplementary materials for "Putting the F into FBD analysis: tree constraints or morphological data ?"

    This dataset contains the R code, simulated datasets and configuration files used to run the analyses presented in the manuscript "Putting the F into FBD analysis: tree constraints or morphological data ?".

    Ignoring stratigraphic age uncertainty leads to erroneous estimates of species divergence times under the fossilized birth–death process

    Fossil information is essential for estimating species divergence times, and can be integrated into Bayesian phylogenetic inference using the fossilized birth–death (FBD) process. An important aspect of palaeontological data is the uncertainty surrounding specimen ages, which can be handled in different ways during inference. The most common approach is to fix fossil ages to a point estimate within the known age interval. Alternatively, age uncertainty can be incorporated by using priors, and fossil ages are then directly sampled as part of the inference. This study presents a comparison of alternative approaches for handling fossil age uncertainty in analyses using the FBD process. Based on simulations, we find that fixing fossil ages to the midpoint or a random point drawn from within the stratigraphic age range leads to biases in divergence time estimates, while sampling fossil ages leads to estimates that are similar to inferences that employ the correct ages of fossils. Second, we show a comparison using an empirical dataset of extant and fossil cetaceans, which confirms that different methods of handling fossil age uncertainty lead to large differences in estimated node ages. Stratigraphic age uncertainty should thus not be ignored in divergence time estimation and instead should be incorporated explicitly.
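
The three ways of handling age uncertainty compared in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names and the example intervals are hypothetical, and the uniform prior over the stratigraphic range is an assumption, since the abstract says only that priors are used.

```python
import math
import random


def midpoint_ages(intervals):
    """Fix each fossil age to the midpoint of its stratigraphic range."""
    return [(young + old) / 2 for young, old in intervals]


def random_ages(intervals, rng):
    """Fix each fossil age to one random draw from its stratigraphic range."""
    return [rng.uniform(young, old) for young, old in intervals]


def uniform_log_prior(age, interval):
    """Log density of an (assumed) uniform prior over the age range.

    When ages are sampled during inference, a proposed age would be scored
    with a term like this instead of being fixed in advance.
    """
    young, old = interval
    if young <= age <= old:
        return -math.log(old - young)
    return float("-inf")


# Hypothetical stratigraphic ranges as (youngest, oldest) ages.
intervals = [(10.0, 20.0), (33.9, 37.8)]
rng = random.Random(42)

fixed_mid = midpoint_ages(intervals)
fixed_rand = random_ages(intervals, rng)
print(fixed_mid, fixed_rand)
print(uniform_log_prior(12.0, intervals[0]))
```

The first two functions discard the uncertainty before inference starts, whereas the prior keeps the whole range available to the sampler, which is the distinction the simulations in the study evaluate.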