Exploring congruent diversification histories with flexibility and parsimony
Using phylogenies of present-day species to estimate diversification rate trajectories (speciation and extinction rates over time) is a challenging task due to non-identifiability issues. Given a phylogeny, there exists an infinite set of trajectories that result in the same likelihood; this set has been termed a congruence class. Previous work has developed approaches for sampling trajectories within a given congruence class, with the aim of assessing the extent to which congruent scenarios can vary from one another. Based on this sampling approach, it has been suggested that rapid changes in speciation or extinction rates are conserved across the class. Reaching such conclusions requires sampling the broadest possible set of distinct trajectories. We introduce a new method for exploring congruence classes, which we implement in the R package CRABS. Whereas existing methods constrain either the speciation rate or the extinction rate trajectory, ours provides more flexibility by sampling congruent speciation and extinction rate trajectories simultaneously. This makes it possible to cover a more representative set of distinct diversification rate trajectories. We also implement a filtering step that selects the most parsimonious trajectories within a class. We demonstrate the utility of our new sampling strategy using a simulated scenario. Next, we apply our approach to the study of mammalian diversification history. We show that rapid changes in speciation and extinction rates need not be conserved across a congruence class, but that selecting the most parsimonious trajectories shrinks the class to concordant scenarios. Our approach opens new avenues both to truly explore the myriad potential diversification histories consistent with a given phylogeny, embracing the uncertainty inherent to phylogenetic diversification models, and to select among these different histories. This should help refine our inference of diversification trajectories from extant data.
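To make the notion of a congruence class concrete, here is a minimal Python sketch. It assumes the usual definition of the pulled diversification rate, r_p = lambda - mu + (1/lambda) * d(lambda)/dt with t measured as age before the present, which the abstract itself does not spell out. It also assumes that the alternative speciation trajectory shares the reference present-day speciation rate and sampling fraction. All function names and rate values are hypothetical, and this is not the CRABS interface (CRABS is an R package).

```python
import math

def derivative(f, t, h=1e-5):
    """Central finite-difference approximation of df/dt."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def pulled_diversification_rate(lam, mu, t):
    """r_p(t) = lam - mu + (1/lam) * dlam/dt, with t an age (assumed convention)."""
    return lam(t) - mu(t) + derivative(lam, t) / lam(t)

def congruent_extinction(lam_ref, mu_ref, lam_alt):
    """Return the extinction rate function that pairs with lam_alt so that
    (lam_alt, mu_alt) has the same pulled diversification rate as the reference."""
    def mu_alt(t):
        r_p = pulled_diversification_rate(lam_ref, mu_ref, t)
        return lam_alt(t) - r_p + derivative(lam_alt, t) / lam_alt(t)
    return mu_alt

# Hypothetical reference model: constant speciation and extinction rates.
lam_ref = lambda t: 0.5
mu_ref = lambda t: 0.2

# Hypothetical alternative speciation trajectory with the same present-day value.
lam_alt = lambda t: 0.5 * math.exp(0.1 * t)
mu_alt = congruent_extinction(lam_ref, mu_ref, lam_alt)

for age in (0.0, 1.0, 5.0):
    r_p_alt = pulled_diversification_rate(lam_alt, mu_alt, age)
    print(age, round(mu_alt(age), 4), round(r_p_alt, 4))
```

The sketch does not check that the derived extinction rate stays non-negative, which any real sampler of congruent trajectories would need to enforce.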
Haplotype reconstruction from genotypes in multiparental populations using artificial neural networks
A major goal in population genetics is to predict the genetic history of contemporary populations from sequence data. In experimental and agricultural genetics there are many cases where multiple founders (of known genotypes) are combined to produce recombinant progeny. For each descendant, reconstructing its haplotype means finding which regions along the genome descend from which founder. Currently, the best haplotype imputation algorithms are based on Hidden Markov Models. Our goal is to explore the potential and robustness of neural networks for haplotype reconstruction, and secondarily for predicting the number of recombination events. We worked on simulations before applying our model to the C. elegans multiparental experimental evolution (CeMEE) dataset, with the single nucleotide polymorphisms (SNPs) of chromosome I as a case study. The associated notebooks are available at https://github.com/Jeremy-Andreoletti/Haplotype_imputation
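As an illustration of the Hidden Markov Model baseline mentioned above, the following Python sketch assigns each SNP of a descendant to a founder with the Viterbi algorithm. It is a simplified assumption-laden example, not the method or code of the cited work: it treats the descendant as haploid, uses a uniform switch probability between adjacent SNPs, and a single genotyping error rate. The function name, parameters, and toy data are hypothetical.

```python
import math

def viterbi_founders(offspring, founders, error=0.01, switch=0.05):
    """Most likely founder at each SNP (requires at least two founders).

    offspring: list of 0/1 alleles; founders: list of 0/1 allele lists.
    error: assumed probability that an observed allele differs from the founder's.
    switch: assumed probability of changing founder between adjacent SNPs.
    """
    k, n = len(founders), len(offspring)
    log_stay = math.log(1.0 - switch)
    log_switch = math.log(switch / (k - 1))

    def log_emit(f, i):
        match = founders[f][i] == offspring[i]
        return math.log(1.0 - error) if match else math.log(error)

    # Initialisation with a uniform prior over founders.
    score = [math.log(1.0 / k) + log_emit(f, 0) for f in range(k)]
    back = []
    for i in range(1, n):
        new_score, pointers = [], []
        for f in range(k):
            # Best previous founder leading to founder f at SNP i.
            best = max(range(k),
                       key=lambda g: score[g] + (log_stay if g == f else log_switch))
            trans = log_stay if best == f else log_switch
            new_score.append(score[best] + trans + log_emit(f, i))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Backtrace from the best final state.
    state = max(range(k), key=lambda f: score[f])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy data: two founders, one recombination event in the middle.
founders = [[0] * 8, [1] * 8]
offspring = [0, 0, 0, 0, 1, 1, 1, 1]
path = viterbi_founders(offspring, founders)
recombinations = sum(a != b for a, b in zip(path, path[1:]))
print(path, recombinations)
```

Counting the changes of founder along the decoded path gives a direct estimate of the number of recombination events, the secondary quantity the abstract mentions.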
The Occurrence Birth-Death Process for combined-evidence analysis in macroevolution and epidemiology
Phylodynamic models generally aim at jointly inferring phylogenetic relationships, model parameters, and more recently, the number of lineages through time, based on molecular sequence data. In the fields of epidemiology and macroevolution these models can be used to estimate, respectively, the past number of infected individuals (prevalence) or the past number of species (paleodiversity) through time. Recent years have seen the development of “total-evidence” analyses, which combine molecular and morphological data from extant and past sampled individuals in a unified Bayesian inference framework. Even sampled individuals characterized only by their sampling time, i.e. lacking morphological and molecular data, which we call occurrences, provide invaluable information to estimate the past number of lineages.
Here, we present new methodological developments around the Fossilized Birth-Death Process enabling us to (i) incorporate occurrence data in the likelihood function; (ii) consider piecewise-constant birth, death and sampling rates; and (iii) estimate the past number of lineages, with or without knowledge of the underlying tree. We implement our method in the RevBayes software environment, enabling its use along with a large set of models of molecular and morphological evolution, and validate the inference workflow using simulations under a wide range of conditions.
Finally, we illustrate our new implementation using two empirical datasets stemming from the fields of epidemiology and macroevolution. In epidemiology, we infer the prevalence of the COVID-19 outbreak on the Diamond Princess ship, by jointly taking into account the case count record (occurrences) along with viral sequences for a fraction of infected individuals. In macroevolution, we infer the diversity trajectory of cetaceans using molecular and morphological data from extant taxa, morphological data from fossils, as well as numerous fossil occurrences. The joint modeling of occurrences and trees holds the promise to further bridge the gap between traditional epidemiology and pathogen genomics, as well as between paleontology and molecular phylogenetics.
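The piecewise-constant (skyline) rates mentioned in point (ii) can be illustrated with a short Python sketch. It represents a rate as a step function of forward time and computes the expected number of lineages of a linear birth-death process, n0 * exp(integral of (birth - death)). This is only the unconditioned expectation under assumed rates, not the inference procedure of the cited work, which conditions on the tree and occurrences and is implemented in RevBayes. Function names and numbers are hypothetical.

```python
import math
from bisect import bisect_right

def piecewise_rate(breakpoints, values):
    """Step function: values[i] applies from breakpoints[i-1] up to breakpoints[i].

    breakpoints must be sorted and positive; len(values) == len(breakpoints) + 1.
    """
    def rate(t):
        return values[bisect_right(breakpoints, t)]
    return rate

def expected_lineages(n0, birth, death, breakpoints, t_end):
    """Expected lineage count at t_end for piecewise-constant birth and death rates."""
    edges = [0.0] + [b for b in breakpoints if b < t_end] + [t_end]
    log_growth = 0.0
    for i in range(len(edges) - 1):
        # Net growth accumulated over the i-th interval.
        log_growth += (birth[i] - death[i]) * (edges[i + 1] - edges[i])
    return n0 * math.exp(log_growth)

# Hypothetical example: the birth rate drops from 1.0 to 0.5 at time 2.
breakpoints = [2.0]
birth = [1.0, 0.5]
death = [0.5, 0.5]
birth_rate = piecewise_rate(breakpoints, birth)
print(birth_rate(1.0), birth_rate(2.5))
print(expected_lineages(1, birth, death, breakpoints, t_end=3.0))
```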
A skyline birth-death process for inferring the population size from a reconstructed tree with occurrences
Phylodynamic models generally aim at jointly inferring phylogenetic relationships, model parameters, and more recently, population size through time for clades of interest, based on molecular sequence data. In the fields of epidemiology and macroevolution these models can be used to estimate, respectively, the past number of infected individuals (prevalence) or the past number of species (paleodiversity) through time. Recent years have seen the development of "total-evidence" analyses, which combine molecular and morphological data from extant and past sampled individuals in a unified Bayesian inference framework. Even sampled individuals characterized only by their sampling time, i.e. lacking morphological and molecular data, which we call occurrences, provide invaluable information to reconstruct past population sizes. Here, we present new methodological developments around the Fossilized Birth-Death Process enabling us to (i) efficiently incorporate occurrence data while remaining computationally tractable and scalable; (ii) consider piecewise-constant birth, death and sampling rates; and (iii) reconstruct past population sizes, with or without knowledge of the underlying tree. We implement our method in the RevBayes software environment, enabling its use along with a large set of models of molecular and morphological evolution, and validate the inference workflow using simulations under a wide range of conditions. Finally, we illustrate our new implementation using two empirical datasets stemming from the fields of epidemiology and macroevolution. In epidemiology, we apply our model to the COVID-19 outbreak on the Diamond Princess ship. We infer the total prevalence throughout the outbreak, by jointly taking into account the case count record (occurrences) along with viral sequences for a fraction of infected individuals. In macroevolution, we present an empirical case study of cetaceans. We infer the diversity trajectory using molecular and morphological data from extant taxa, morphological data from fossils, as well as numerous fossil occurrences. Our case studies highlight that the advances we present allow us to further bridge the gap between epidemiology and pathogen genomics, as well as between paleontology and molecular phylogenetics.