9,518 research outputs found

    An alternative marginal likelihood estimator for phylogenetic models

    Bayesian phylogenetic methods are generating noticeable enthusiasm in the field of molecular systematics. Many phylogenetic models are often at stake, and different approaches are used to compare them within a Bayesian framework. The Bayes factor, defined as the ratio of the marginal likelihoods of two competing models, plays a key role in Bayesian model selection. We focus on the marginal likelihood, whose computation is still a challenging problem, and on an alternative estimator for it. Several computational solutions have been proposed, none of which can be considered to outperform the others simultaneously in terms of simplicity of implementation, computational burden and precision of the estimates. Practitioners and researchers, often led by available software, have so far favored the simplicity of the harmonic mean estimator (HM) and the arithmetic mean estimator (AM). However, it is known that the resulting estimates of the Bayesian evidence in favor of one model are biased and often inaccurate, to the point of having infinite variance, so that the reliability of the corresponding conclusions is doubtful. Our new implementation of the generalized harmonic mean (GHM) idea recycles MCMC simulations from the posterior and shares the computational simplicity of the original HM estimator but, unlike it, overcomes the infinite variance issue. The alternative estimator is applied to simulated phylogenetic data and produces fully satisfactory results, outperforming the simple estimators currently provided by most publicly available software.
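    The contrast between HM and GHM can be seen on a toy model where the marginal likelihood is known exactly. The sketch below is not the authors' phylogenetic implementation: it assumes a conjugate normal model (y_i ~ N(theta, 1), theta ~ N(0, tau2)), uses exact posterior draws as a stand-in for MCMC output, and picks a hypothetical GHM density g (a normal with half the posterior spread) so that g has lighter tails than the posterior. The GHM identity used is 1/p(y) = E_post[g(theta) / (L(theta) * prior(theta))].

```python
import math
import random
import statistics


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def estimate_log_marginals(y, tau2=1.0, draws=20_000, seed=1):
    """Return (exact, HM, GHM) log marginal likelihoods for the toy conjugate model."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)

    def log_lik(theta):
        # sum_i log N(y_i | theta, 1), using sum (y_i - theta)^2 = ss - 2*theta*s + n*theta^2
        return -0.5 * (n * math.log(2 * math.pi) + ss - 2 * theta * s + n * theta ** 2)

    # Conjugate posterior theta | y ~ N(post_mean, post_var); exact draws stand in for MCMC.
    post_var = 1.0 / (n + 1.0 / tau2)
    post_mean = post_var * s
    rng = random.Random(seed)
    thetas = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(draws)]

    # Harmonic mean (HM): 1/p(y) is estimated by the average of 1/L(theta_i).
    log_hm = -log_mean_exp([-log_lik(t) for t in thetas])

    # Generalized harmonic mean (GHM) with a normalized density g narrower than the posterior,
    # which keeps the ratio g / posterior bounded and the estimator's variance finite.
    g_mean = statistics.fmean(thetas)
    g_var = 0.5 * statistics.variance(thetas)  # hypothetical choice of g
    log_ghm = -log_mean_exp([
        log_normal_pdf(t, g_mean, g_var) - log_lik(t) - log_normal_pdf(t, 0.0, tau2)
        for t in thetas
    ])

    # Closed form: y ~ N(0, I + tau2 * 11^T); Sherman-Morrison gives quadratic form and determinant.
    log_exact = -0.5 * (n * math.log(2 * math.pi) + math.log(1 + n * tau2)
                        + ss - tau2 * s * s / (1 + n * tau2))
    return log_exact, log_hm, log_ghm


data_rng = random.Random(0)
y = [data_rng.gauss(0.5, 1.0) for _ in range(50)]
exact, hm, ghm = estimate_log_marginals(y)
print(f"exact {exact:.3f}  HM {hm:.3f}  GHM {ghm:.3f}")
```

    Both estimators reuse the same posterior draws; only the weighting differs, which is why GHM keeps HM's computational simplicity.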

    A genomic map of the effects of linked selection in Drosophila

    Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of 'linked selection' on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1-Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of linked selection from non-classic sweeps. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
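    The composite-likelihood step can be illustrated with a deliberately simplified model. In this sketch, which is an assumption and not the study's model, each neutral site i has expected pairwise diversity pi0 * reductions[i], where `reductions[i]` in (0, 1] stands in for the local effect of linked selection; each compared pair of sequences differs at the site with that probability, and per-site binomial log-likelihoods are summed as if sites and pairs were independent, which is what makes it a composite rather than a full likelihood. The names `pi0`, `reductions` and `fit_pi0` are hypothetical.

```python
import math


def composite_log_likelihood(pi0, reductions, pairs_differing, pairs_compared):
    """Sum of per-site binomial log-likelihoods, treating sites (and pairs) as independent."""
    total = 0.0
    for b, d, n in zip(reductions, pairs_differing, pairs_compared):
        p = pi0 * b  # hypothetical expected diversity at this site
        total += d * math.log(p) + (n - d) * math.log1p(-p)
    return total


def fit_pi0(reductions, pairs_differing, pairs_compared, grid_size=999):
    """Crude grid search for the pi0 maximizing the composite log-likelihood."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return max(grid, key=lambda p: composite_log_likelihood(
        p, reductions, pairs_differing, pairs_compared))


# Two sites: one unaffected, one whose diversity is halved by linked selection.
print(fit_pi0([1.0, 0.5], [10, 5], [100, 100]))
```

    A real analysis would replace the fixed reductions with functions of annotations, substitutions and recombination rates, and maximize over their parameters as well.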

    Combining genomics and epidemiology to track mumps virus transmission in the United States

    Unusually large outbreaks of mumps across the United States in 2016 and 2017 raised questions about the extent of mumps circulation and the relationship between these and prior outbreaks. We paired epidemiological data from public health investigations with analysis of mumps virus whole genome sequences from 201 infected individuals, focusing on Massachusetts university communities. Our analysis suggests continuous, undetected circulation of mumps locally and nationally, including multiple independent introductions into Massachusetts and into individual communities. Despite the presence of these multiple mumps virus lineages, the genomic data show that one lineage has dominated in the US since at least 2006. Widespread transmission was surprising given high vaccination rates, but we found no genetic evidence that variants arising during this outbreak contributed to vaccine escape. Viral genomic data allowed us to reconstruct mumps transmission links not evident from epidemiological data or standard single-gene surveillance efforts and also revealed connections between apparently unrelated mumps outbreaks.

    Persistent anthrax as a major driver of wildlife mortality in a tropical rainforest

    Anthrax is a globally important animal disease and zoonosis. Despite this, our current knowledge of anthrax ecology is largely limited to arid ecosystems, where outbreaks are most commonly reported. Here we show that the dynamics of an anthrax-causing agent, Bacillus cereus biovar anthracis, in a tropical rainforest have severe consequences for local wildlife communities. Using data and samples collected over three decades, we show that rainforest anthrax is a persistent and widespread cause of death for a broad range of mammalian hosts. We predict that this pathogen will accelerate the decline and possibly result in the extirpation of local chimpanzee (Pan troglodytes verus) populations. We present the epidemiology of a cryptic pathogen and show that its presence has important implications for conservation.

    Wave-like spread of Ebola Zaire

    In the past decade, the Zaire strain of Ebola virus (ZEBOV) has emerged repeatedly into human populations in central Africa and caused massive die-offs of gorillas and chimpanzees. We tested the view that emergence events are independent and caused by ZEBOV variants that have been long resident at each locality. Phylogenetic analyses place the earliest known outbreak, at Yambuku, Democratic Republic of Congo, very near to the root of the ZEBOV tree, suggesting that viruses causing all other known outbreaks evolved from a Yambuku-like virus after 1976. The tendency for earlier outbreaks to be directly ancestral to later outbreaks suggests that outbreaks are epidemiologically linked and may have occurred at the front of an advancing wave. While the ladder-like phylogenetic structure could also bear the signature of positive selection, our statistical power is too weak to reach a conclusion in this regard. Distances among outbreaks indicate a spread rate of about 50 km per year that remains consistent across spatial scales. Viral evolution is clocklike, and sequences show a high level of small-scale spatial structure. Genetic similarity decays with distance at roughly the same rate at all spatial scales. Our analyses suggest that ZEBOV has recently spread across the region rather than being long persistent at each outbreak locality. Controlling the impact of Ebola on wild apes and human populations may be more feasible than previously recognized.

    Mutation supply and the repeatability of selection for antibiotic resistance

    Whether evolution can be predicted is a key question in evolutionary biology. Here we set out to better understand the repeatability of evolution. We explored experimentally the effect of mutation supply and the strength of selective pressure on the repeatability of selection from standing genetic variation. Mutant libraries of different sizes of an antibiotic resistance gene, TEM-1 β-lactamase in Escherichia coli, were subjected to different antibiotic concentrations. We determined whether populations went extinct or survived, and sequenced the TEM gene of the surviving populations. The distribution of mutations per allele in our mutant libraries, generated by error-prone PCR, followed a Poisson distribution. Extinction patterns could be explained by a simple stochastic model that assumed the sampling of beneficial mutations was key for survival. In most surviving populations, alleles containing at least one known large-effect beneficial mutation were present. These genotype data also support a model that invokes only sampling effects to describe the occurrence of alleles containing large-effect driver mutations. Hence, evolution is largely predictable given cursory knowledge of mutational fitness effects, the mutation rate and population size. There were no clear trends in the repeatability of selected mutants when we considered all mutations present. However, when only known large-effect mutations were considered, the outcome of selection was less repeatable for large libraries, in contrast to expectations. Furthermore, we show experimentally that alleles carrying multiple mutations selected from large libraries confer higher resistance levels relative to alleles with only a known large-effect mutation, suggesting that the scarcity of high-resistance alleles carrying multiple mutations may contribute to the decrease in repeatability at large library sizes. Comment: 31 pages, 9 figures
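    The sampling-based survival model can be sketched under hypothetical assumptions that are not taken from the study: each of N alleles in a library carries a Poisson(λ) number of mutations, and each mutation is independently a large-effect beneficial one with probability p. By Poisson thinning, an allele lacks any beneficial mutation with probability exp(-λp), so a library survives with probability 1 - exp(-Nλp). The parameter names `library_size`, `mean_mutations` and `p_beneficial` are illustrative, and a small simulation checks the formula.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam such as mutations per allele."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def survival_probability(library_size, mean_mutations, p_beneficial):
    """P(at least one allele carries >= 1 beneficial mutation) under Poisson thinning."""
    return 1.0 - math.exp(-library_size * mean_mutations * p_beneficial)


def simulate_survival(library_size, mean_mutations, p_beneficial, trials=2000, seed=0):
    """Monte Carlo estimate of the same probability by sampling whole libraries."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        for _ in range(library_size):
            k = poisson(rng, mean_mutations)
            if any(rng.random() < p_beneficial for _ in range(k)):
                survived += 1
                break
    return survived / trials


print(survival_probability(100, 2.0, 0.005), simulate_survival(100, 2.0, 0.005))
```

    In this toy model, survival depends on library size, mutation rate and the fraction of beneficial mutations only through their product, which mirrors the claim that extinction patterns follow from mutation supply alone.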

    Survival analysis of DNA mutation motifs with penalized proportional hazards

    Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with "mutation motifs", which are short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: a large number of possible motifs makes this statistical problem high-dimensional, while the unobserved history of the mutation process leads to a nontrivial missing data problem. We introduce an ℓ1-penalized proportional hazards model to infer mutation motifs and their effects. In order to estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that our method performs better on simulated data compared to current methods and leads to more parsimonious models. The application of proportional hazards to mutation processes is, to our knowledge, novel and formalizes the current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates.
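    The objective behind an ℓ1-penalized proportional hazards model can be sketched for the case where the mutation order is known. This is a simplification, not the authors' method: their Monte Carlo EM additionally marginalizes over the unknown ordering, and this sketch keeps each site's motif fixed at its value in the starting sequence. Each site's hazard is taken to be baseline * exp(theta[motif]), so in the Cox partial likelihood the baseline cancels and each mutated site competes only with sites not yet mutated. The centered 3-mer feature and the names `motifs` and `penalized_log_partial_likelihood` are hypothetical.

```python
import math


def motifs(seq, k=3):
    """Hypothetical feature: the k-mer centred on each site, padded with 'N' at the ends."""
    half = k // 2
    padded = "N" * half + seq + "N" * half
    return [padded[i:i + k] for i in range(len(seq))]


def penalized_log_partial_likelihood(seq, order, theta, lam):
    """Cox-style log partial likelihood of an observed mutation order, minus an l1 penalty.

    theta maps motif -> log-hazard effect (missing motifs default to 0);
    order lists site indices in the order they mutated.
    """
    feats = motifs(seq)
    at_risk = set(range(len(seq)))
    loglik = 0.0
    for site in order:
        scores = [theta.get(feats[j], 0.0) for j in at_risk]
        m = max(scores)  # log-sum-exp over the current risk set
        log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
        loglik += theta.get(feats[site], 0.0) - log_denom
        at_risk.remove(site)
    return loglik - lam * sum(abs(v) for v in theta.values())


print(penalized_log_partial_likelihood("ACGTAC", [2, 0], {"CGT": 1.0}, 0.1))
```

    Maximizing this objective over theta with a nonzero lam drives the effects of uninformative motifs to exactly zero, which is the source of the more parsimonious models the abstract reports.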