An alternative marginal likelihood estimator for phylogenetic models
Bayesian phylogenetic methods are generating noticeable enthusiasm in the
field of molecular systematics. Many phylogenetic models are often at stake, and
different approaches are used to compare them within a Bayesian framework. The
Bayes factor, defined as the ratio of the marginal likelihoods of two competing
models, plays a key role in Bayesian model selection. We focus on an
alternative estimator of the marginal likelihood, whose computation is still a
challenging problem. Several computational solutions have been proposed, none of
which can be considered to outperform the others simultaneously in terms of
simplicity of implementation, computational burden and precision of the
estimates. Practitioners and researchers, often led by available software, have
so far favored the simplicity of the harmonic mean estimator (HM) and the
arithmetic mean estimator (AM). However, it is known that the resulting
estimates of the Bayesian evidence in favor of one model are biased and often
inaccurate, possibly to the point of having infinite variance, so that the
reliability of the corresponding conclusions is doubtful. Our new implementation
of the generalized harmonic mean (GHM) idea recycles MCMC simulations from the
posterior, shares the computational simplicity of the original HM estimator,
but, unlike it, overcomes the infinite variance issue. The alternative
estimator is applied to simulated phylogenetic data and produces fully
satisfactory results, outperforming the simple estimators currently provided
by most publicly available software.
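To make the HM versus GHM contrast concrete, here is a minimal sketch on a toy conjugate model (y_i ~ N(theta, 1), theta ~ N(0, tau^2)) where the exact marginal likelihood is known. It is not the authors' phylogenetic implementation: it uses a generic GHM with a normal instrumental density g whose variance is deliberately shrunk below the posterior's (the `shrink` parameter is an assumption of this sketch), and exact posterior draws stand in for recycled MCMC output. In this toy model the HM ratio 1/L(theta) has infinite posterior variance, while the GHM ratio g/(L*prior) does not.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def log_lik(theta, y):
    """Log-likelihood of y_i ~ N(theta, 1) for each draw; returns shape (S,)."""
    return norm.logpdf(y[None, :], loc=theta[:, None], scale=1.0).sum(axis=1)


def log_prior(theta, tau):
    """Log-density of the N(0, tau^2) prior."""
    return norm.logpdf(theta, loc=0.0, scale=tau)


def exact_log_marginal(y, tau):
    """Closed form via log p(y) = log p(y|t) + log p(t) - log p(t|y), valid at any t."""
    prec = len(y) + 1.0 / tau**2
    mean = y.sum() / prec
    t = np.array([mean])
    post = norm.logpdf(t, loc=mean, scale=prec**-0.5)
    return float(log_lik(t, y)[0] + log_prior(t, tau)[0] - post[0])


def harmonic_mean(log_l):
    """HM: 1/p(y) is estimated by the posterior average of 1/L(theta)."""
    return -(logsumexp(-log_l) - np.log(len(log_l)))


def generalized_harmonic_mean(theta, log_l, log_p, shrink=0.5):
    """GHM: 1/p(y) = E_post[g(theta) / (L(theta) prior(theta))] for a normalized density g.

    g is a normal fitted to the draws with its variance multiplied by `shrink` (< 1),
    so its tails are lighter than the posterior's and the ratio has finite variance.
    """
    g_mean, g_sd = theta.mean(), theta.std() * np.sqrt(shrink)
    log_g = norm.logpdf(theta, loc=g_mean, scale=g_sd)
    return -(logsumexp(log_g - log_l - log_p) - np.log(len(theta)))


rng = np.random.default_rng(1)
tau = 10.0
y = rng.normal(1.0, 1.0, size=20)
prec = len(y) + 1.0 / tau**2
# Exact posterior draws stand in for recycled MCMC output in this toy model.
theta = rng.normal(y.sum() / prec, prec**-0.5, size=20_000)
ll, lp = log_lik(theta, y), log_prior(theta, tau)

exact = exact_log_marginal(y, tau)
hm = harmonic_mean(ll)
ghm = generalized_harmonic_mean(theta, ll, lp)
print(f"exact={exact:.3f}  HM={hm:.3f}  GHM={ghm:.3f}")
```

Both estimators reuse the same posterior draws and log-likelihood values, which is the "recycling" property the abstract emphasizes; the only extra cost of the GHM is evaluating g and the prior at each draw.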
A genomic map of the effects of linked selection in Drosophila
Natural selection at one site shapes patterns of genetic variation at linked
sites. Quantifying the effects of 'linked selection' on levels of genetic
diversity is key to making reliable inference about demography, building a null
model in scans for targets of adaptation, and learning about the dynamics of
natural selection. Here, we introduce the first method that jointly infers
parameters of distinct modes of linked selection, notably background selection
and selective sweeps, from genome-wide diversity data, functional annotations
and genetic maps. The central idea is to calculate the probability that a
neutral site is polymorphic given local annotations, substitution patterns, and
recombination rates. Information is then combined across sites and samples
using composite likelihood in order to estimate genome-wide parameters of
distinct modes of selection. In addition to parameter estimation, this approach
yields a map of the expected neutral diversity levels along the genome. To
illustrate the utility of our approach, we apply it to genome-wide resequencing
data from 125 lines in Drosophila melanogaster and reliably predict diversity
levels at the 1 Mb scale. Our results corroborate estimates of a high fraction
of beneficial substitutions in proteins and untranslated regions (UTRs). They
allow us to distinguish between the contribution of sweeps and other modes of
selection around amino acid substitutions and to uncover evidence for pervasive
sweeps in UTRs. Our inference further suggests a
substantial effect of linked selection from non-classic sweeps. More generally,
we demonstrate that linked selection has had a larger effect in reducing
diversity levels and increasing their variance in D. melanogaster than
previously appreciated.
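The composite-likelihood step can be illustrated with a deliberately simplified sketch. It assumes a hypothetical model in which each neutral site's expected pairwise diversity is pi0 * exp(-a * x), where x is a made-up per-site covariate standing in for the local density of selected sites, and the number of differing sample pairs at each site is binomial. The actual method combines annotations, substitution patterns and genetic maps with separate parameters for background selection and sweeps; none of that is reproduced here, and the data below are simulated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit


def predicted_diversity(params, x):
    """Hypothetical expected pairwise diversity at each neutral site: pi0 * exp(-a * x)."""
    pi0 = expit(params[0])   # neutral diversity, kept in (0, 1)
    a = np.exp(params[1])    # strength of the hypothetical reduction, kept > 0
    return pi0 * np.exp(-a * x)


def neg_composite_loglik(params, x, het, pairs):
    """Sum of per-site binomial log-likelihoods, treating sites and pairs as independent."""
    pi = predicted_diversity(params, x)
    return -np.sum(het * np.log(pi) + (pairs - het) * np.log1p(-pi))


def fit(x, het, pairs):
    """Maximize the composite likelihood on an unconstrained (logit, log) scale."""
    start = np.array([logit(0.05), np.log(0.5)])
    res = minimize(neg_composite_loglik, start, args=(x, het, pairs), method="Nelder-Mead")
    return expit(res.x[0]), np.exp(res.x[1])


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=20_000)   # hypothetical per-site selection covariate
pairs = np.full(x.size, 100)             # sample pairs compared at each site
het = rng.binomial(pairs, 0.02 * np.exp(-1.0 * x))  # simulated with pi0=0.02, a=1
pi0_hat, a_hat = fit(x, het, pairs)
print(pi0_hat, a_hat)
```

Multiplying per-site likelihoods ignores the correlation between nearby sites and between pairs that share a sample, which is what makes this a composite rather than a full likelihood; the point estimates remain usable, but standard errors would need a correction such as a block bootstrap.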
Combining genomics and epidemiology to track mumps virus transmission in the United States.
Unusually large outbreaks of mumps across the United States in 2016 and 2017 raised questions about the extent of mumps circulation and the relationship between these and prior outbreaks. We paired epidemiological data from public health investigations with analysis of mumps virus whole genome sequences from 201 infected individuals, focusing on Massachusetts university communities. Our analysis suggests continuous, undetected circulation of mumps locally and nationally, including multiple independent introductions into Massachusetts and into individual communities. Despite the presence of these multiple mumps virus lineages, the genomic data show that one lineage has dominated in the US since at least 2006. Widespread transmission was surprising given high vaccination rates, but we found no genetic evidence that variants arising during this outbreak contributed to vaccine escape. Viral genomic data allowed us to reconstruct mumps transmission links not evident from epidemiological data or standard single-gene surveillance efforts and also revealed connections between apparently unrelated mumps outbreaks.
Persistent anthrax as a major driver of wildlife mortality in a tropical rainforest
Anthrax is a globally important animal disease and zoonosis. Despite this, our current knowledge of anthrax ecology is largely limited to arid ecosystems, where outbreaks are most commonly reported. Here we show that the dynamics of an anthrax-causing agent, Bacillus cereus biovar anthracis, in a tropical rainforest have severe consequences for local wildlife communities. Using data and samples collected over three decades, we show that rainforest anthrax is a persistent and widespread cause of death for a broad range of mammalian hosts. We predict that this pathogen will accelerate the decline and possibly result in the extirpation of local chimpanzee (Pan troglodytes verus) populations. We present the epidemiology of a cryptic pathogen and show that its presence has important implications for conservation.
Wave-like spread of Ebola Zaire
In the past decade the Zaire strain of Ebola virus (ZEBOV) has emerged repeatedly into human populations in central Africa and caused massive die-offs of gorillas and chimpanzees. We tested the view that emergence events are independent and caused by ZEBOV variants that have been long resident at each locality. Phylogenetic analyses place the earliest known outbreak at Yambuku, Democratic Republic of Congo, very near to the root of the ZEBOV tree, suggesting that viruses causing all other known outbreaks evolved from a Yambuku-like virus after 1976. The tendency for earlier outbreaks to be directly ancestral to later outbreaks suggests that outbreaks are epidemiologically linked and may have occurred at the front of an advancing wave. While the ladder-like phylogenetic structure could also bear the signature of positive selection, our statistical power is too weak to reach a conclusion in this regard. Distances among outbreaks indicate a spread rate of about 50 km per year that remains consistent across spatial scales. Viral evolution is clocklike, and sequences show a high level of small-scale spatial structure. Genetic similarity decays with distance at roughly the same rate at all spatial scales. Our analyses suggest that ZEBOV has recently spread across the region rather than being long persistent at each outbreak locality. Controlling the impact of Ebola on wild apes and human populations may be more feasible than previously recognized.
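One simple way to turn outbreak locations and dates into a spread rate like the one quoted above is to regress great-circle distance from the earliest outbreak on the years elapsed since it. The sketch below does that with a least-squares slope through the origin; it is an illustration only, not necessarily the procedure used in the study, and the outbreak coordinates in the usage example are hypothetical points placed exactly 50 km apart per year along one meridian, not real outbreak sites.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def spread_rate_km_per_year(lat, lon, year):
    """Least-squares slope (through the origin) of distance from the earliest outbreak vs elapsed years."""
    lat, lon, year = (np.asarray(v, dtype=float) for v in (lat, lon, year))
    first = np.argmin(year)
    dist = haversine_km(lat[first], lon[first], lat, lon)
    elapsed = year - year[first]
    return np.sum(dist * elapsed) / np.sum(elapsed ** 2)


# Hypothetical outbreaks along one meridian, placed exactly 50 km further per year.
years = np.array([1976.0, 1980.0, 1990.0, 2001.0])
lat = 1.0 + np.degrees(50.0 * (years - 1976.0) / EARTH_RADIUS_KM)
lon = np.full_like(lat, 20.0)
rate = spread_rate_km_per_year(lat, lon, years)
print(rate)
```

Forcing the line through the origin encodes the assumption that spread starts at the earliest outbreak; with real data one would also want to check, as the abstract does, that the estimated rate is stable across spatial scales.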
Mutation supply and the repeatability of selection for antibiotic resistance
Whether evolution can be predicted is a key question in evolutionary biology.
Here we set out to better understand the repeatability of evolution. We
explored experimentally the effect of mutation supply and the strength of
selective pressure on the repeatability of selection from standing genetic
variation. Different sizes of mutant libraries of an antibiotic resistance
gene, TEM-1 β-lactamase in Escherichia coli, were subjected to different
antibiotic concentrations. We determined whether populations went extinct or
survived, and sequenced the TEM gene of the surviving populations. The
distribution of mutations per allele in our mutant libraries (generated by
error-prone PCR) followed a Poisson distribution. Extinction patterns could be
explained by a simple stochastic model that assumed the sampling of beneficial
mutations was key for survival. In most surviving populations, alleles
containing at least one known large-effect beneficial mutation were present.
These genotype data also support a model which only invokes sampling effects to
describe the occurrence of alleles containing large-effect driver mutations.
Hence, evolution is largely predictable given cursory knowledge of mutational
fitness effects, the mutation rate and population size. There were no clear
trends in the repeatability of selected mutants when we considered all
mutations present. However, when only known large-effect mutations were
considered, the outcome of selection was less repeatable for large libraries, in
contrast to expectations. Furthermore, we show experimentally that alleles
carrying multiple mutations selected from large libraries confer higher
resistance levels relative to alleles with only a known large-effect mutation,
suggesting that the scarcity of high-resistance alleles carrying multiple
mutations may contribute to the decrease in repeatability at large library
sizes.
Comment: 31 pages, 9 figures
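The sampling argument behind the extinction model can be sketched with Poisson thinning. Assume, hypothetically, that mutations per allele are Poisson with mean `mean_mutations` (matching the Poisson distribution reported for the libraries), that each mutation is independently a large-effect beneficial one with probability `p_beneficial`, and that a population survives exactly when its library contains at least one allele carrying such a mutation. Then beneficial mutations per allele are Poisson(mean_mutations * p_beneficial), and a library of N alleles survives with probability 1 - exp(-N * mean_mutations * p_beneficial). The parameter values below are illustrative, not estimates from the study.

```python
import numpy as np


def p_survive(library_size, mean_mutations, p_beneficial):
    """P(library holds >= 1 allele with a large-effect beneficial mutation).

    Mutations per allele ~ Poisson(mean_mutations); keeping each one with probability
    p_beneficial gives beneficial mutations per allele ~ Poisson(mean_mutations * p_beneficial),
    so the whole library lacks one with probability exp(-N * mean_mutations * p_beneficial).
    """
    return 1.0 - np.exp(-library_size * mean_mutations * p_beneficial)


def simulate_survival(library_size, mean_mutations, p_beneficial, reps, rng):
    """Monte Carlo check: fraction of simulated libraries containing a beneficial allele."""
    survived = 0
    for _ in range(reps):
        n_mut = rng.poisson(mean_mutations, size=library_size)
        n_beneficial = rng.binomial(n_mut, p_beneficial)
        survived += int(n_beneficial.sum() > 0)
    return survived / reps


rng = np.random.default_rng(42)
for n in (100, 1_000, 10_000):
    print(n, p_survive(n, 2.0, 5e-4), simulate_survival(n, 2.0, 5e-4, 1_000, rng))
```

Under these assumptions survival depends on library size, mutation rate and the beneficial fraction only through their product, which is one way to read the abstract's claim that outcomes are predictable from mutational fitness effects, the mutation rate and population size.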
Survival analysis of DNA mutation motifs with penalized proportional hazards
Antibodies, an essential part of our immune system, develop through an
intricate process to bind a wide array of pathogens. This process involves
randomly mutating DNA sequences encoding these antibodies to find variants with
improved binding, though mutations are not distributed uniformly across
sequence sites. Immunologists observe this nonuniformity to be consistent with
"mutation motifs", which are short DNA subsequences that affect how likely a
given site is to experience a mutation. Quantifying the effect of motifs on
mutation rates is challenging: a large number of possible motifs makes this
statistical problem high dimensional, while the unobserved history of the
mutation process leads to a nontrivial missing data problem. We introduce an
ℓ1-penalized proportional hazards model to infer mutation motifs and
their effects. In order to estimate model parameters, our method uses a Monte
Carlo EM algorithm to marginalize over the unknown ordering of mutations. We
show that our method performs better on simulated data compared to current
methods and leads to more parsimonious models. The application of proportional
hazards to mutation processes is, to our knowledge, novel and formalizes the
current methods in a statistical framework that can be easily extended to
analyze the effect of other biological features on mutation rates.
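A simplified sketch of the core estimation problem follows. Each site's mutation hazard is exp(theta · x), where x is a one-hot indicator of a hypothetical motif class (class 0 is a baseline and gets no column, which keeps theta identifiable). Unlike the method described above, this sketch assumes the order of mutations is observed, so no Monte Carlo EM step is needed, and it fits the ℓ1-penalized Cox partial likelihood by proximal gradient descent (ISTA). Motif classes, sequence sizes and the true effect are all made up for illustration.

```python
import numpy as np


def nll_and_grad(theta, X_seqs, orders):
    """Average negative log partial likelihood (and gradient) over observed mutation events."""
    nll, grad, n_events = 0.0, np.zeros_like(theta), 0
    for X, order in zip(X_seqs, orders):
        at_risk = np.ones(len(X), dtype=bool)       # sites not yet mutated
        for site in order:
            eta = X[at_risk] @ theta                 # log-hazards of the risk set
            m = eta.max()
            lse = m + np.log(np.sum(np.exp(eta - m)))
            w = np.exp(eta - lse)                    # P(next mutation hits each at-risk site)
            nll -= X[site] @ theta - lse
            grad -= X[site] - w @ X[at_risk]
            at_risk[site] = False
            n_events += 1
    return nll / n_events, grad / n_events


def fit_l1_cox(X_seqs, orders, lam, step=1.0, n_iter=150):
    """ISTA: gradient step on the partial likelihood, then soft-thresholding for the l1 penalty."""
    theta = np.zeros(X_seqs[0].shape[1])
    for _ in range(n_iter):
        _, grad = nll_and_grad(theta, X_seqs, orders)
        z = theta - step * grad
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return theta


# Hypothetical data: 5 motif classes, only class 1 raises the mutation rate.
rng = np.random.default_rng(0)
true_theta = np.array([1.5, 0.0, 0.0, 0.0])
X_seqs, orders = [], []
for _ in range(200):
    motif = rng.integers(0, 5, size=30)
    X = np.eye(5)[motif][:, 1:]                      # one-hot, baseline column dropped
    at_risk, order = np.ones(30, dtype=bool), []
    for _ in range(5):
        idx = np.flatnonzero(at_risk)
        rates = np.exp(X[idx] @ true_theta)
        site = rng.choice(idx, p=rates / rates.sum())  # competing exponential clocks
        order.append(site)
        at_risk[site] = False
    X_seqs.append(X)
    orders.append(order)

theta_hat = fit_l1_cox(X_seqs, orders, lam=0.02)
print(np.round(theta_hat, 2))
```

With one-hot features the Hessian of the averaged partial likelihood has eigenvalues at most 1, so a step size of 1 is safe for ISTA here; the soft-thresholding step is what sets weak motif effects exactly to zero and yields the more parsimonious models the abstract mentions.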
- …