A genomic map of the effects of linked selection in Drosophila
Natural selection at one site shapes patterns of genetic variation at linked
sites. Quantifying the effects of 'linked selection' on levels of genetic
diversity is key to making reliable inference about demography, building a null
model in scans for targets of adaptation, and learning about the dynamics of
natural selection. Here, we introduce the first method that jointly infers
parameters of distinct modes of linked selection, notably background selection
and selective sweeps, from genome-wide diversity data, functional annotations
and genetic maps. The central idea is to calculate the probability that a
neutral site is polymorphic given local annotations, substitution patterns, and
recombination rates. Information is then combined across sites and samples
using composite likelihood in order to estimate genome-wide parameters of
distinct modes of selection. In addition to parameter estimation, this approach
yields a map of the expected neutral diversity levels along the genome. To
illustrate the utility of our approach, we apply it to genome-wide resequencing
data from 125 lines in Drosophila melanogaster and reliably predict diversity
levels at the 1 Mb scale. Our results corroborate estimates of a high fraction
of beneficial substitutions in proteins and untranslated regions (UTRs). They
also allow us to distinguish the contributions of sweeps and of other modes of
selection around amino acid substitutions, and to uncover evidence for pervasive
sweeps in UTRs. Our inference further suggests a substantial effect of linked
selection from non-classic sweeps. More generally, we demonstrate that linked
selection has had a larger effect in reducing diversity levels and increasing
their variance in D. melanogaster than previously appreciated.
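The abstract's central idea is to compute per-site probabilities of polymorphism and combine them by composite likelihood. A minimal Python sketch of that idea follows, under several hypothetical assumptions:

- each neutral site's data reduce to counts of differing and identical sampled allele pairs;
- the predicted diversity at site i is a neutral level pi0 times a per-site reduction factor from linked selection;
- pi0 is fitted by a simple grid search.

The function names and this parameterization are illustrative only. They are not the authors' implementation.

```python
import math


def composite_log_likelihood(pi_pred, het_pairs, hom_pairs):
    """Sum per-site binomial log-likelihoods as if sites were independent
    (this independence assumption is what makes the likelihood "composite").

    pi_pred[i]:   predicted probability that two sampled alleles differ at site i
    het_pairs[i]: number of sampled allele pairs that differ at site i
    hom_pairs[i]: number of sampled allele pairs that are identical at site i
    """
    total = 0.0
    for p, k, m in zip(pi_pred, het_pairs, hom_pairs):
        if not 0.0 < p < 1.0:
            raise ValueError("predicted diversity must lie strictly between 0 and 1")
        total += k * math.log(p) + m * math.log(1.0 - p)
    return total


def fit_pi0(reduction, het_pairs, hom_pairs, grid):
    """Grid search for the neutral diversity level pi0 that maximizes the composite
    likelihood, given hypothetical per-site reduction factors from linked selection."""
    def score(pi0):
        predicted = [pi0 * b for b in reduction]
        return composite_log_likelihood(predicted, het_pairs, hom_pairs)
    return max(grid, key=score)


# Hypothetical data: two unreduced sites, each with 1 differing and 99 identical pairs.
print(fit_pi0([1.0, 1.0], [1, 1], [99, 99], grid=[0.005, 0.01, 0.02]))  # 0.01
```

In a full model, the reduction factors would themselves depend on the selection parameters being estimated. Those parameters, rather than pi0 alone, would then be optimized.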
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Amino Acid Substitutions in Drosophila simulans
In Drosophila, multiple lines of evidence converge in suggesting that beneficial substitutions to the genome may be common. All suffer from confounding factors, however, such that the interpretation of the evidence—in particular, conclusions about the rate and strength of beneficial substitutions—remains tentative. Here, we use genome-wide polymorphism data in D. simulans and sequenced genomes of its close relatives to construct a readily interpretable characterization of the effects of positive selection: the shape of average neutral diversity around amino acid substitutions. As expected under recurrent selective sweeps, we find a trough in diversity levels around amino acid but not around synonymous substitutions, a distinctive pattern that is not expected under alternative models. This characterization is richer than previous approaches, which relied on limited summaries of the data (e.g., the slope of a scatter plot), and relates to underlying selection parameters in a straightforward way, allowing us to make more reliable inferences about the prevalence and strength of adaptation. Specifically, we develop a coalescent-based model for the shape of the entire curve and use it to infer adaptive parameters by maximum likelihood. Our inference suggests that ∼13% of amino acid substitutions cause selective sweeps. Interestingly, it reveals two classes of beneficial fixations: a minority (approximately 3%) that appears to have had large selective effects and accounts for most of the reduction in diversity, and the remaining 10%, which seem to have had very weak selective effects. These estimates therefore help to reconcile the apparent conflict among previously published estimates of the strength of selection. More generally, our findings provide unequivocal evidence for strongly beneficial substitutions in Drosophila and illustrate how the rapidly accumulating genome-wide data can be leveraged to address enduring questions about the genetic basis of adaptation.
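To see why two classes of sweeps leave a distinctive trough shape, consider the toy Python sketch below. It is not the coalescent-based model fitted in the paper. It assumes:

- the commonly used approximation (2Ns)^(-2r/s) for the probability that a lineage at recombination fraction r fails to escape a sweep with selection coefficient s;
- no recovery of diversity after the sweep;
- no background selection.

The 3% and 10% fractions come from the abstract. The selection coefficients and Ne are hypothetical values chosen for illustration.

```python
def sweep_coalescence_prob(r, s, n_e):
    """Approximate probability that a neutral lineage at recombination fraction r
    from the selected site is carried along by a sweep (assumed form:
    (2*Ne*s) ** (-2*r/s), meaningful only when 2*Ne*s >> 1)."""
    if 2.0 * n_e * s <= 1.0:
        raise ValueError("approximation requires 2*Ne*s > 1")
    return (2.0 * n_e * s) ** (-2.0 * r / s)


def trough(r, classes, n_e):
    """Toy expected scaled diversity (pi/pi0) at recombination fraction r from an
    amino acid substitution. `classes` holds (alpha, s) pairs, where alpha is the
    fraction of substitutions assumed to have caused a sweep with coefficient s."""
    return 1.0 - sum(alpha * sweep_coalescence_prob(r, s, n_e) for alpha, s in classes)


# Fractions from the abstract (3% strong, 10% weak); s values and Ne are hypothetical.
classes = [(0.03, 1e-2), (0.10, 1e-5)]
for r in (0.0, 1e-6, 1e-4, 1e-2):
    print(f"r={r:g}  pi/pi0={trough(r, classes, n_e=1e6):.3f}")
```

The weakly selected class produces a deep but very narrow dip. The strongly selected class produces a shallower but much wider one, which is why the strong class can dominate the overall reduction in diversity.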
Broad-scale variation in human genetic diversity levels is predicted by purifying selection on coding and non-coding elements
Analyses of genetic variation in many taxa have established that neutral genetic diversity is shaped by natural selection at linked sites. Whether the mode of selection is primarily the fixation of strongly beneficial alleles (selective sweeps) or purifying selection on deleterious mutations (background selection) remains unknown, however. We address this question in humans by fitting a model of the joint effects of selective sweeps and background selection to autosomal polymorphism data from the 1000 Genomes Project. After controlling for variation in mutation rates along the genome, a model of background selection alone explains ~60% of the variance in diversity levels at the megabase scale. Adding the effects of selective sweeps driven by adaptive substitutions to the model does not improve the fit, and when both modes of selection are considered jointly, selective sweeps are estimated to have had little or no effect on linked neutral diversity. The regions under purifying selection are best predicted by phylogenetic conservation, with ~80% of the deleterious mutations affecting neutral diversity occurring in non-exonic regions. Thus, background selection is the dominant mode of linked selection in humans, with marked effects on diversity levels throughout autosomes.
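Background selection is usually summarized by a factor B, the expected neutral diversity at a site relative to its level without linked selection. The Python sketch below computes B with the classic approximation B = exp(-Σ u·t / (t + r(1 − t))²). Here u is the deleterious mutation rate at a selected element, t is the fitness reduction in heterozygous carriers, and r is the recombination fraction to the neutral site. This is offered as a standard textbook form, not necessarily the exact parameterization fitted to the 1000 Genomes data. The numbers in the usage lines are hypothetical.

```python
import math


def background_selection_b(u, t, r):
    """Expected neutral diversity relative to its neutral level (B) at one site.

    u[i]: deleterious mutation rate at selected element i
    t[i]: reduction in fitness of heterozygous carriers (0 < t <= 1)
    r[i]: recombination fraction between element i and the neutral site
    """
    exponent = 0.0
    for ui, ti, ri in zip(u, t, r):
        # Each selected element contributes u*t / (t + r*(1 - t))**2 to -log(B).
        exponent += ui * ti / (ti + ri * (1.0 - ti)) ** 2
    return math.exp(-exponent)


# Hypothetical: the same selected element, completely linked vs loosely linked.
print(background_selection_b([1e-3], [1e-2], [0.0]))  # ~0.905, i.e. exp(-0.1)
print(background_selection_b([1e-3], [1e-2], [0.1]))  # ~0.999
```

In a genome-wide application, the sum would run over every conserved or exonic site along the chromosome, with r taken from a genetic map. The per-site B values would then be compared with diversity levels in windows.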
Observed and predicted scaled diversity levels around amino acid substitutions.
<p>(<b>A</b>) Comparison of scaled diversity levels around non-synonymous (NS) and synonymous (SYN) substitutions. (<b>B</b>) Comparison of predicted scaled diversity levels based on our method and that of Sattath et al. (2011) [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006130#pgen.1006130.ref048" target="_blank">48</a>].</p>
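Curves like those in panel A average diversity over neutral sites binned by their distance from each substitution. The Python sketch below is one plausible way to build such a curve; it is not necessarily the authors' procedure. The function name is hypothetical, and the code makes these assumptions:

- positions are in cM on a single chromosome;
- each bin's scaled diversity is its summed synonymous heterozygosity divided by its summed synonymous divergence.

```python
from collections import defaultdict


def diversity_by_distance(subst_cm, site_cm, syn_het, syn_div, bin_cm, max_cm):
    """Scaled diversity (pooled heterozygosity / pooled divergence) in bins of
    genetic distance from a set of substitutions (e.g., all NS or all SYN ones).

    subst_cm: genetic positions of the substitutions
    site_cm:  genetic positions of neutral (synonymous) sites
    Returns {bin start in cM: scaled diversity}.
    """
    het = defaultdict(float)
    div = defaultdict(float)
    for sub in subst_cm:
        for pos, h, d in zip(site_cm, syn_het, syn_div):
            dist = abs(pos - sub)
            if dist >= max_cm:
                continue
            b = int(dist // bin_cm)  # index of the distance bin
            het[b] += h
            div[b] += d
    return {b * bin_cm: het[b] / div[b] for b in sorted(div) if div[b] > 0}


# Hypothetical data; run once for NS and once for SYN substitutions to compare curves.
curve = diversity_by_distance([0.0], [0.25, -0.25, 0.75], [0.01, 0.03, 0.05],
                              [0.1, 0.1, 0.1], bin_cm=0.5, max_cm=1.0)
print(curve)
```

The nested loop keeps the sketch short. For genome-scale data, sorting the sites and using a binary search (e.g. `bisect`) to find those within `max_cm` of each substitution would be far faster.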
The contribution of background selection and classic sweeps to scaled diversity levels around non-synonymous and synonymous substitutions.
<p>(<b>A</b>) Observed and predicted scaled diversity levels around non-synonymous (left) and synonymous (right) substitutions. The predictions are based on the joint model for background selection and classic sweeps. (<b>B</b>) The contributions of background selection (blue) and classic sweeps (red), measured in terms of the coalescent rates that they induce. The rates are measured in units of 1/(2<i>N</i><sub><i>e</i></sub>), where <i>N</i><sub><i>e</i></sub> is our estimate of the effective population size in the absence of linked selection. To make these graphs comparable to the scaled diversity levels in (A), with lower rates corresponding to higher scaled diversity levels, the direction of the y-axis is reversed. (<b>C</b>) The density of exonic sites (blue) and non-synonymous substitutions (red) as a function of distance from non-synonymous and synonymous substitutions. Densities are normalized by the average densities at distances >0.06 cM; the shaded areas correspond to the use of a different linear scale.</p>
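Panel B expresses both effects as coalescent rates in units of 1/(2Ne), and panel C normalizes densities by their average far from the substitutions. Both steps are sketched in Python below under stated assumptions:

- background selection with reduction factor B raises the coalescent rate to 1/B;
- sweeps add their own rate on top;
- scaled diversity is the inverse of the total rate, consistent with the caption's "lower rates correspond to higher scaled diversity".

The function names and inputs are hypothetical.

```python
def scaled_diversity_from_rates(b_values, sweep_rates):
    """Expected scaled diversity (pi/pi0) from coalescent rates in units of 1/(2*Ne),
    assuming rates from background selection (1/B) and sweeps simply add."""
    out = []
    for b, sweep in zip(b_values, sweep_rates):
        if not 0.0 < b <= 1.0 or sweep < 0.0:
            raise ValueError("expected 0 < B <= 1 and a non-negative sweep rate")
        total_rate = 1.0 / b + sweep
        out.append(1.0 / total_rate)
    return out


def normalize_density(dist_cm, density, far_cm=0.06):
    """Divide densities by their average at distances greater than far_cm (cM)."""
    far = [x for d, x in zip(dist_cm, density) if d > far_cm]
    if not far:
        raise ValueError("no positions beyond far_cm to use as a baseline")
    baseline = sum(far) / len(far)
    return [x / baseline for x in density]


# Hypothetical inputs: no linked selection vs B = 0.8 plus a sweep rate of 0.25.
print(scaled_diversity_from_rates([1.0, 0.8], [0.0, 0.25]))
print(normalize_density([0.0, 0.03, 0.08, 0.1], [3.0, 2.0, 1.0, 1.0]))  # [3.0, 2.0, 1.0, 1.0]
```

With no linked selection the total rate is 1, so the scaled diversity is 1. Any added rate lowers the scaled diversity, which is why plotting the rates on a reversed axis makes them visually comparable to panel A.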
A comparison of observed and predicted scaled diversity levels along the major autosomes of <i>Drosophila melanogaster</i>.
<p>Throughout, we use “scaled diversity” to refer to synonymous heterozygosity divided by synonymous divergence, which controls for variation in the mutation rate (as detailed in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006130#pgen.1006130.s001" target="_blank">S1C Text</a>); scaled diversity is shown relative to the genome average. (<b>A</b>) Observed and predicted scaled diversity over non-overlapping 1 Mb windows across chromosomal arms. (<b>B</b>) Summaries of the goodness of fit for models including background selection (BS), classic sweeps (CS) and both (BS & CS). <i>R</i><sup>2</sup> is calculated for autosomes using non-overlapping windows of different sizes. Selection parameters are inferred using synonymous sites with recombination rate >0.75 cM/Mb, while the predictions and corresponding summaries are calculated for sites with recombination rate >0.1 cM/Mb.</p>
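The window summaries in this caption can be sketched in Python as follows. Two choices here are assumptions, and the paper may define them differently:

- "relative to the genome average" is taken to mean dividing by the pooled ratio over all retained sites;
- R<sup>2</sup> is computed as the squared Pearson correlation between observed and predicted window values.

The 0.1 cM/Mb filter follows the caption, and passing `min_rec=0.75` would mimic the stricter filter used for inference. Function names and data are hypothetical.

```python
from collections import defaultdict


def window_scaled_diversity(pos_bp, syn_het, syn_div, rec_cm_per_mb,
                            window_bp=1_000_000, min_rec=0.1):
    """Scaled diversity (synonymous heterozygosity / synonymous divergence) in
    non-overlapping windows, using only sites with recombination rate > min_rec
    (cM/Mb), expressed relative to the pooled ratio over all retained sites."""
    het = defaultdict(float)
    div = defaultdict(float)
    for pos, h, d, rec in zip(pos_bp, syn_het, syn_div, rec_cm_per_mb):
        if rec <= min_rec:
            continue  # drop sites in low-recombination regions
        w = pos // window_bp  # window index
        het[w] += h
        div[w] += d
    genome_ratio = sum(het.values()) / sum(div.values())
    return {w: (het[w] / div[w]) / genome_ratio for w in sorted(div) if div[w] > 0}


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted window values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical sites; the third falls below the recombination-rate filter.
obs = window_scaled_diversity([100, 200, 300, 1_500_000], [0.01, 0.01, 0.5, 0.02],
                              [0.1, 0.1, 0.1, 0.1], [1.0, 1.0, 0.05, 1.0])
print(obs)  # approximately {0: 0.75, 1: 1.5}
print(r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

Because the squared correlation is unaffected by rescaling, it does not matter whether predictions are expressed relative to the genome average or in absolute units. A residual-based R<sup>2</sup> would be sensitive to that choice.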