Putatively neutral regions
This zipped directory includes putatively neutral regions for each cutoff value of genetic distance to the nearest gene used in the article (0.0 cM, 0.2 cM, 0.4 cM, 0.6 cM, 0.8 cM, and 1 cM).
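As an illustration of how one region set per cutoff could be derived, here is a minimal sketch. The record layout (chromosome, start, end, genetic distance to the nearest gene in cM), the example values, and the strict `>` comparison are all assumptions; the actual file format in the archive is not described here.

```python
# Cutoffs (in cM) listed in the dataset description.
CUTOFFS_CM = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

def regions_passing(regions, cutoff_cm):
    """Keep regions farther than cutoff_cm from the nearest gene.

    Each region is a hypothetical tuple: (chrom, start, end, cm_to_nearest_gene).
    The strict '>' is an assumption, not taken from the article.
    """
    return [r for r in regions if r[3] > cutoff_cm]

# Made-up example records, for illustration only.
regions = [
    ("chr1", 1000, 2000, 0.05),
    ("chr1", 5000, 6000, 0.45),
    ("chr2", 100, 900, 1.20),
]

# One filtered list per cutoff, mirroring "one region set per cutoff value".
by_cutoff = {c: regions_passing(regions, c) for c in CUTOFFS_CM}
```

Stricter cutoffs keep fewer regions, so the sets are nested: every region in the 1 cM set also appears in the 0.2 cM set.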
Filtered VCF files
This file contains the variants genotyped using GATK3, with a GATK hard filter applied.
Comparison of Single Genome and Allele Frequency Data Reveals Discordant Demographic Histories.
Inference of demographic history from genetic data is a primary goal of population genetics of model and nonmodel organisms. Whole genome-based approaches such as the pairwise/multiple sequentially Markovian coalescent methods use genomic data from one to four individuals to infer the demographic history of an entire population, while site frequency spectrum (SFS)-based methods use the distribution of allele frequencies in a sample to reconstruct the same historical events. Although both methods are extensively used in empirical studies and perform well on data simulated under simple models, there have been only limited comparisons of them in more complex and realistic settings. Here we use published demographic models based on data from three human populations (Yoruba, descendants of northwest-Europeans, and Han Chinese) as an empirical test case to study the behavior of both inference procedures. We find that several of the demographic histories inferred by the whole genome-based methods do not predict the genome-wide distribution of heterozygosity, nor do they predict the empirical SFS. However, using simulated data, we also find that the whole genome methods can reconstruct the complex demographic models inferred by SFS-based methods, suggesting that the discordant patterns of genetic variation are not attributable to a lack of statistical power, but may reflect unmodeled complexities in the underlying demography. More generally, our findings indicate that demographic inference from a small number of genomes, routine in genomic studies of nonmodel organisms, should be interpreted cautiously, as these models cannot recapitulate other summaries of the data
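To make the two summaries named in the abstract concrete, the sketch below builds an unfolded site frequency spectrum from derived-allele counts and computes the mean pairwise heterozygosity implied by it. The function names and the toy counts are hypothetical, and this is a textbook calculation rather than the pipeline used in the study.

```python
def unfolded_sfs(derived_counts, n_chromosomes):
    """Return [xi_1, ..., xi_{n-1}], where xi_i is the number of sites at
    which the derived allele is carried by i of the n sampled chromosomes."""
    sfs = [0] * (n_chromosomes - 1)
    for count in derived_counts:
        if 0 < count < n_chromosomes:  # skip sites that are not segregating
            sfs[count - 1] += 1
    return sfs

def pairwise_diversity(sfs, n_chromosomes):
    """Expected number of pairwise differences summed over sites.

    A site with derived count i contributes 2*i*(n-i) / (n*(n-1)).
    """
    n = n_chromosomes
    return sum(
        xi * 2.0 * i * (n - i) / (n * (n - 1))
        for i, xi in enumerate(sfs, start=1)
    )

# Toy data: derived-allele counts at five sites in a sample of four chromosomes.
counts = [1, 1, 2, 3, 1]
sfs = unfolded_sfs(counts, 4)
pi = pairwise_diversity(sfs, 4)
```

A demographic model that fits one of these summaries is not guaranteed to fit the other, which is the kind of discordance the abstract describes.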
Data from: Complex patterns of sex-biased demography in canines
The demographic history of dogs is complex, involving multiple bottlenecks, admixture events and artificial selection. However, existing genetic studies have not explored variance in the number of reproducing males and females, and whether it has changed across evolutionary time. While male-biased mating practices, such as male-biased migration and multiple paternity, have been observed in wolves, recent breeding practices could have led to female-biased mating patterns in breed dogs. For example, breed dogs are thought to have experienced a popular sire effect, where a small number of males father many offspring with a large number of females. Here we use genetic variation data to test how widespread sex-biased mating practices in canines are during different evolutionary time points. Using whole-genome sequence data from 33 dogs and wolves, we show that patterns of diversity on the X chromosome and autosomes are consistent with a higher number of reproducing males than females over ancient evolutionary history in both dogs and wolves, suggesting that mating practices did not change during early dog domestication. By contrast, since breed formation, we found evidence for a larger number of reproducing females than males in breed dogs, consistent with the popular sire effect. Our results confirm that canine demography has been complex, with opposing sex-biased processes occurring throughout their history. The signatures observed in genetic data are consistent with documented sex-biased mating practices in both the wild and domesticated populations, suggesting that these mating practices are pervasive
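The link between sex-biased reproduction and X-to-autosome diversity can be sketched with the standard neutral expectation for effective population sizes. This is a textbook approximation under the assumptions of equal mutation rates on the X and autosomes and a constant-size population; it is not the inference method used in the study, and the function name is hypothetical.

```python
def expected_x_to_autosome_ratio(n_males, n_females):
    """Expected ratio of X-linked to autosomal effective population size.

    Standard formulas (assumed, not from the source):
      N_A = 4*Nm*Nf / (Nm + Nf)
      N_X = 9*Nm*Nf / (4*Nm + 2*Nf)
    Their ratio simplifies to 9*(Nm + Nf) / (8*(2*Nm + Nf)).
    """
    return 9.0 * (n_males + n_females) / (8.0 * (2.0 * n_males + n_females))

equal_sexes = expected_x_to_autosome_ratio(1000, 1000)   # 0.75
more_males = expected_x_to_autosome_ratio(3000, 1000)    # below 0.75
more_females = expected_x_to_autosome_ratio(1000, 3000)  # above 0.75
```

Under this simple model, a ratio below 0.75 points toward more reproducing males than females, and a ratio above 0.75 points toward more reproducing females, as expected under a popular sire effect.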
Relationship between divergence and functional content, human recombination, and McVicker’s <i>B</i>-values as a function of GERP score cutoff.
<p>(A) Human-primate divergence versus functional content. (B) Human-primate divergence versus human recombination rate. (C) Human-rodent divergence versus functional content. (D) Human-rodent divergence versus McVicker’s <i>B</i>-values.</p>
A two-locus model for the effect of background selection on divergence.
<p>(A) The variance in divergence between two loci explained by background selection (BGS) as a function of the strength of background selection at the second locus (<i>B</i><sub><i>2</i></sub>). (B) The expected proportion of divergence due to polymorphism in the ancestral population as a function of <i>B</i><sub><i>2</i></sub>. (C) The variance in divergence between the two loci explained by polymorphism in the ancestral population as a function of <i>B</i><sub><i>2</i></sub>. Different columns denote different mutation rates. Colored lines denote different ancestral population sizes (<i>N</i><sub><i>a</i></sub>). Note that the variance in divergence attributable to background selection is greater than the expected proportion of divergence contributed by ancestral polymorphism.</p>
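One way to see why the strength of background selection affects divergence is the usual decomposition of expected divergence into a post-split term and an ancestral coalescence term. The sketch below assumes that background selection simply rescales the ancestral effective size by B, so the expected ancestral coalescence time is 2·N_a·B generations. Parameter names and values are hypothetical, and this is not necessarily the exact two-locus model used for the figure.

```python
def expected_divergence(mu, split_time_gen, n_anc, b_value):
    """Expected per-site divergence: 2*mu*(t_split + 2*N_a*B).

    Assumes background selection scales ancestral N_e by b_value.
    """
    return 2.0 * mu * (split_time_gen + 2.0 * n_anc * b_value)

def ancestral_fraction(split_time_gen, n_anc, b_value):
    """Share of expected divergence that arose as ancestral polymorphism."""
    ancestral_time = 2.0 * n_anc * b_value
    return ancestral_time / (split_time_gen + ancestral_time)

# Illustrative values only: split 240,000 generations ago, N_a = 10,000.
weak_bgs = ancestral_fraction(240_000, 10_000, 0.8)
strong_bgs = ancestral_fraction(240_000, 10_000, 0.4)
```

Stronger background selection (smaller B) shrinks the ancestral term, which lowers both the expected divergence and the share of it contributed by ancestral polymorphism.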
Human-primate divergence is reduced at putatively neutral sites near selected sites.
<p>(A) Neutral human-chimp divergence is negatively correlated with functional content. (B) Neutral human-orang divergence is negatively correlated with functional content. (C) Neutral human-chimp divergence is positively correlated with human recombination rate. (D) Neutral human-orang divergence is positively correlated with human recombination rate. Each point represents the mean divergence and functional content (A and B) or recombination rate (C and D) in 1% of the 100kb windows binned by functional content or recombination rate. Red lines indicate the loess curves fit to divergence and functional content (A and B) and divergence and recombination rate (C and D). The high variance of divergence at regions of low recombination rate is expected since the variance of divergence is inversely proportional to the recombination rate. Note that the last bin containing less than 1% of the windows was omitted from the plot. While the graph presents binned data, the correlations reported in the text are from the unbinned data.</p>
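The binning described in the caption can be sketched as follows: sort windows by the covariate, cut them into equal-sized bins, average both variables within each bin, and drop the trailing partial bin. The function name and the toy data are hypothetical; the loess fit is not reproduced here.

```python
from statistics import fmean

def binned_means(covariate, divergence, n_bins=100):
    """Return one (mean covariate, mean divergence) pair per full bin.

    Windows are sorted by covariate and split into n_bins equal-sized bins;
    any leftover windows that do not fill a bin are omitted, as in the caption.
    """
    pairs = sorted(zip(covariate, divergence))
    bin_size = len(pairs) // n_bins
    if bin_size == 0:
        raise ValueError("fewer windows than bins")
    points = []
    for b in range(n_bins):
        chunk = pairs[b * bin_size:(b + 1) * bin_size]
        points.append((fmean(x for x, _ in chunk), fmean(y for _, y in chunk)))
    return points

# Toy data: 10 windows, 3 bins of 3 windows each; the 10th window is dropped.
xs = list(range(10))
ys = [2 * x for x in xs]
points = binned_means(xs, ys, n_bins=3)
```

Binning of this kind is for display only; as the caption notes, the reported correlations come from the unbinned windows.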
Models incorporating background selection can generate patterns of neutral divergence that recapitulate the empirical correlations.
<p>(A) Models of background selection predict a positive correlation between neutral human-chimp divergence and human recombination. Because our model does not include biased gene conversion, the empirical correlation was calculated omitting AT to GC sequence differences. (B) Models of background selection predict a positive correlation between neutral human-mouse divergence and McVicker’s <i>B</i>-values. White histogram denotes 500 simulations not including background selection. Gray histogram denotes 500 simulations incorporating background selection (see text). Red line represents the correlation computed from empirical data. Thus, plausible levels of background selection can match the observed correlations while neutral simulations cannot.</p>
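The comparison in this caption, an empirical correlation set against two histograms of simulated correlations, can be summarized by asking what fraction of simulations reach the observed value. The sketch below uses made-up correlation values and a hypothetical function name; it illustrates the comparison and is not the authors' analysis code.

```python
def fraction_at_least(simulated, observed):
    """Fraction of simulated statistics that are >= the observed statistic."""
    if not simulated:
        raise ValueError("no simulated values")
    return sum(1 for s in simulated if s >= observed) / len(simulated)

# Made-up correlations standing in for the two sets of 500 simulations.
neutral_sims = [-0.02, 0.00, 0.01, 0.03]
bgs_sims = [0.08, 0.10, 0.12, 0.15]
observed_r = 0.10  # hypothetical empirical correlation

neutral_fraction = fraction_at_least(neutral_sims, observed_r)
bgs_fraction = fraction_at_least(bgs_sims, observed_r)
```

A fraction near zero for the neutral simulations, together with an intermediate fraction for the background selection simulations, corresponds to the pattern the caption describes: the red line falls outside the white histogram but inside the gray one.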