
    mbs: modifying Hudson's ms software to generate samples of DNA sequences with a biallelic site under selection

    Background: The pattern of single nucleotide polymorphisms, or SNPs, contains a tremendous amount of information with respect to the mechanisms of the micro-evolutionary process of a species. The inference of the roles of these mechanisms, including natural selection, relies heavily on computer simulations. A coalescent simulation is extremely powerful in generating a large number of samples of DNA sequences from a population (species) when all mutations are neutral, and Hudson's ms software is frequently used for this purpose. However, it has been difficult to incorporate natural selection into the coalescent framework. Results: We herein present a software application to generate samples of DNA sequences when there is a biallelic site targeted by selection. This software application, referred to as mbs, is developed by modifying Hudson's ms. The mbs software is so flexible that it can incorporate any arbitrary histories of population size changes and any mode of selection as long as selection is operating on a biallelic site. Conclusion: mbs provides opportunities to investigate the effect of any mode of selection on the pattern of SNPs under various demography.
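To make the idea of a neutral coalescent simulation concrete, here is a minimal sketch of the standard Kingman coalescent for a constant-size population. It is not the ms or mbs code; the function names are hypothetical, time is measured in coalescent units, and selection and recombination are ignored.

```python
import random


def simulate_tmrca(n, rng):
    """Draw the time to the most recent common ancestor of n sampled lineages.

    While k lineages remain, each of the k*(k-1)/2 pairs coalesces at rate 1,
    so the waiting time to the next coalescence is exponential with that rate.
    """
    if n < 2:
        raise ValueError("need at least two lineages")
    total_time = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0
        total_time += rng.expovariate(rate)
    return total_time


def expected_tmrca(n):
    """Analytic expectation of the TMRCA under the same model: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [simulate_tmrca(10, rng) for _ in range(20000)]
    print(sum(draws) / len(draws), expected_tmrca(10))
```

Averaging many simulated draws should land close to the analytic expectation, which is a quick sanity check for any simulator of this kind.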

    How genealogies are affected by the speed of evolution

    In a series of recent works it has been shown that a class of simple models of evolving populations under selection leads to genealogical trees whose statistics are given by the Bolthausen-Sznitman coalescent rather than by the well-known Kingman coalescent that applies in the case of neutral evolution. Here we show that when conditioning the genealogies on the speed of evolution, one finds a one-parameter family of tree statistics which interpolates between the Bolthausen-Sznitman and Kingman's coalescents. This interpolation can be calculated explicitly for one specific version of the model, the exponential model. Numerical simulations of another version of the model and a phenomenological theory indicate that this one-parameter family of tree statistics could be universal. We compare this tree structure with those appearing in other contexts, in particular in the mean field theory of spin glasses.

    A complex pattern of post‐divergence expansion, contraction, introgression and asynchronous responses to Pleistocene climate changes in two Dipelta sister species from western China

    The well-known vicariance and dispersal models dominate explanations of allopatric patterns in related species and presume that speciation and biogeographic events occur simultaneously. However, the formation of allopatry may postdate species divergence. We examined this hypothesis using DNA sequence data from 3 chloroplast fragments and 5 nuclear loci of Dipelta floribunda and D. yunnanensis, two shrub species distributed around the Sichuan Basin, in combination with climatic niche modeling. The best-fit model supported by the approximate Bayesian computation (ABC) analysis indicated that D. floribunda and D. yunnanensis diverged during the mid-Pleistocene, consistent with the largest glacial period in the Qinghai-Tibet Plateau (QTP). Historical interspecific gene flow was identified but seemed to have ceased after the last interglacial period (LIG), when the range of D. floribunda moved northward from the south of the Sichuan Basin. Further, populations of D. floribunda expanded markedly in the north of the Sichuan Basin after the last glacial maximum (LGM). In contrast, the range of D. yunnanensis expanded before the LGM and contracted after it, especially in the north of the Sichuan Basin, reflecting asynchronous responses of related species to contemporaneous climate changes. Our results suggest that complex topography should be considered in understanding the distributional patterns of even closely related species and their demographic responses.

    Detecting recent selective sweeps while controlling for mutation rate and background selection

    A composite likelihood ratio test implemented in the program sweepfinder is a commonly used method for scanning a genome for recent selective sweeps. sweepfinder uses information on the spatial pattern (along the chromosome) of the site frequency spectrum around the selected locus. To avoid confounding effects of background selection and variation in the mutation process along the genome, the method is typically applied only to sites that are variable within species. However, the power to detect and localize selective sweeps can be greatly improved if invariable sites are also included in the analysis. In the spirit of a Hudson–Kreitman–Aguadé test, we suggest adding fixed differences relative to an out‐group to account for variation in mutation rate, thereby facilitating more robust and powerful analyses. We also develop a method for including background selection, modelled as a local reduction in the effective population size. Using simulations, we show that these advances lead to a gain in power while maintaining robustness to mutation rate variation. Furthermore, the new method also provides more precise localization of the causative mutation than methods using the spatial pattern of segregating sites alone. Christian D. Huber, Michael DeGiorgio, Ines Hellmann, Rasmus Nielse

    Chloroplast microsatellites: measures of genetic diversity and the effect of homoplasy

    Chloroplast microsatellites have been widely used in population genetic studies of conifers in recent years. However, their haplotype configurations suggest that they could have high levels of homoplasy, thus limiting the power of these molecular markers. A coalescent-based computer simulation was used to explore the influence of homoplasy on measures of genetic diversity based on chloroplast microsatellites. The conditions of the simulation were defined to fit isolated populations originating from the colonization of one single haplotype into an area left available after a glacial retreat. Simulated data were compared with empirical data available from the literature for a species of Pinus that has expanded north after the Last Glacial Maximum. In the evaluation of genetic diversity, homoplasy was found to have little influence on Nei's unbiased haplotype diversity (H_E) while Goldstein's genetic distance estimates (D²_sh) were much more affected. The effect of the number of chloroplast microsatellite loci for evaluation of genetic diversity is also discussed.
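For readers unfamiliar with the diversity measure named above, the following is a small sketch of Nei's unbiased haplotype diversity, computed as n/(n-1) * (1 - sum of squared haplotype frequencies). The function name and the toy haplotype labels are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def nei_haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotype labels.

    haplotypes: a sequence with one hashable label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects for finite sample size.
    return n / (n - 1) * (1.0 - sum_sq_freq)


if __name__ == "__main__":
    sample = ["h1", "h1", "h2", "h3"]  # hypothetical haplotype labels
    print(nei_haplotype_diversity(sample))
```

Because the measure depends only on whether haplotypes are identical in state, two haplotypes that became identical through homoplasy are counted as one, which is how homoplasy can bias it.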

    A minimal descriptor of an ancestral recombinations graph

    Background: An Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, and genetic exchange events, such as recombinations: this captures the (genetic) dynamics of a population evolving over generations. Results: In this paper, we identify a structure-preserving and samples-preserving core of an ARG G and call it the minimal descriptor ARG of G. Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to those of G, and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as those of G. We also prove that even an unbounded G has a finite minimal descriptor that continues to preserve certain (graph-theoretic) properties of G, and for an appropriate class of ARGs, our estimate (Eqn 8) as well as empirical observation is that the expected reduction in the number of vertices is exponential. Conclusions: Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend themselves very naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).

    High genetic diversity at the extreme range edge: nucleotide variation at nuclear loci in Scots pine (Pinus sylvestris L.) in Scotland

    Nucleotide polymorphism at 12 nuclear loci was studied in Scots pine populations across an environmental gradient in Scotland, to evaluate the impacts of demographic history and selection on genetic diversity. At eight loci, diversity patterns were compared between Scottish and continental European populations. At these loci, a similar level of diversity (θsil=~0.01) was found in Scottish vs mainland European populations, contrary to expectations for recent colonization; however, less rapid decay of linkage disequilibrium was observed in the former (ρ=0.0086±0.0009, ρ=0.0245±0.0022, respectively). Scottish populations also showed a deficit of rare nucleotide variants (multi-locus Tajima's D=0.316 vs D=−0.379) and differed significantly from mainland populations in allelic frequency and/or haplotype structure at several loci. Within Scotland, western populations showed slightly reduced nucleotide diversity (πtot=0.0068) compared with those from the south and east (0.0079 and 0.0083, respectively) and about three times higher recombination to diversity ratio (ρ/θ=0.71 vs 0.15 and 0.18, respectively). By comparison with results from coalescent simulations, the observed allelic frequency spectrum in the western populations was compatible with a relatively recent bottleneck (0.00175 × 4Ne generations) that reduced the population to about 2% of the present size. However, heterogeneity in the allelic frequency distribution among geographical regions in Scotland suggests that subsequent admixture of populations with different demographic histories may also have played a role.
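The Tajima's D values quoted above compare two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a sample-size constant. A positive D indicates a deficit of rare variants and a negative D an excess. The sketch below implements the standard single-locus formula; the function and argument names are hypothetical, and the multi-locus averaging used in the study is not reproduced here.

```python
import math


def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D for one locus.

    n: number of sampled sequences (the variance term is zero for n < 4).
    num_segregating: number of segregating sites S.
    mean_pairwise_diff: average number of pairwise differences (pi, per locus).
    """
    if n < 4:
        raise ValueError("need at least four sequences")
    if num_segregating < 1:
        raise ValueError("need at least one segregating site")
    s = num_segregating
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between the two estimators of theta.
    numerator = mean_pairwise_diff - s / a1
    return numerator / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical inputs, not data from the study.
    print(tajimas_d(n=20, num_segregating=15, mean_pairwise_diff=5.0))
```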

    Complex Interplay of Evolutionary Forces in the ladybird Homeobox Genes of Drosophila melanogaster

    Tandemly arranged paralogous genes lbe and lbl are members of the Drosophila NK homeobox family. We analyzed population samples of Drosophila melanogaster from Africa, Europe, North and South America, and single strains of D. sechellia, D. simulans, and D. yakuba within two linked regions encompassing partial sequences of lbe and lbl. The evolution of lbe and lbl is highly constrained due to their important regulatory functions. Despite this, a variety of forces have shaped the patterns of variation in lb genes: recombination, intragenic gene conversion and natural selection strongly influence background variation created by linkage disequilibrium and dimorphic haplotype structure. The two genes exhibited similar levels of nucleotide diversity and positive selection was detected in the noncoding regions of both genes. However, synonymous variability was significantly higher for lbe: no nonsynonymous changes were observed in this gene. We argue that balancing selection impacts some synonymous sites of the lbe gene. Stability of mRNA secondary structure was significantly different between the lbe (but not lbl) haplotype groups and may represent a driving force of balancing selection in epistatically interacting synonymous sites. Balancing selection on synonymous sites may be the first, or one of a few such observations, in Drosophila. In contrast, recurrent positive selection on lbl at the protein level influenced evolution at three codon sites. Transcription factor binding-site profiles were different for lbe and lbl, suggesting that their developmental functions are not redundant. Combined with our previous results on nucleotide variation in esterase and other homeobox genes, these results suggest that interplay of balancing and directional selection may be a general feature of molecular evolution in Drosophila and other eukaryote genomes

    The complete linkage disequilibrium test: a test that points to causative mutations underlying quantitative traits

    Background: Genetically, SNP that are in complete linkage disequilibrium with the causative SNP cannot be distinguished from the causative SNP. The Complete Linkage Disequilibrium (CLD) test presented here tests whether a SNP is in complete LD with the causative mutation or not. The performance of the CLD test is evaluated in 1000 simulated datasets. Methods: The CLD test consists of two steps, i.e. analysis I and analysis II. Analysis I consists of an association analysis of the investigated region. The log-likelihood values from analysis I are next ranked in descending order and in analysis II the CLD test evaluates differences in log-likelihood ratios between the best and second best markers. Under the null-hypothesis distribution, the best SNP is in greater LD with the QTL than the second best, while under the alternative-CLD-hypothesis, the best SNP is alike-in-state with the QTL. To find a significance threshold, the test was also performed on data excluding the causative SNP. The 5th, 10th and 50th highest T_CLD value from 1000 replicated analyses were used to control the type-I-error rate of the test at p = 0.005, p = 0.01 and p = 0.05, respectively. Results: In a situation where the QTL explained 48% of the phenotypic variance analysis I detected a QTL in 994 replicates (p = 0.001), where 972 were positioned in the correct QTL position. When the causative SNP was excluded from the analysis, 714 replicates detected evidence of a QTL (p = 0.001). In analysis II, the CLD test confirmed 280 causative SNP from 1000 simulations (p = 0.05), i.e. power was 28%. When the effect of the QTL was reduced by doubling the error variance, the power of the test reduced relatively little to 23%. When sequence data were used, the power of the test reduced to 16%. All SNP that were confirmed by the CLD test were positioned in the correct QTL position. Conclusions: The CLD test can provide evidence for a causative SNP, but its power may be low in situations with closely linked markers. In such situations, also functional evidence will be needed to definitely conclude whether the SNP is causative or not.
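The thresholds described in the Methods follow from simple counting: taking the k-th highest statistic among N null replicates as the cutoff corresponds to a nominal type-I error rate of k/N. A minimal sketch of that bookkeeping is shown below; the function names and the placeholder null values are assumptions, and how ties or values equal to the cutoff are treated in the original analysis is not stated in the abstract.

```python
def empirical_threshold(null_statistics, k):
    """Return the k-th highest value among test statistics simulated under the null."""
    if not 1 <= k <= len(null_statistics):
        raise ValueError("k must be between 1 and the number of replicates")
    ordered = sorted(null_statistics, reverse=True)
    return ordered[k - 1]


def nominal_error_rate(k, n_replicates):
    """Nominal type-I error rate implied by using the k-th highest null value."""
    return k / n_replicates


if __name__ == "__main__":
    # Placeholder null statistics standing in for 1000 replicated analyses.
    null_values = [float(i) for i in range(1, 1001)]
    for k in (5, 10, 50):
        print(k, empirical_threshold(null_values, k), nominal_error_rate(k, 1000))
```

With 1000 replicates, k = 5, 10 and 50 give the nominal rates 0.005, 0.01 and 0.05 quoted in the abstract.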

    Reconsidering Association Testing Methods Using Single-Variant Test Statistics as Alternatives to Pooling Tests for Sequence Data with Rare Variants

    Association tests that pool minor alleles into a measure of burden at a locus have been proposed for case-control studies using sequence data containing rare variants. However, such pooling tests are not robust to the inclusion of neutral and protective variants, which can mask the association signal from risk variants. Early studies proposing pooling tests dismissed methods for locus-wide inference using nonnegative single-variant test statistics based on unrealistic comparisons. However, such methods are robust to the inclusion of neutral and protective variants and therefore may be more useful than previously appreciated. In fact, some recently proposed methods derived within different frameworks are equivalent to performing inference on weighted sums of squared single-variant score statistics. In this study, we compared two existing methods for locus-wide inference using nonnegative single-variant test statistics to two widely cited pooling tests under more realistic conditions. We established analytic results for a simple model with one rare risk and one rare neutral variant, which demonstrated that pooling tests were less powerful than even Bonferroni-corrected single-variant tests in most realistic situations. We also performed simulations using variants with realistic minor allele frequency and linkage disequilibrium spectra, disease models with multiple rare risk variants and extensive neutral variation, and varying rates of missing genotypes. In all scenarios considered, existing methods using nonnegative single-variant test statistics had power comparable to or greater than two widely cited pooling tests. Moreover, in disease models with only rare risk variants, an existing method based on the maximum single-variant Cochran-Armitage trend chi-square statistic in the locus had power comparable to or greater than another existing method closely related to some recently proposed methods. We conclude that efficient locus-wide inference using single-variant test statistics should be reconsidered as a useful framework for devising powerful association tests in sequence data with rare variants.