
    Properties of neutrality tests based on allele frequency spectrum

    One of the main needs of population geneticists is the availability of statistical tools that make it possible to accept or reject the neutral Wright-Fisher model with high power. A number of statistical tests have been developed to detect specific deviations from the null frequency spectrum in different directions (e.g., Tajima's D, Fu and Li's F and D tests, Fay and Wu's H). Recently, a general framework was proposed to generate all neutrality tests that are linear functions of the frequency spectrum. In this framework, a family of optimal tests was developed to have almost maximum power against a specific alternative evolutionary scenario. Following these developments, in this paper we provide a thorough discussion of linear and nonlinear neutrality tests. First, we present the general framework for linear tests and emphasize the importance of the property of scalability with the sample size (that is, the results of the tests should not depend on the sample size), which, if missing, can lead to errors in data interpretation. The motivation and structure of linear optimal tests are discussed. In a further generalization, we develop a general framework for nonlinear neutrality tests and we derive nonlinear optimal tests for polynomials of any degree in the frequency spectrum. Comment: 42 pages, 3 figures, elsarticl
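The idea that classical neutrality tests are linear functions of the frequency spectrum can be illustrated with a minimal sketch. It uses the standard textbook definitions of Watterson's estimator, nucleotide diversity, Fay and Wu's estimator and Tajima's D, not the optimal tests of the paper itself. The function names and the input convention (an unfolded spectrum given as a list of counts) are assumptions made for this example.

```python
from math import comb, sqrt


def theta_estimators(sfs):
    """Return (theta_W, theta_pi, theta_H) from an unfolded spectrum.

    Assumed convention: sfs[i - 1] is the number of sites at which the
    derived allele is carried by i of the n sampled sequences, i = 1..n-1.
    Each estimator is a weighted sum of the terms i * sfs[i - 1].
    """
    if len(sfs) < 1:
        raise ValueError("need at least two sequences (one frequency class)")
    n = len(sfs) + 1
    a1 = sum(1.0 / i for i in range(1, n))
    pairs = comb(n, 2)
    theta_w = sum(sfs) / a1
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    theta_h = sum(i * i * x for i, x in enumerate(sfs, start=1)) / pairs
    return theta_w, theta_pi, theta_h


def tajima_d(sfs):
    """Tajima's D: (theta_pi - theta_W) divided by its estimated standard deviation."""
    n = len(sfs) + 1
    s = sum(sfs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return float("nan")  # undefined, e.g. no segregating sites or n = 2
    theta_w, theta_pi, _ = theta_estimators(sfs)
    return (theta_pi - theta_w) / sqrt(variance)


def fay_wu_h(sfs):
    """Unnormalized Fay and Wu's H: theta_pi - theta_H."""
    _, theta_pi, theta_h = theta_estimators(sfs)
    return theta_pi - theta_h


# An excess of singletons pulls D below zero; an excess of
# high-frequency derived variants pulls H below zero.
singleton_excess = [10, 0, 0, 0]
high_freq_excess = [0, 0, 0, 10]
print(tajima_d(singleton_excess) < 0, fay_wu_h(high_freq_excess) < 0)
```

Under the expected neutral spectrum all three estimators coincide, so both test numerators vanish; deviations in different frequency classes move them in different directions.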

    Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests

    We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics - for instance estimators of $\theta$ or neutrality tests such as Tajima's $D$ - can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima's $D$ and Fay and Wu's $H$ depend in a direct way on a peculiar measure of tree balance which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu's $H$ and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulae for these extreme values as a function of sample size and number of segregating sites. Comment: 23 pages, 8 figure

    Mlcoalsim: Multilocus Coalescent Simulations

    Coalescent theory is a powerful tool for population geneticists as well as molecular biologists interested in understanding the patterns and levels of DNA variation. Using coalescent Monte Carlo simulations it is possible to obtain the empirical distributions for a number of statistics across a wide range of evolutionary models; these distributions can be used to test evolutionary hypotheses using experimental data. The mlcoalsim application presented here (based on a version of the ms program, Hudson, 2002) adds important new features to improve methodology (uncertainty and conditional methods for mutation and recombination), models (including strong positive selection, finite sites and heterogeneity in mutation and recombination rates) and analyses (calculating a number of statistics used in population genetics and P-values for observed data). One of the most important features of mlcoalsim is the analysis of multilocus data in linked and independent regions. In summary, mlcoalsim is an integrated software application aimed at researchers interested in molecular evolution. mlcoalsim is written in ANSI C and is available at: http://www.ub.es/softevol/mlcoalsim

    The expected neutral frequency spectrum of linked sites

    We present an exact, closed expression for the expected neutral Site Frequency Spectrum for two neutral sites, 2-SFS, without recombination. This spectrum is the immediate extension of the well known single site $\theta/f$ neutral SFS. Similar formulae are also provided for the case of the expected SFS of sites that are linked to a focal neutral mutation of known frequency. Formulae for finite samples are obtained by coalescent methods and remarkably simple expressions are derived for the SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. Besides the general interest of these new spectra, they relate to interesting biological cases such as structural variants and introgressions. As an example, we present the expected neutral frequency spectrum of regions with a chromosomal inversion. Comment: 26 pages, 5 figure
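For orientation, the single-site neutral spectrum that the 2-SFS extends can be sketched in a few lines. This shows only the classical single-site result for a finite sample (expected count $\theta/i$ for derived-allele count $i$), not the two-site formulae of the paper. The function names are hypothetical.

```python
def expected_sfs(theta, n):
    """Expected unfolded neutral spectrum for a sample of n sequences.

    Entry i - 1 holds the expected number of sites with derived-allele
    count i, which is theta / i under the standard neutral model.
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    return [theta / i for i in range(1, n)]


def fold(sfs):
    """Fold a spectrum by merging counts i and n - i (minor-allele counts)."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        # When i == n - i the class is its own mirror and is counted once.
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded


spectrum = expected_sfs(theta=2.0, n=4)
print(spectrum)
print(fold(spectrum))
```

Folding is useful when the ancestral state is unknown; the total expected number of segregating sites is the same in both representations.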

    A generalized Watterson estimator for next-generation sequencing : from trios to autopolyploids

    Several variations of the Watterson estimator of variability for Next Generation Sequencing (NGS) data have been proposed in the literature. We present a unified framework for generalized Watterson estimators based on Maximum Composite Likelihood, which encompasses most of the existing estimators. We propose this class of unbiased estimators as generalized Watterson estimators for a large class of NGS data, including pools and trios. We also discuss the relation with the estimators proposed in the literature and show that they admit two equivalent but seemingly different forms, deriving a set of combinatorial identities as a byproduct. Finally, we give a detailed treatment of Watterson estimators for single or multiple autopolyploid individuals

    Genome-Wide Footprints of Pig Domestication and Selection Revealed through Massive Parallel Sequencing of Pooled DNA

    Background: Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can help uncover the genetic basis of phenotypic diversity. Methodology/Main Findings: Genome-wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ~2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain) which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar. Conclusions/Significance: These results suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify footprints of selection

    The Site Frequency/Dosage Spectrum of Autopolyploid Populations

    The Site Frequency Spectrum (SFS) and the heterozygosity of allelic variants are among the most important summary statistics for population genetic analysis of diploid organisms. We discuss the generalization of these statistics to populations of autopolyploid organisms in terms of the joint Site Frequency/Dosage Spectrum and its expected value for autopolyploid populations that follow the standard neutral model. Based on these results, we present estimators of nucleotide variability from High-Throughput Sequencing (HTS) data of autopolyploids and discuss potential issues related to sequencing errors and variant calling. We use these estimators to generalize Tajima's D and other SFS-based neutrality tests to HTS data from autopolyploid organisms. Finally, we discuss how these approaches fail when the number of individuals is small. In fact, in autopolyploids there are many possible deviations from the Hardy–Weinberg equilibrium, each reflected in a different shape of the individual dosage distribution. The SFS from small samples is often dominated by the shape of these deviations of the dosage distribution from its Hardy–Weinberg expectations

    Transposable element polymorphisms improve prediction of complex agronomic traits in rice

    Transformative agreement CRUE-CSIC. Key message: Transposon insertion polymorphisms can improve prediction of complex agronomic traits in rice compared to using SNPs only, especially when accessions to be predicted are less related to the training set. Abstract: Transposon insertion polymorphisms (TIPs) are significant sources of genetic variation. Previous work has shown that TIPs can improve detection of causative loci on agronomic traits in rice. Here, we quantify the fraction of variance explained by single nucleotide polymorphisms (SNPs) compared to TIPs, and we explore whether TIPs can improve prediction of traits when compared to using only SNPs. We used eleven traits of agronomic relevance from five different rice population groups (Aus, Indica, Aromatic, Japonica, and Admixed), 738 accessions in total. We assess prediction by applying data split validation in two scenarios. In the within-population scenario, we predicted performance of improved Indica varieties using the rest of Indica accessions. In the across-population scenario, we predicted all Aromatic and Admixed accessions using the rest of populations. In each scenario, Bayes C and a Bayesian reproducing kernel Hilbert space regression were compared. We find that TIPs can explain an important fraction of total genetic variance and that they also improve genomic prediction. In the across-population prediction scenario, TIPs outperformed SNPs in nine out of the eleven traits analyzed. In some traits like leaf senescence or grain width, using TIPs increased predictive correlation by 30-50%. Our results show, for the first time, that TIPs genotyping can improve prediction on complex agronomic traits in rice, especially when accessions to be predicted are less related to training accessions