19 research outputs found

    Transposable element polymorphisms improve prediction of complex agronomic traits in rice

    Key message: Transposon insertion polymorphisms can improve prediction of complex agronomic traits in rice compared to using SNPs only, especially when accessions to be predicted are less related to the training set. Abstract: Transposon insertion polymorphisms (TIPs) are significant sources of genetic variation. Previous work has shown that TIPs can improve detection of causative loci for agronomic traits in rice. Here, we quantify the fraction of variance explained by single nucleotide polymorphisms (SNPs) compared to TIPs, and we explore whether TIPs can improve prediction of traits when compared to using only SNPs. We used eleven traits of agronomic relevance from five different rice population groups (Aus, Indica, Aromatic, Japonica, and Admixed), 738 accessions in total. We assess prediction by applying data split validation in two scenarios. In the within-population scenario, we predicted performance of improved Indica varieties using the rest of the Indica accessions. In the across-population scenario, we predicted all Aromatic and Admixed accessions using the rest of the populations. In each scenario, Bayes C and a Bayesian reproducing kernel Hilbert space regression were compared. We find that TIPs can explain an important fraction of total genetic variance and that they also improve genomic prediction. In the across-population prediction scenario, TIPs outperformed SNPs in nine out of the eleven traits analyzed. In some traits, such as leaf senescence or grain width, using TIPs increased predictive correlation by 30-50%. Our results show, for the first time, that TIP genotyping can improve prediction of complex agronomic traits in rice, especially when accessions to be predicted are less related to training accessions.

    A General Framework for Neutrality Tests Based on the Site Frequency Spectrum

    One of the main necessities for population geneticists is the availability of sensitive statistical tools that make it possible to accept or reject the standard Wright–Fisher model of neutral evolution. A number of statistical tests have been developed to detect specific deviations from the null frequency spectrum in different directions (e.g., Tajima's D, Fu and Li's F and D tests, Fay and Wu's H). A general framework exists to generate all neutrality tests that are linear functions of the frequency spectrum. In this framework, it is possible to develop a family of optimal tests with almost maximum power against a specific alternative evolutionary scenario. In this paper we provide a thorough discussion of the structure and properties of linear and nonlinear neutrality tests. First, we present the general framework for linear tests and emphasise the importance of the property of scalability with the sample size (that is, the interpretation of the tests should not depend on the sample size), which, if missing, can lead to errors in interpreting the data. After summarising the motivation and structure of linear optimal tests, we present a more general framework for the optimisation of linear tests, leading to a new family of tunable neutrality tests. In a further generalisation, we extend the framework to nonlinear neutrality tests and we derive nonlinear optimal tests for polynomials of any degree in the frequency spectrum.
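
    To make the idea of a test built on the site frequency spectrum concrete, here is a minimal sketch of Tajima's D computed directly from an unfolded spectrum. It uses the standard textbook formulas for Tajima's D rather than anything taken from this paper, and the function name `tajimas_d` and the list layout of `sfs` are assumptions chosen for illustration.

```python
from math import sqrt


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    Assumed layout (illustrative): sfs[i - 1] is the number of segregating
    sites at which the derived allele is carried by i of the n sampled
    sequences, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1
    if n < 4:
        # For n < 4 the numerator and the variance term are identically zero.
        raise ValueError("need at least 4 sampled sequences")
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        raise ValueError("no segregating sites")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))

    # Two estimators of theta, both linear in the spectrum.
    pairs = n * (n - 1) / 2.0
    theta_pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / pairs
    theta_w = s / a1

    # Normalisation constants for the variance of (theta_pi - theta_w).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (theta_pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
```

    The numerator is a weighted sum of the spectrum entries, which is what makes the statistic a member of the linear family the abstract refers to. In this sketch an excess of singletons yields a negative value and an excess of intermediate-frequency variants yields a positive one.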

    Multilocus analysis of variation and speciation in the closely related species Arabidopsis halleri and A. lyrata.

    Nucleotide variation in eight effectively unlinked genes was surveyed in species-wide samples of the closely related outbreeding species Arabidopsis halleri and A. lyrata ssp. petraea and in three of these genes in A. lyrata ssp. lyrata and A. thaliana. Significant genetic differentiation was observed more frequently in A. l. petraea than in A. halleri. Average estimates of nucleotide variation were highest in A. l. petraea and lowest in A. l. lyrata, reflecting differences among species in effective population size. The low level of variation in A. l. lyrata is concordant with a bottleneck effect associated with its origin. The A. halleri/A. l. petraea speciation process was studied, considering the orthologous sequences of an outgroup species (A. thaliana). The high number of ancestral mutations relative to exclusive polymorphisms detected in A. halleri and A. l. petraea, the significant results of the multilocus Fay and Wu H tests, and haplotype sharing between the species indicate introgression subsequent to speciation. Average among-population variation in A. halleri and A. l. petraea was approximately 1.5- and 3-fold higher than that in the inbreeder A. thaliana. The detected reduction of variation in A. thaliana is less than that expected from differences in mating system alone, and therefore from selective processes related to differences in the effective recombination rate, but could be explained by differences in population structure.

    A Deep Catalog of Autosomal Single Nucleotide Variation in the Pig

    A comprehensive catalog of variability in a given species is useful for many important purposes, e.g., designing high density arrays or pinpointing potential mutations of economic or physiological interest. Here we provide a genomewide, worldwide catalog of single nucleotide variants by simultaneously analyzing the shotgun sequence of 128 pigs and five suid outgroups. Despite the high SNP missing rate of some individuals (up to 88%), we retrieved over 48 million high quality variants. Of them, we were able to assess the ancestral allele of more than 39M biallelic SNPs. We found SNPs in 21,455 out of the 25,322 annotated genes in pig assembly 10.2. The annotation showed that more than 40% of the variants were novel variants, not present in dbSNP. Surprisingly, we found a large variability in transition / transversion rate along the genome, which is very well explained (R² = 0.79) primarily by genome differences in CpG content and recombination rate. The number of SNPs per window also varied but was less dependent on known factors such as gene density, missing rate or recombination (R² = 0.48). When we divided the samples into four groups, Asian wild boar (ASWB), Asian domestics (ASDM), European wild boar (EUWB) and European domestics (EUDM), we found a marked correlation in allele frequencies between domestics and wild boars within Asia and within Europe, but not across continents, due to the large evolutionary distance between pigs of both continents (~1.2 MYA). In general, the porcine species showed a small percentage of SNPs exclusive to each population group. EUWB and EUDM were predicted to harbor a larger fraction of potentially deleterious mutations, according to the SIFT algorithm, than Asian samples, perhaps a result of background selection being less effective due to a lower effective population size in Europe.

    Number of SNPs per kb (top), average transition / transversion rate (middle) and CpG count per kb (bottom) per window.

    On the x axis, each dot represents a window ~1 Mb long. Different colors correspond to different chromosomes, from SSC1 to SSC18.

    Number of SNPs vs. missing rate (a), and vs. Ts/Tv ratio (b); fitted vs. observed number of SNPs (c) and Ts/Tv (d) using equations 1 and 2, respectively.

    Number of SNPs vs. missing rate (a), and vs. Ts/Tv ratio (b); fitted vs. observed number of SNPs (c) and Ts/Tv (d) using equations 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0118867#pone.0118867.e001) and 2 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0118867#pone.0118867.e002), respectively.

    Summary of the SNP annotation results for the most deleterious consequence obtained using VEP.


    Joint site frequency spectra between population groups.

    Only SNPs found in the modal number of samples per group were used. In each figure, the x and y axes represent counts of the derived allele from 1 to 2N in each population, where N is the number of samples having the largest number of SNPs genotyped. Note that a count of 2N on, say, the x axis means that the derived allele is fixed in that population, but the same SNP can be segregating in the other population. The frequency of bivariate counts is represented in colors, with the log scale as shown in the vertical bar. The more frequent a class is, the lighter the color; dark green corresponds to rare classes.
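
    The tabulation behind such a figure can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `joint_sfs` and the input layout are hypothetical, and the table also keeps a zero class for each population, whereas the caption describes counts from 1 to 2N.

```python
def joint_sfs(derived_counts, size_x, size_y):
    """Tabulate a joint site frequency spectrum for two populations.

    derived_counts: iterable of (count_x, count_y) pairs, one per SNP, giving
        the number of derived alleles observed in each population.
    size_x, size_y: number of sampled chromosomes (2N) in each population.

    Returns a (size_x + 1) x (size_y + 1) table where table[i][j] is the
    number of SNPs with i derived alleles in population x and j in y.
    """
    table = [[0] * (size_y + 1) for _ in range(size_x + 1)]
    for count_x, count_y in derived_counts:
        if not (0 <= count_x <= size_x and 0 <= count_y <= size_y):
            raise ValueError("derived allele count outside 0..2N")
        table[count_x][count_y] += 1
    return table


# Hypothetical toy data: two populations with 2N = 4 chromosomes each.
snps = [(1, 0), (1, 0), (4, 2)]
spectrum = joint_sfs(snps, 4, 4)
# The SNP (4, 2) is fixed for the derived allele in x but segregating in y.
```

    A plot like the one described would then color each cell by the logarithm of its count.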