
    Fast and Stable Multiple Smoothing Parameter Selection in Smoothing Spline Analysis of Variance Models With Large Samples

    The current parameterization and algorithm used to fit a smoothing spline analysis of variance (SSANOVA) model are computationally expensive, making a generalized additive model (GAM) the preferred method for multivariate smoothing. In this article, we propose an efficient reparameterization of the smoothing parameters in SSANOVA models, and a scalable algorithm for estimating multiple smoothing parameters in SSANOVAs. To validate our approach, we present two simulation studies comparing our reparameterization and algorithm to implementations of SSANOVAs and GAMs that are currently available in R. Our simulation results demonstrate that (a) our scalable SSANOVA algorithm outperforms the currently used SSANOVA algorithm, and (b) SSANOVAs can be a fast and reliable alternative to GAMs. We also provide an example with oceanographic data that demonstrates the practical advantage of our SSANOVA framework. Supplementary materials that are available online can be used to replicate the analyses in this article.

    Nonparametric Method for Genomics-Based Prediction of Performance of Quantitative Traits Involving Epistasis in Plant Breeding

    Genomic selection (GS) procedures have proven useful for estimating breeding values and predicting phenotypes from genome-wide molecular marker information. However, high dimensionality, multicollinearity, and the inability to handle epistasis effectively can jeopardize accuracy and predictive ability. We therefore propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert space (RKHS) regression, with one version for traits with no or low epistasis (pRKHS-NE) and another for traits with high epistasis (pRKHS-E). Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype nonparametrically and thus requires fewer genetic assumptions. SPCA reduces the number of markers needed for prediction by filtering out low-signal markers, with the optimal marker set determined by cross-validation. Principal components are computed from the reduced marker matrix (these are called supervised principal components, SPCs) and included as independent variables in a smoothing spline ANOVA model fitted to the data. The new method was compared with currently popular GS methods, specifically RR-BLUP, BayesA, and BayesB, as well as RKHS-M, a newer method by Crossa et al., using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis affects trait expression. Beyond prediction, the new method also supports inferences about the extent to which epistasis influences trait expression.
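The screening-then-PCA step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pRKHS code: it scores each marker by its absolute Pearson correlation with the phenotype (one common SPCA screening statistic; the statistic pRKHS actually uses is not stated here), keeps markers at or above a fixed `threshold` that in practice would be tuned by cross-validation, and computes the first component of the reduced marker matrix by power iteration. The names `screen_markers` and `leading_spc` are hypothetical, and the subsequent smoothing spline ANOVA fit on the component scores is omitted.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_markers(X: List[List[float]], y: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical screening step: keep marker columns with |r(marker, y)| >= threshold."""
    kept = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if abs(pearson(column, y)) >= threshold:
            kept.append(j)
    return kept


def leading_spc(X: List[List[float]], cols: List[int], iters: int = 200, seed: int = 0) -> List[float]:
    """Hypothetical helper: scores on the first principal component of centred X[:, cols]."""
    n, p = len(X), len(cols)
    Z = [[row[j] for j in cols] for row in X]
    means = [sum(r[k] for r in Z) / n for k in range(p)]
    Z = [[r[k] - means[k] for k in range(p)] for r in Z]
    # Z^T Z has the same leading eigenvector as the sample covariance matrix.
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    # Power iteration from a random start, so we do not begin orthogonal to the answer.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0:
            break
        v = [t / norm for t in w]
    # Project each centred individual onto the component (the sign is arbitrary).
    return [sum(r[k] * v[k] for k in range(p)) for r in Z]


if __name__ == "__main__":
    # Toy data: 4 individuals x 3 markers; only marker 0 tracks the phenotype.
    X = [[0, 1, 1], [1, 1, 0], [2, 1, 0], [3, 1, 1]]
    y = [0.0, 1.0, 2.0, 3.0]
    kept = screen_markers(X, y, threshold=0.5)
    print(kept)  # prints [0]
    print(leading_spc(X, kept))
```

On the toy data the screen keeps only marker 0, so the leading component reduces to the centred marker column, up to sign. With many retained markers, the same projection yields the component scores that would then enter the SSANOVA model as covariates.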

    Application of pRKHS to real-life data: Pearson correlation coefficients between estimated breeding value (EBV) and phenotype, obtained from five-fold cross-validation (CV), for maize anthesis-silking interval (ASI) and grain yield (GY) under each of the six statistical methods.

    The optimal number of markers contributing to phenotypic variation and the percentage of variation explained by the included SPCs are shown for the pRKHS methods; results were averaged across five repeated fittings. The optimal cosine value was 0.3 for pRKHS-E across all datasets.
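The table captions in this listing report predictive ability as the Pearson correlation between estimated breeding values and observed phenotypes under k-fold cross-validation. A minimal sketch of that evaluation loop follows, assuming out-of-fold predictions are pooled before a single correlation is computed (the source does not say whether correlations were pooled or averaged per fold). `simple_linear_fit_predict` is a hypothetical stand-in model used only so the sketch runs; a real comparison would plug in pRKHS, RR-BLUP, BayesA, and the other methods.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface: fit_predict(x_train, y_train, x_test) -> predictions for x_test
FitPredict = Callable[[List[float], List[float], List[float]], List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def simple_linear_fit_predict(x_train: List[float], y_train: List[float], x_test: List[float]) -> List[float]:
    """Hypothetical stand-in model: least-squares line on a single predictor."""
    n = len(x_train)
    mx, my = sum(x_train) / n, sum(y_train) / n
    sxx = sum((a - mx) ** 2 for a in x_train)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
    slope = sxy / sxx if sxx else 0.0
    return [my + slope * (a - mx) for a in x_test]


def cv_predictive_ability(x: List[float], y: List[float], k: int, fit_predict: FitPredict, seed: int = 0) -> float:
    """Pool out-of-fold predictions, then correlate them with the observed phenotypes."""
    preds = [0.0] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        fitted = fit_predict([x[i] for i in train], [y[i] for i in train], [x[i] for i in fold])
        for i, p in zip(fold, fitted):
            preds[i] = p
    return pearson(preds, y)


if __name__ == "__main__":
    x = [float(v) for v in range(20)]
    y = [2.0 * v + 1.0 for v in x]  # noiseless toy phenotype
    print(cv_predictive_ability(x, y, k=5, fit_predict=simple_linear_fit_predict))
```

Because every individual is predicted exactly once by a model that never saw it, the pooled correlation measures out-of-sample predictive ability; on the noiseless toy data it is essentially 1.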

    Mean percentage of variation (across the 12 simulation scenarios) explained by the top 18 SPCs with pRKHS, which together explain 70% of the total variation.


    Application of pRKHS to real-life data: Pearson correlation coefficients between estimated breeding value (EBV) and phenotype for barley grain yield (GYD) and plant height (PHT) under each of the six statistical methods, obtained from ten-fold CV on the genotypes and phenotypes of lines from 2007 and from predictions based on the genotypes of different lines from 2008 and 2009.

    The optimal number of markers contributing to phenotypic variation and the percentage of variation explained by the included SPCs are shown for the pRKHS methods; results were averaged across five repeated fittings. The optimal cosine value was 0.3 for pRKHS-E across all datasets.