Search CORE

Fast and Stable Multiple Smoothing Parameter Selection in Smoothing Spline Analysis of Variance Models With Large Samples

Author: Nathaniel E. Helwig (577674)
Ping Ma (15010)
Publication date: 08/10/2015

The current parameterization and algorithm used to fit a smoothing spline analysis of variance (SSANOVA) model are computationally expensive, making a generalized additive model (GAM) the preferred method for multivariate smoothing. In this article, we propose an efficient reparameterization of the smoothing parameters in SSANOVA models, and a scalable algorithm for estimating multiple smoothing parameters in SSANOVAs. To validate our approach, we present two simulation studies comparing our reparameterization and algorithm to implementations of SSANOVAs and GAMs that are currently available in R. Our simulation results demonstrate that (a) our scalable SSANOVA algorithm outperforms the currently used SSANOVA algorithm, and (b) SSANOVAs can be a fast and reliable alternative to GAMs. We also provide an example with oceanographic data that demonstrates the practical advantage of our SSANOVA framework. Supplementary materials that are available online can be used to replicate the analyses in this article.
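The abstract is about choosing smoothing parameters. As a much simpler illustration of the underlying idea, the Python sketch below selects a single smoothing parameter lam for a discrete penalized least-squares smoother (a second-difference roughness penalty, i.e. a Whittaker-type smoother) by minimizing the generalized cross-validation (GCV) score. It is not the authors' reparameterization or algorithm and does not handle multiple smoothing parameters; all function names are hypothetical, and the explicit matrix inverse is only suitable for small n.

```python
import numpy as np

def second_difference_matrix(n):
    """(n-2) x n matrix D with (D @ f)[i] = f[i] - 2*f[i+1] + f[i+2]."""
    return np.diff(np.eye(n), n=2, axis=0)

def smoother_matrix(n, lam, D):
    """Hat matrix H with f_hat = H @ y for the criterion ||y - f||^2 + lam * ||D f||^2."""
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def gcv_score(y, lam, D):
    """GCV(lam) = n * ||(I - H) y||^2 / (n - trace(H))^2."""
    n = y.size
    H = smoother_matrix(n, lam, D)
    resid = y - H @ y
    return n * np.sum(resid**2) / (n - np.trace(H))**2

def select_lambda(y, grid):
    """Grid search for the lam minimising GCV; returns (lam, fitted values)."""
    D = second_difference_matrix(y.size)
    scores = [gcv_score(y, lam, D) for lam in grid]
    best = grid[int(np.argmin(scores))]
    return best, smoother_matrix(y.size, best, D) @ y

# Usage on simulated data: a noisy sine curve on an equally spaced grid
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
truth = np.sin(2 * np.pi * x)
y = truth + 0.3 * rng.standard_normal(x.size)
grid = np.logspace(-2, 6, 50)
lam, fit = select_lambda(y, grid)
print(f"selected lambda = {lam:.3g}")
```

With several penalty terms (one per SSANOVA component), the same GCV criterion would have to be minimized jointly over a vector of smoothing parameters, which is where cost and numerical stability become the main concerns.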

The Francis Crick Institute

Nonparametric Method for Genomics-Based Prediction of Performance of Quantitative Traits Involving Epistasis in Plant Breeding

Author: Ping Ma (15010)
Rita H. Mumm (116420)
Xiaochun Sun (116417)
Publication date: 30/11/2012

Genomic selection (GS) procedures have proven useful for estimating breeding values and predicting phenotypes from genome-wide molecular marker information. However, high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We therefore propose a new nonparametric method, pRKHS, which combines features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert space (RKHS) regression, with one version for traits with no or low epistasis (pRKHS-NE) and another for traits with high epistasis (pRKHS-E). Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype nonparametrically and thus requires fewer genetic assumptions. SPCA reduces the number of markers needed for prediction by filtering out low-signal markers, with the optimal marker set determined by cross-validation. Principal components are computed from the reduced marker matrix (called supervised principal components, SPCs) and included as independent variables in a smoothing spline ANOVA model fitted to the data. The new method was compared with currently popular GS methods, specifically RR-BLUP, BayesA, and BayesB, as well as RKHS-M, a newer method by Crossa et al., using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis affects trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
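The marker-screening and principal-component step described in this abstract can be sketched in Python as follows. Assumptions: markers are coded numerically (e.g. 0/1/2), the screening statistic is the absolute Pearson correlation between each marker and the phenotype, and the number of retained markers is passed in directly rather than chosen by cross-validation. The helper name `supervised_pcs` is hypothetical, and the subsequent smoothing spline ANOVA fit on the SPC scores is not shown.

```python
import numpy as np

def supervised_pcs(X, y, n_markers, n_components):
    """Hypothetical helper: screen markers by |corr| with y, then PCA.

    X: (n_lines, n_total_markers) numeric marker matrix; y: (n_lines,) phenotypes.
    Returns (PC scores of shape (n_lines, n_components), indices of kept markers).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Screening statistic: absolute Pearson correlation of each marker with y.
    # Monomorphic markers (zero variance) get score 0 instead of dividing by 0.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    score = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    keep = np.argsort(score)[::-1][:n_markers]
    # PCA of the reduced, column-centred marker matrix via a thin SVD;
    # the scores U * s are the projections of the rows onto the leading loadings.
    U, s, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    return U[:, :n_components] * s[:n_components], keep

# Usage on simulated 0/1/2 genotypes where only the first 5 markers matter
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 1000)).astype(float)
y = X[:, :5].sum(axis=1) + rng.standard_normal(200)
spc, kept = supervised_pcs(X, y, n_markers=50, n_components=3)
print(spc.shape, len(kept))
```

When the marker count is tuned by cross-validation, the screening itself should be repeated inside each training fold so that held-out phenotypes do not influence which markers are kept.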

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

For each scenario with pRKHS: the percentage of the total variation explained by the top three SPCs (%P1, %P2, and %P3), the number of influential markers (MP1, MP2, and MP3) included in the respective SPCs, and the number of SPC interactions at three given cosine thresholds.

Author: Ping Ma (15010)
Rita H. Mumm (116420)
Xiaochun Sun (116417)

Values are the lowest and highest obtained across various marker subsets (from 500 markers to all markers). Note that larger cosine values correspond to smaller p-values.

The Francis Crick Institute

Applying pRKHS to real-life scenarios: Pearson correlation coefficients between estimated breeding value (EBV) and phenotype, obtained from five-fold cross-validation (CV) for maize anthesis-silking interval (ASI) and grain yield (GY), for each of the six statistical methods.

Author: Ping Ma (15010)
Rita H. Mumm (116420)
Xiaochun Sun (116417)

The optimal number of markers contributing to phenotypic variation and the percentage of variation explained by the included SPCs are shown for the pRKHS methods; results were averaged across five repeated fittings. The optimal cosine value was 0.3 for pRKHS-E across all datasets.
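The five-fold cross-validated correlation reported in this table can be illustrated with the sketch below. It uses ridge regression on marker genotypes as a simple stand-in predictor (close in spirit to RR-BLUP, but with a fixed, user-chosen penalty `lam` rather than one derived from variance components), and it averages the per-fold Pearson correlations. Whether the original analysis averaged per-fold correlations or pooled all out-of-fold predictions is not stated in this record, so treat that choice as an assumption; all function names and data are hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centred markers; the intercept is the training mean."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - mu_y))
    return mu_y + (X_test - mu_x) @ beta

def cv_correlation(X, y, k=5, lam=1.0, seed=0):
    """Mean per-fold Pearson r between predictions (EBV) and observed phenotype."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rs = []
    for test_idx in folds:
        # Train on all lines outside the current fold, predict the held-out fold
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        rs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(rs)), rs

# Usage on simulated data (hypothetical; not the maize ASI/GY data)
rng = np.random.default_rng(3)
X = rng.integers(0, 3, size=(150, 200)).astype(float)
effects = np.zeros(200)
effects[:10] = 0.5
y = X @ effects + rng.standard_normal(150)
r_mean, per_fold = cv_correlation(X, y, k=5, lam=10.0)
print(f"5-fold CV correlation: {r_mean:.2f}")
```

Any of the compared methods could be slotted in place of `ridge_fit_predict`; the fold assignment and the correlation step stay the same.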

The Francis Crick Institute

For scenarios with a low level of epistasis (10% of the epistatic interaction effects are nonzero): Pearson correlation coefficients between estimated breeding value and either true breeding value (rEBV:TBV) or phenotype (rEBV:PHE), obtained through ten-fold cross-validation within Cycle 0 (C0) and prediction of Cycle 1 (C1), for simulated traits with heritabilities of 0.1, 0.2, 0.4, and 0.8, using the various statistical methods.

Author: Ping Ma (15010)
Rita H. Mumm (116420)
Xiaochun Sun (116417)

Average correlations ± SE were obtained from thirty replications of each simulation.
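Summaries such as "average correlation ± SE over thirty replications" are commonly computed as the sample mean and the standard error of the mean, SE = s/√n, where s is the sample standard deviation with an n − 1 denominator. The short Python sketch below assumes that convention; the caption does not state which estimator was used, and the helper names are hypothetical.

```python
import math
import statistics

def mean_and_se(values):
    """Sample mean and standard error of the mean (SD with n - 1 denominator)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, se

def format_mean_se(values, digits=3):
    """Render the summary in the 'mean ± SE' style used in the caption."""
    m, se = mean_and_se(values)
    return f"{m:.{digits}f} ± {se:.{digits}f}"

# Usage with three hypothetical per-replication correlations
print(format_mean_se([0.5, 0.6, 0.7]))  # 0.600 ± 0.058
```

For the thirty replications described here, `values` would hold the thirty per-replication correlations for one method and scenario.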

The Francis Crick Institute

Mean percentage of variation (across the 12 simulation scenarios) explained by the top 18 SPCs with pRKHS, which together explain 70% of the total variation.

Author: Ping Ma (15010)
Rita H. Mumm (116420)
Xiaochun Sun (116417)

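The percentage of variation explained by each principal component follows from the singular values of the column-centred matrix: component j explains 100·s_j² / Σ_k s_k². The sketch below assumes that is how "variation" is measured (the record does not say) and counts how many leading components are needed to reach a cumulative 70%. Averaging across the 12 scenarios, as the caption describes, would then be an element-wise mean of the per-scenario percentage vectors. Function names and data are hypothetical.

```python
import numpy as np

def variance_explained(X):
    """Percent of total variation explained by each PC of the column-centred X."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    return 100.0 * s**2 / np.sum(s**2)

def n_components_for(pct, target=70.0):
    """Smallest number of leading components whose cumulative percent >= target."""
    cum = np.cumsum(pct)
    return int(np.searchsorted(cum, target) + 1)

# Usage on hypothetical random marker data (not the paper's data)
rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(100, 60)).astype(float)
pct = variance_explained(X)
k70 = n_components_for(pct, 70.0)
print(f"{k70} components explain {pct[:k70].sum():.1f}% of the variation")
```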

The Francis Crick Institute