
    Dominance and G×E interaction effects improve genomic prediction and genetic gain in intermediate wheatgrass (Thinopyrum intermedium)

    Genomic selection (GS)-based recurrent selection methods were developed to accelerate the domestication of intermediate wheatgrass [IWG, Thinopyrum intermedium (Host) Barkworth & D.R. Dewey]. A subset of the breeding population phenotyped at multiple environments is used to train GS models, which then predict trait values for the whole breeding population. In this study, we implemented several GS models incorporating additive, dominance, and G×E interaction effects to understand how they affected trait predictions in intermediate wheatgrass. We evaluated 451 genotypes from the University of Minnesota IWG breeding program for nine agronomic and domestication traits at two Minnesota locations during 2017–2018. Genet-mean based heritabilities for these traits ranged from 0.34 to 0.77. Using fourfold cross-validation, we observed the highest predictive abilities (correlation of 0.67) in models that considered G×E effects. When G×E effects were fitted in GS models, trait predictions improved by 18%, 15%, 20%, and 23% for yield, spike weight, spike length, and free threshing, respectively. Genomic selection models with dominance effects showed only modest, trait-dependent increases of up to 3%. Cross-environment predictions were better for high-heritability traits such as spike length, shatter resistance, free threshing, grain weight, and seed length than for traits with low heritability and large environmental variance such as spike weight, grain yield, and seed width. Our results confirm that GS can accelerate IWG domestication by increasing genetic gain per breeding cycle and can assist in selecting genotypes likely to perform well in diverse environments.
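
    The G×E idea above can be sketched numerically. The toy example below is not the authors' pipeline: all data are simulated, and ridge regression stands in for a GBLUP-style mixed model. It fits marker effects with and without environment-specific interaction terms and compares fourfold cross-validated predictive ability, holding out whole genotypes across both environments.

```python
# Illustrative sketch: genomic prediction with vs. without G x E terms.
# Ridge regression stands in for GBLUP; environments enter as interaction features.
import numpy as np
from numpy.linalg import solve

rng = np.random.default_rng(0)
n_geno, n_mark, n_env = 200, 300, 2

M = rng.integers(0, 3, size=(n_geno, n_mark)).astype(float)  # marker matrix (0/1/2)
u_main = rng.normal(0.0, 0.05, n_mark)                       # additive main effects
u_ge = rng.normal(0.0, 0.05, (n_env, n_mark))                # env-specific deviations

# Phenotypes per environment: main genetic effect + G x E deviation + noise
y = np.concatenate([M @ u_main + M @ u_ge[e] + rng.normal(0, 1.0, n_geno)
                    for e in range(n_env)])
env = np.repeat(np.arange(n_env), n_geno)
geno = np.tile(np.arange(n_geno), n_env)

def ridge_fit_predict(Xtr, ytr, Xte, lam=10.0):
    """Closed-form ridge: beta = (X'X + lam I)^-1 X'y."""
    beta = solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]), Xtr.T @ ytr)
    return Xte @ beta

# Fourfold CV over genotypes (a held-out genotype is unseen in every environment)
folds = np.array_split(rng.permutation(n_geno), 4)
results = {}
for label, use_ge in [("main only", False), ("main + GxE", True)]:
    preds, obs = [], []
    for fold in folds:
        test = np.isin(geno, fold)
        X = M[geno]
        if use_ge:  # append env-specific marker blocks (markers * env indicator)
            X = np.hstack([X] + [M[geno] * (env == e)[:, None] for e in range(n_env)])
        preds.append(ridge_fit_predict(X[~test], y[~test], X[test]))
        obs.append(y[test])
    results[label] = np.corrcoef(np.concatenate(preds), np.concatenate(obs))[0, 1]
    print(f"{label}: predictive ability r = {results[label]:.2f}")
```

    Because the simulated G×E deviations are as large as the main effects, the interaction model recovers environment-specific signal that the main-effects model averages away.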

    From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation

    In statistical prediction, classical approaches for model selection and model evaluation based on covariance penalties are still widely used. Most of the literature on this topic is based on what we call the "Fixed-X" assumption, where covariate values are assumed to be nonrandom. By contrast, it is often more reasonable to take a "Random-X" view, where the covariate values are independently drawn for both training and prediction. To study the applicability of covariance penalties in this setting, we propose a decomposition of Random-X prediction error in which the randomness in the covariates contributes to both the bias and variance components. This decomposition is general, but we concentrate on the fundamental case of least squares regression. We prove that in this setting the move from Fixed-X to Random-X prediction results in an increase in both bias and variance. When the covariates are normally distributed and the linear model is unbiased, all terms in this decomposition are explicitly computable, which yields an extension of Mallows' Cp that we call $RCp$. $RCp$ also holds asymptotically for certain classes of nonnormal covariates. When the noise variance is unknown, plugging in the usual unbiased estimate leads to an approach that we call $\hat{RCp}$, which is closely related to Sp (Tukey 1967) and GCV (Craven and Wahba 1978). For excess bias, we propose an estimate based on the "shortcut formula" for ordinary cross-validation (OCV), resulting in an approach we call $RCp^+$. Theoretical arguments and numerical simulations suggest that $RCp^+$ is typically superior to OCV, though the difference is small. We further examine the Random-X error of other popular estimators. The surprising result we get for ridge regression is that, in the heavily regularized regime, Random-X variance is smaller than Fixed-X variance, which can lead to smaller overall Random-X error.
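
    The Fixed-X vs Random-X gap is easy to see in simulation. The sketch below is illustrative only and does not reproduce the $RCp$ formula: it fits OLS on Gaussian data and compares prediction error on the same design with fresh noise (Fixed-X) against error on freshly drawn covariates (Random-X).

```python
# Illustrative simulation of the Fixed-X vs Random-X prediction-error gap for OLS.
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma, reps = 50, 10, 1.0, 500
beta = rng.normal(size=p)

fixed_err, random_err = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(0, sigma, n)
    bhat = np.linalg.lstsq(X, y, rcond=None)[0]
    # Fixed-X: same design matrix, fresh noise
    y_new = X @ beta + rng.normal(0, sigma, n)
    fixed_err.append(np.mean((y_new - X @ bhat) ** 2))
    # Random-X: fresh covariates and fresh noise
    X2 = rng.normal(size=(n, p))
    y2 = X2 @ beta + rng.normal(0, sigma, n)
    random_err.append(np.mean((y2 - X2 @ bhat) ** 2))

print(f"Fixed-X MSE:  {np.mean(fixed_err):.3f}")   # theory: sigma^2 * (1 + p/n) = 1.2
print(f"Random-X MSE: {np.mean(random_err):.3f}")  # larger: covariate randomness adds error
```

    For this Gaussian, well-specified setting the Random-X expectation is $\sigma^2(1 + p/(n-p-1))$, strictly above the Fixed-X value $\sigma^2(1 + p/n)$, matching the bias-and-variance increase the abstract describes.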

    A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation

    Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three non-mutually-exclusive classes consisting of best subset selection methods, projection techniques, and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets. Comment: Published at http://dx.doi.org/10.1214/12-STS406 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
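
    A minimal illustration of the projection-plus-regularization idea, with every detail invented for the example (it is not one of the reviewed methods as specified): ridge regression is fit on pilot simulations to map a redundant summary vector to a single learned summary, which then drives plain ABC rejection sampling for the mean of a Normal.

```python
# Illustrative sketch: ridge-regression projection of summary statistics for ABC.
import numpy as np

rng = np.random.default_rng(2)
theta_true, n_obs = 2.0, 100
x_obs = rng.normal(theta_true, 1.0, n_obs)   # "observed" data

def summaries(x):
    # Deliberately redundant 5-D summary vector; only the mean is informative here
    return np.array([x.mean(), np.median(x), x.std(), x.min(), x.max()])

# Pilot stage: learn a 1-D projection s -> E[theta | s] by ridge regression
n_sim, lam = 5000, 1.0
theta_sim = rng.uniform(-5, 5, n_sim)                  # prior draws
S = np.array([summaries(rng.normal(t, 1.0, n_obs)) for t in theta_sim])
S1 = np.hstack([np.ones((n_sim, 1)), S])               # add intercept column
w = np.linalg.solve(S1.T @ S1 + lam * np.eye(6), S1.T @ theta_sim)

def proj(s):
    return np.concatenate(([1.0], s)) @ w              # learned 1-D summary

s_obs = proj(summaries(x_obs))

# ABC rejection on the learned low-dimensional summary
theta_prop = rng.uniform(-5, 5, 20000)
s_prop = np.array([proj(summaries(rng.normal(t, 1.0, n_obs))) for t in theta_prop])
accepted = theta_prop[np.abs(s_prop - s_obs) < 0.05]
print(f"accepted {len(accepted)} draws; posterior mean ~ {accepted.mean():.2f}")
```

    The projection collapses the five raw summaries to one informative dimension, so the rejection tolerance operates in a space where distance actually tracks the parameter of interest.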

    Model Selection Techniques for Kernel-Based Regression Analysis Using Information Complexity Measure and Genetic Algorithms

    In statistical modeling, an overparameterized model leads to poor generalization on unseen data points. This issue calls for a model selection technique that appropriately chooses the form of the model, its parameters, and the independent variables retained for modeling. Model selection is particularly important for linear and nonlinear statistical models, which can be easily overfitted. Recently, support vector machines (SVMs), also known as kernel-based methods, have drawn much attention as the next generation of nonlinear modeling techniques. The model selection issues for SVMs include the selection of the kernel, the corresponding parameters, and the optimal subset of independent variables. In the current literature, k-fold cross-validation is the model selection method most widely used for SVMs by machine learning researchers. However, cross-validation is computationally intensive, since one has to fit the model k times. This dissertation introduces the use of a model selection criterion based on the information complexity (ICOMP) measure for kernel-based regression analysis and its applications. ICOMP penalizes both the lack of fit and the complexity of the model to choose the optimal model with good generalization properties. ICOMP provides a simple index for each model and does not require any validation data. It is computationally efficient and has been successfully applied to various linear model selection problems. In this dissertation, we introduce ICOMP to nonlinear kernel-based modeling. Specifically, this dissertation proposes ICOMP and its various forms for kernel ridge regression, kernel partial least squares regression, kernel principal component analysis, kernel principal component regression, relevance vector regression, relevance vector logistic regression, and classification problems. The model selection tasks achieved by our proposed criterion include choosing the form of the kernel function, the parameters of the kernel function, the ridge parameter, the number of latent variables, the number of principal components, and the optimal subset of input variables in a simultaneous fashion for intelligent data mining. The performance of the proposed model selection method is tested on simulation benchmark data sets as well as real data sets. The predictive performance of the proposed model selection criteria is comparable to, and sometimes better than, that of cross-validation, which is far more costly to compute. This dissertation combines the genetic algorithm (GA) with ICOMP for variable subsetting, which significantly decreases the computational time compared to an exhaustive search of all possible subsets. The GA procedure is shown to be robust and performs well in our repeated simulation examples. This dissertation therefore provides researchers an alternative, computationally efficient model selection approach for data analysis using kernel methods.
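
    The validation-free selection idea can be sketched as follows. Note the stand-ins: a GCV-style effective-degrees-of-freedom penalty replaces ICOMP (whose covariance-complexity term is not reproduced here), and a small grid search replaces the genetic algorithm. The shape of the procedure — score every candidate (kernel parameter, ridge parameter) pair from the training data alone, no held-out folds — is the point.

```python
# Illustrative sketch: validation-free hyperparameter selection for kernel ridge
# regression, using a GCV-style penalized criterion as a stand-in for ICOMP.
import numpy as np

rng = np.random.default_rng(3)
n = 120
x = np.sort(rng.uniform(-3, 3, n))
y = np.sin(2 * x) + rng.normal(0, 0.3, n)

def rbf(a, b, gamma):
    # Gaussian (RBF) kernel matrix between 1-D point sets a and b
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

best = None  # (score, gamma, ridge, yhat)
for gamma in [0.1, 1.0, 10.0]:          # kernel parameter grid
    for ridge in [1e-3, 1e-1, 1e1]:     # ridge parameter grid
        K = rbf(x, x, gamma)
        # Smoother matrix S = (K + ridge I)^-1 K, so yhat = S @ y
        S = np.linalg.solve(K + ridge * np.eye(n), K)
        yhat = S @ y
        rss = float(np.sum((y - yhat) ** 2))
        df = float(np.trace(S))         # effective degrees of freedom
        score = n * rss / (n - df) ** 2 # GCV: lack of fit + complexity penalty
        if best is None or score < best[0]:
            best = (score, gamma, ridge, yhat)

print(f"selected gamma={best[1]}, ridge={best[2]}, GCV={best[0]:.4f}")
```

    Each candidate is scored with one kernel solve instead of k refits, which is the computational argument the dissertation makes for criterion-based selection over k-fold cross-validation.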

    Quantifying Epistemic Uncertainty in Deep Learning

    Uncertainty quantification is at the core of the reliability and robustness of machine learning. In this paper, we provide a theoretical framework to dissect the uncertainty in deep learning, especially the epistemic component, into procedural variability (from the training procedure) and data variability (from the training data), which is, to the best of our knowledge, the first such attempt in the literature. We then propose two approaches to estimate these uncertainties, one based on influence functions and one based on batching. We demonstrate how our approaches overcome the computational difficulties in applying classical statistical methods. Experimental evaluations on multiple problem settings corroborate our theory and illustrate how our framework and estimation can provide direct guidance on modeling and data collection effort to improve deep learning performance.
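
    The procedural/data split can be illustrated with a toy, NumPy-only stand-in for a deep model: a linear model trained by SGD, where retraining with different seeds on the same data probes procedural variability (random init and sample order) and retraining on resampled data with a fixed seed probes data variability. Everything below is illustrative and is not the paper's influence-function or batching estimator.

```python
# Illustrative decomposition of epistemic uncertainty at a test point into
# procedural variability (training randomness) and data variability (data draw).
import numpy as np

def make_data(seed, n=200):
    r = np.random.default_rng(seed)
    X = r.normal(size=(n, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + r.normal(0, 0.5, n)
    return X, y

def train(X, y, seed, epochs=5, lr=0.01):
    r = np.random.default_rng(seed)
    w = r.normal(0.0, 1.0, X.shape[1])   # random init: procedural randomness
    for _ in range(epochs):
        for i in r.permutation(len(y)):  # SGD visit order: also procedural
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

x_test = np.ones(5)

# Procedural variability: same data, different training seeds
X0, y0 = make_data(0)
proc_preds = [train(X0, y0, seed=s) @ x_test for s in range(20)]
# Data variability: different data draws, fixed training seed
data_preds = [train(*make_data(s), seed=0) @ x_test for s in range(1, 21)]

print(f"procedural variance: {np.var(proc_preds):.5f}")
print(f"data variance:       {np.var(data_preds):.5f}")
```

    Holding one source of randomness fixed while resampling the other is the conceptual move behind the decomposition; the paper's estimators make this tractable when full retraining is too expensive.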
