
    Significance Regression: A Statistical Approach to Biased Linear Regression and Partial Least Squares

    This paper first examines the properties of biased regressors that proceed by restricting the search for the optimal regressor to a subspace. These properties suggest features that such biased regression methods should incorporate. Motivated by these observations, this work proposes a new formulation for biased regression derived from the principle of statistical significance. This new formulation, significance regression (SR), leads to partial least squares (PLS) under certain model assumptions and to more general methods under various other model assumptions. For models with multiple outputs, SR will be shown to have certain advantages over PLS. Using the new formulation, a significance test is advanced for determining the number of directions to be used; for PLS, cross-validation has been the primary method for determining this quantity. The prediction and estimation properties of SR are discussed. A brief numerical example illustrates the relationship between SR and PLS.
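    For orientation, the sketch below shows the cross-validation baseline the abstract contrasts SR with: choosing the number of PLS directions by 5-fold cross-validated prediction error. It does not implement the proposed significance test; the synthetic multi-output data, fold count, and component range are illustrative assumptions.

```python
# Sketch: pick the number of PLS components by cross-validation (the baseline
# the abstract refers to). Data, fold count, and component range are assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(100, 2))  # multi-output response

cv_error = {}
for k in range(1, 8):
    pls = PLSRegression(n_components=k)
    scores = cross_val_score(pls, X, Y, cv=5, scoring="neg_mean_squared_error")
    cv_error[k] = -scores.mean()          # average squared prediction error over folds

best_k = min(cv_error, key=cv_error.get)  # number of directions with the lowest CV error
print("components chosen by cross-validation:", best_k)
```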

    Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have there been methods designed to correct for its effect on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs, using simulations and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure in GEI statistics. We find that our approach effectively controls for population structure in statistics for GEIs as well as for genetic variants.
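    A minimal sketch of the general idea, not the paper's implementation: a gene-by-environment interaction term is tested after whitening the phenotype by a kinship-based covariance, so that population structure is absorbed into the model. The variance components are assumed known here (in practice they would be estimated, e.g., by REML), and the genotype, exposure, and kinship data are simulated for illustration.

```python
# Sketch: Wald test of a GxE term under a mixed model with a kinship random effect.
# Variance components, kinship, and all data are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
g = rng.binomial(2, 0.3, size=n).astype(float)      # genotype coded 0/1/2
e = rng.normal(size=n)                               # environmental exposure
A = rng.normal(size=(n, 5))
K = A @ A.T / 5                                      # toy kinship (relatedness) matrix
sigma_g2, sigma_e2 = 0.5, 1.0                        # variance components, assumed known
V = sigma_g2 * K + sigma_e2 * np.eye(n)              # phenotype covariance under the mixed model

y = 0.2 * g + 0.3 * e + 0.4 * g * e + rng.multivariate_normal(np.zeros(n), V)

X = np.column_stack([np.ones(n), g, e, g * e])       # last column is the GxE term
L = np.linalg.cholesky(V)
Xw = np.linalg.solve(L, X)                           # whiten design and response by V^{-1/2}
yw = np.linalg.solve(L, y)

beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)       # generalized least squares estimate
cov = np.linalg.inv(Xw.T @ Xw)                       # covariance of beta under known V
z = beta[3] / np.sqrt(cov[3, 3])                     # Wald statistic for the GxE effect
print("GxE estimate %.3f, Wald p-value %.3g" % (beta[3], 2 * stats.norm.sf(abs(z))))
```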

    The Degrees of Freedom of Partial Least Squares Regression

    The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom. It is defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: the lower the collinearity, the higher the Degrees of Freedom. In particular, they are typically higher than the naive estimate that takes the Degrees of Freedom to be the number of components. Further, we illustrate how the Degrees of Freedom approach can be used for the comparison of different regression methods. In the experimental section, we show that our Degrees of Freedom estimate, in combination with information criteria, is useful for model selection. Comment: to appear in the Journal of the American Statistical Association.
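    The definition above can be checked numerically. The sketch below approximates trace(d y-hat / d y) for a PLS fit by finite differences, perturbing one response value at a time; it is not the paper's unbiased estimator, and the data, number of components, and step size are illustrative assumptions.

```python
# Sketch: finite-difference estimate of PLS Degrees of Freedom, defined as the
# trace of the derivative of the fitted values with respect to the response.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def pls_fitted(X, y, n_components):
    """Fitted values of a PLS regression with a fixed number of components."""
    model = PLSRegression(n_components=n_components).fit(X, y)
    return model.predict(X).ravel()

rng = np.random.default_rng(2)
n, p, k = 60, 15, 3
X = rng.normal(size=(n, p))
y = X[:, :4].sum(axis=1) + 0.5 * rng.normal(size=n)

eps = 1e-5
base = pls_fitted(X, y, k)
dof = 0.0
for i in range(n):                   # accumulate the trace of d y_hat / d y, one entry at a time
    y_pert = y.copy()
    y_pert[i] += eps
    dof += (pls_fitted(X, y_pert, k)[i] - base[i]) / eps

print("naive count (number of components):", k)
print("finite-difference Degrees of Freedom estimate: %.2f" % dof)
```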

    Structured penalties for functional linear models---partially empirical eigenvectors for regression

    One of the challenges with functional data is incorporating spatial structure, or local correlation, into the analysis. This structure is inherent in the output from an increasing number of biomedical technologies, and a functional linear model is often used to estimate the relationship between the predictor functions and scalar responses. Common approaches to the ill-posed problem of estimating a coefficient function typically involve two stages: regularization and estimation. Regularization is usually done via dimension reduction, projecting onto a predefined span of basis functions or a reduced set of eigenvectors (principal components). In contrast, we present a unified approach that directly incorporates spatial structure into the estimation process by exploiting the joint eigenproperties of the predictors and a linear penalty operator. In this sense, the components in the regression are `partially empirical' and the framework is provided by the generalized singular value decomposition (GSVD). The GSVD clarifies the penalized estimation process and informs the choice of penalty by making explicit the joint influence of the penalty and predictors on the bias, variance, and performance of the estimated coefficient function. Laboratory spectroscopy data and simulations are used to illustrate the concepts. Comment: 29 pages, 3 figures, 5 tables; typo/notational errors edited and intro revised per journal review process.
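    A minimal sketch of the kind of penalized functional linear model the abstract analyzes: a scalar response regressed on densely sampled predictor curves, with a second-difference roughness penalty playing the role of the linear penalty operator. The GSVD machinery itself is not reproduced; the penalty weight, grid size, and simulated curves are illustrative assumptions.

```python
# Sketch: penalized functional linear regression with a second-difference penalty.
# Lambda, the sampling grid, and the simulated curves are assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, p = 80, 120                                      # n curves sampled at p points
t = np.linspace(0, 1, p)
X = np.array([np.sin(2 * np.pi * (t + rng.uniform())) + 0.1 * rng.normal(size=p)
              for _ in range(n)])                   # predictor curves
beta_true = np.exp(-((t - 0.3) ** 2) / 0.01)        # smooth coefficient function
y = X @ beta_true + 0.5 * rng.normal(size=n)        # scalar responses

D = np.diff(np.eye(p), n=2, axis=0)                 # discrete second-derivative penalty operator
lam = 1e-2                                          # penalty weight (would be tuned in practice)
beta_hat = np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)
print("coefficient function estimated on", beta_hat.size, "grid points")
```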

    Aggregated functional data model for Near-Infrared Spectroscopy calibration and prediction

    Calibration and prediction for NIR spectroscopy data are performed based on a functional interpretation of the Beer-Lambert formula. Considering that, for each chemical sample, the resulting spectrum is a continuous curve obtained as the summation of overlapped absorption spectra from each analyte plus a Gaussian error, we assume that each individual spectrum can be expanded as a linear combination of B-spline basis functions. Calibration is then performed using two procedures for estimating the individual analyte curves: basis smoothing and smoothing splines. Prediction is done by minimizing the squared error of prediction. To assess the variance of the predicted values, we use a leave-one-out jackknife technique. Departures from the standard error models are discussed through a simulation study, in particular how correlated errors affect the calibration step and, consequently, the prediction of analyte concentrations. Finally, the performance of our methodology is demonstrated through the analysis of two publicly available datasets. Comment: 27 pages, 7 figures, 7 tables.
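    A minimal sketch of the basis-smoothing step described above, assuming a common cubic B-spline basis: one simulated spectrum is represented by least-squares B-spline coefficients, which would then feed the calibration. The knot placement, basis size, and synthetic absorption band are illustrative assumptions, not the paper's settings.

```python
# Sketch: represent one spectrum by least-squares B-spline coefficients.
# Knot sequence, basis order, and the synthetic spectrum are assumptions.
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(4)
wavelength = np.linspace(1100, 2500, 350)                 # nm grid
true_curve = np.exp(-((wavelength - 1700) / 120) ** 2)    # one absorption band
spectrum = true_curve + 0.02 * rng.normal(size=wavelength.size)

k = 3                                                     # cubic B-splines
interior = np.linspace(1100, 2500, 20)[1:-1]              # interior knots
knots = np.concatenate([[wavelength[0]] * (k + 1), interior, [wavelength[-1]] * (k + 1)])

spline = make_lsq_spline(wavelength, spectrum, knots, k=k)  # least-squares basis smoothing
coefs = spline.c                                            # coefficients used downstream in calibration
print("number of B-spline coefficients:", coefs.size)
```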

    A generic algorithm for reducing bias in parametric estimation

    A general iterative algorithm is developed for the computation of reduced-bias parameter estimates in regular statistical models through adjustments to the score function. The algorithm unifies and provides an appealing new interpretation for iterative methods that have been published previously for some specific model classes. The new algorithm can usefully be viewed as a series of iterative bias corrections, thus facilitating the adjusted score approach to bias reduction in any model for which the first-order bias of the maximum likelihood estimator has already been derived. The method is tested by application to a logit-linear multiple regression model with beta-distributed responses; the results confirm the effectiveness of the new algorithm and also reveal some important errors in the existing literature on beta regression.
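    A minimal sketch of the iterative bias-correction view: theta is updated as theta <- theta_ML - b(theta), where b(.) is the first-order bias of the maximum likelihood estimator. The toy model (normal variance, with known bias -sigma^2/n) is an illustrative assumption, chosen because the fixed point has the familiar n/(n-1) corrected form.

```python
# Sketch: iterative bias correction theta <- theta_ML - b(theta) on a toy model
# (normal variance) where the bias of the MLE is known to be -theta / n.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=1.5, size=30)
n = x.size

theta_ml = np.mean((x - x.mean()) ** 2)        # ML estimate of the variance (biased)
bias = lambda theta: -theta / n                # first-order bias of the MLE at theta

theta = theta_ml
for _ in range(50):                            # iterate theta <- theta_ML - b(theta)
    theta = theta_ml - bias(theta)

print("ML estimate:             %.4f" % theta_ml)
print("iterated bias-corrected: %.4f" % theta)
print("closed-form n/(n-1) fix: %.4f" % (theta_ml * n / (n - 1)))
```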