
    A cautionary note on robust covariance plug-in methods

    Many multivariate statistical methods rely heavily on the sample covariance matrix. It is well known, though, that the sample covariance matrix is highly non-robust. One popular approach for "robustifying" a multivariate method is simply to replace the covariance matrix with some robust scatter matrix. The aim of this paper is to point out that in some situations certain properties of the covariance matrix are needed for the corresponding robust "plug-in" method to be a valid approach, and that not all scatter matrices possess these important properties. In particular, the following three multivariate methods are discussed in this paper: independent components analysis, observational regression and graphical modeling. For each case, it is shown that using a symmetrized robust scatter matrix in place of the covariance matrix results in a proper robust multivariate method. (Comment: 24 pages, 7 figures)
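
    The plug-in idea above is easy to demonstrate. The sketch below is a hypothetical illustration rather than code from the paper: it uses scikit-learn's MinCovDet (minimum covariance determinant) as the robust scatter matrix and plugs it into a principal components step; the data, contamination level, and choice of scatter estimator are all assumptions made for the demo.

    # A minimal sketch of the covariance "plug-in" idea: replace the sample
    # covariance matrix with a robust scatter estimate before an
    # eigendecomposition. MinCovDet is just one possible scatter matrix; the
    # paper's point is that not every scatter matrix is a valid substitute
    # in every multivariate method.
    import numpy as np
    from sklearn.covariance import MinCovDet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[:10] += 10.0  # contaminate 5% of the rows with outliers

    S_classical = np.cov(X, rowvar=False)
    S_robust = MinCovDet(random_state=0).fit(X).covariance_

    # Plug-in principal components: eigenvectors of each scatter matrix.
    eigvals_c, _ = np.linalg.eigh(S_classical)
    eigvals_r, _ = np.linalg.eigh(S_robust)
    print("classical eigenvalues:", np.round(eigvals_c, 2))
    print("robust eigenvalues:   ", np.round(eigvals_r, 2))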

    Robust canonical correlations: a comparative study.

    Several approaches for robust canonical correlation analysis are presented and discussed. A first method is based on the definition of canonical correlation analysis as a search for linear combinations of two sets of variables having maximal (robust) correlation. A second method is based on alternating robust regressions. These methods are discussed in detail and compared with the more traditional approach to robust canonical correlation via covariance matrix estimates. A simulation study compares the performance of the different estimators under several kinds of sampling schemes. Robustness is also studied by means of breakdown plots.
    Keywords: Alternating regression; Canonical correlations; Correlation measures; Projection-pursuit; Robust covariance estimation; Robust regression; Robustness
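
    To make the covariance-based route concrete, here is a small sketch, on assumed toy data, of how canonical correlations are read off a joint covariance matrix; swapping np.cov for a robust covariance estimate gives the "traditional" plug-in variant the study compares against. The helper cca_from_cov is a name invented for this illustration, not from the paper.

    # Canonical correlations are the singular values of
    # Sxx^{-1/2} Sxy Syy^{-1/2}, computed here from the joint covariance
    # of (X, Y). Pure-numpy sketch.
    import numpy as np

    def inv_sqrt(S):
        # inverse symmetric square root of an SPD matrix
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    def cca_from_cov(S, p):
        # S is the (p+q) x (p+q) joint covariance; the first p variables are X
        Sxx, Sxy, Syy = S[:p, :p], S[:p, p:], S[p:, p:]
        M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
        return np.linalg.svd(M, compute_uv=False)

    rng = np.random.default_rng(1)
    Z = rng.normal(size=(500, 1))                # shared latent factor
    X = Z + 0.5 * rng.normal(size=(500, 2))
    Y = Z + 0.5 * rng.normal(size=(500, 3))
    S = np.cov(np.hstack([X, Y]), rowvar=False)  # replace with a robust estimate to robustify
    print(np.round(cca_from_cov(S, p=2), 3))     # canonical correlations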

    A robust partial least squares method with applications

    Partial least squares regression (PLS) is a linear regression technique developed to relate many regressors to one or several response variables. Robust methods are introduced to reduce or remove the effect of outlying data points. In this paper we show that if the sample covariance matrix is properly robustified, further robustification of the linear regression steps of the PLS algorithm becomes unnecessary. The robust estimate of the covariance matrix is computed by searching for outliers in univariate projections of the data on a combination of random directions (Stahel-Donoho) and specific directions obtained by maximizing and minimizing the kurtosis coefficient of the projected data, as proposed by Peña and Prieto (2006). It is shown that this procedure is fast to apply and provides better results than other procedures proposed in the literature. Its performance is illustrated by a Monte Carlo study and by an example in which the algorithm reveals features of the data that went undetected by previous methods.
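
    The following sketch illustrates the projection idea on synthetic data: observations that look outlying along many random univariate projections (the Stahel-Donoho part) are flagged, and a plain one-component PLS weight vector is then computed on the remaining points. The kurtosis-based directions of Peña and Prieto (2006) are omitted for brevity, and the cutoff and direction count are illustrative choices, not the paper's tuning.

    # Flag high-leverage outliers via robust z-scores over random projections,
    # then take one PLS step (weight = direction of cov(X, y)) on clean data.
    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 200, 6
    X = rng.normal(size=(n, p))
    y = X[:, 0] + 0.1 * rng.normal(size=n)
    X[:8] += 8.0  # a few high-leverage points, added after y is generated

    directions = rng.normal(size=(500, p))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    proj = X @ directions.T                               # (n, 500)
    med = np.median(proj, axis=0)
    mad = 1.4826 * np.median(np.abs(proj - med), axis=0)  # robust scale
    outlyingness = np.max(np.abs(proj - med) / mad, axis=1)
    clean = outlyingness < 4.0                            # illustrative cutoff

    # One PLS weight vector from the cleaned data.
    Xc = X[clean] - X[clean].mean(axis=0)
    yc = y[clean] - y[clean].mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    print("flagged:", np.sum(~clean), "| leading PLS weight:", np.round(w, 2))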

    Fitting multiplicative models by robust alternating regressions.

    In this paper a robust approach for fitting multiplicative models is presented. The focus is on the factor analysis model, where factor loadings and scores are estimated by a robust alternating regression algorithm. The approach is highly robust and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not distorted by outliers, which can instead be retrieved from the residual plot. Also provided is an accompanying robust R²-plot to determine the appropriate number of factors. The approach is illustrated by real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.
    Keywords: Alternating regression; Approximation; Biplot; Covariance; Dispersion matrices; Effects; Estimator; Exploratory data analysis; Factor analysis; Factors; FANOVA; Least-squares; Matrix; Median polish; Model; Models; Outliers; Principal components; Robustness; Structure; Two-way table; Variables; Yield
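
    A toy rank-one version of the alternating-regression idea is sketched below: X is approximated by an outer product a b', alternating between row-wise and column-wise weighted regressions with Huber-type weights that downweight large residuals. This is a schematic reimplementation of the general principle on assumed data, not the paper's algorithm or tuning.

    # Robust alternating regressions for a rank-one multiplicative model.
    import numpy as np

    def huber_weights(r, c=1.345):
        s = 1.4826 * np.median(np.abs(r)) + 1e-12  # robust residual scale
        a = np.abs(r) / s
        return np.where(a <= c, 1.0, c / a)        # downweight big residuals

    rng = np.random.default_rng(3)
    a_true, b_true = rng.normal(size=20), rng.normal(size=8)
    X = np.outer(a_true, b_true) + 0.1 * rng.normal(size=(20, 8))
    X[0, :4] += 10.0  # contaminated cells

    a, b = np.ones(20), np.ones(8)
    for _ in range(50):
        W = huber_weights(X - np.outer(a, b))
        a = np.sum(W * X * b, axis=1) / np.sum(W * b**2, axis=1)  # rows on b
        b = np.sum(W * X * a[:, None], axis=0) / np.sum(W * a[:, None]**2, axis=0)  # cols on a

    # |correlation| handles the sign indeterminacy of the factorization
    print("loading recovery:", np.round(abs(np.corrcoef(b, b_true)[0, 1]), 3))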

    Standard error estimation in the presence of high leverage points and heteroscedastic errors in multiple linear regression

    In this study, a Robust Heteroscedasticity-Consistent Covariance Matrix (RHCCM) is proposed for estimating the standard errors of regression coefficients in the presence of high leverage points and heteroscedastic errors in multiple linear regression. RHCCM combines a robust method with the Heteroscedasticity-Consistent Covariance Matrix (HCCM): the robust method eliminates the effect of high leverage points, while the HCCM eliminates the effect of heteroscedastic errors. The performance of RHCCM was assessed through an empirical study and compared with results obtained with the original Heteroscedasticity-Consistent Covariance Matrix.
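
    For reference, the HCCM building block here is the familiar White sandwich estimator; a plain-numpy sketch of its HC0 variant on simulated heteroscedastic data is given below. The robust weighting against high leverage points that RHCCM adds on top is not reproduced.

    # White's HC0 covariance for OLS: (X'X)^{-1} [sum_i u_i^2 x_i x_i'] (X'X)^{-1}
    import numpy as np

    rng = np.random.default_rng(4)
    n = 300
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    e = rng.normal(size=n) * (1 + np.abs(X[:, 1]))  # heteroscedastic errors
    y = X @ np.array([1.0, 2.0]) + e

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                                # OLS residuals
    meat = X.T @ (u[:, None] ** 2 * X)              # sum_i u_i^2 x_i x_i'
    V_hc0 = XtX_inv @ meat @ XtX_inv                # the sandwich
    print("HC0 standard errors:", np.round(np.sqrt(np.diag(V_hc0)), 3))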

    Robust Inference Under Heteroskedasticity via the Hadamard Estimator

    Drawing statistical inferences from large datasets in a model-robust way is an important problem in statistics and data science. In this paper, we propose methods for statistical inference in linear regression that are robust to large and unequal noise across observational units (i.e., heteroskedasticity). We leverage the Hadamard estimator, which is unbiased for the variances of ordinary least-squares regression. This is in contrast to the popular White's sandwich estimator, which can be substantially biased in high dimensions. We propose to estimate the signal strength, noise level, signal-to-noise ratio, and mean squared error via the Hadamard estimator. We develop a new degrees-of-freedom adjustment that gives more accurate confidence intervals than variants of White's sandwich estimator. Moreover, we provide conditions ensuring the estimator is well-defined, by studying a new random matrix ensemble in which the entries of a random orthogonal projection matrix are squared. We also show approximate normality using the second-order Poincaré inequality. Our work provides improved statistical theory and methods for linear regression in high dimensions.
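
    The key identity behind the Hadamard estimator can be sketched in a few lines: with hat matrix H, the OLS residuals satisfy E[u^2] = ((I-H) ∘ (I-H)) sigma^2, where ∘ is the elementwise (Hadamard) product, so solving that linear system yields unbiased per-observation variance estimates. The simulation below is an illustrative toy, not the paper's experiments; the paper's conditions govern when the Hadamard system is invertible.

    # Hadamard estimator sketch: unbiased per-observation noise variances
    # from squared OLS residuals, then a variance estimate for beta-hat.
    import numpy as np

    rng = np.random.default_rng(5)
    n, p = 100, 3
    X = rng.normal(size=(n, p))
    sigma2 = 0.5 + rng.uniform(size=n)               # unequal noise levels
    y = X @ np.array([1.0, -1.0, 0.5]) + np.sqrt(sigma2) * rng.normal(size=n)

    H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
    u = y - H @ y                                    # OLS residuals
    M = np.eye(n) - H
    # Solve ((I-H) ∘ (I-H)) sigma2_hat = u^2; unbiased, though individual
    # entries can come out negative in small samples.
    sigma2_hat = np.linalg.solve(M * M, u ** 2)

    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ X.T @ np.diag(sigma2_hat) @ X @ XtX_inv
    print("Hadamard-based std errors:", np.round(np.sqrt(np.diag(V)), 3))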