A cautionary note on robust covariance plug-in methods
Many multivariate statistical methods rely heavily on the sample covariance
matrix. It is well known though that the sample covariance matrix is highly
non-robust. One popular alternative approach for "robustifying" the
multivariate method is to simply replace the role of the covariance matrix with
some robust scatter matrix. The aim of this paper is to point out that in some
situations certain properties of the covariance matrix are needed for the
corresponding robust "plug-in" method to be a valid approach, and that not all
scatter matrices necessarily possess these important properties. In particular,
the following three multivariate methods are discussed in this paper:
independent components analysis, observational regression and graphical
modeling. For each case, it is shown that using a symmetrized robust scatter
matrix in place of the covariance matrix results in a proper robust
multivariate method.
Comment: 24 pages, 7 figures
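The plug-in idea is easy to sketch. The snippet below is a hypothetical illustration, not the paper's method: it replaces the sample mean and covariance with a crude trimmed scatter (a simple stand-in for a proper robust scatter estimator such as MCD) when computing Mahalanobis distances, so that gross outliers are no longer masked. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:10] += 8.0  # a few gross outliers

def robust_scatter(X, trim=0.25):
    """A crude robust scatter: start from coordinatewise median/MAD distances,
    drop the most outlying fraction, then take mean/covariance of the rest.
    (A stand-in for a proper scatter estimator such as MCD.)"""
    center = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - center), axis=0)
    d0 = np.sum(((X - center) / scale) ** 2, axis=1)
    keep = d0 <= np.quantile(d0, 1.0 - trim)
    return X[keep].mean(axis=0), np.cov(X[keep], rowvar=False)

def mahalanobis_sq(X, center, scatter):
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff)

# Classical plug-in: sample mean/covariance; robust plug-in: robust scatter.
d_classical = mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False))
d_robust = mahalanobis_sq(X, *robust_scatter(X))
```

Under the robust plug-in the planted outliers separate cleanly from the bulk, whereas the classical distances are partly masked by the outliers' own influence on the covariance.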
Robust canonical correlations: a comparative study.
Several approaches for robust canonical correlation analysis will be presented and discussed. A first method is based on the definition of canonical correlation analysis as looking for linear combinations of two sets of variables having maximal (robust) correlation. A second method is based on alternating robust regressions. These methods are discussed in detail and compared with the more traditional approach to robust canonical correlation via covariance matrix estimates. A simulation study compares the performance of the different estimators under several kinds of sampling schemes. Robustness is studied as well by breakdown plots.
Keywords: Alternating regression; Canonical correlations; Correlation measures; Projection-pursuit; Robust covariance estimation; Robust regression; Robustness
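The covariance-based route is straightforward to sketch: canonical correlations can be read off a joint scatter matrix, so a robust scatter estimate can be plugged in directly. The helper below is a hypothetical numpy sketch using the sample covariance; substituting a robust scatter for np.cov gives the "traditional" robust CCA the abstract compares against.

```python
import numpy as np

def cca_from_cov(S, p):
    """Canonical correlations from a joint (p+q)x(p+q) scatter matrix S,
    partitioned into an X-block (first p variables) and a Y-block."""
    Sxx, Sxy, Syy = S[:p, :p], S[:p, p:], S[p:, p:]

    def inv_sqrt(A):  # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(A)
        return V @ np.diag(w ** -0.5) @ V.T

    # Whiten each block; singular values of the cross-block are the
    # canonical correlations, in decreasing order.
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)  # shared latent factor linking X and Y
X = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])

S = np.cov(np.hstack([X, Y]), rowvar=False)
rho = cca_from_cov(S, p=2)  # population value of rho[0] is 0.5 here
```

Because the entire computation flows through S, robustifying the analysis only requires robustifying S, which is exactly the plug-in approach discussed above.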
A robust partial least squares method with applications
Partial least squares regression (PLS) is a linear regression technique developed to relate many
regressors to one or several response variables. Robust methods are introduced to reduce or
remove the effect of outlying data points. In this paper we show that if the sample covariance
matrix is properly robustified further robustification of the linear regression steps of the PLS
algorithm becomes unnecessary. The robust estimate of the covariance matrix is computed by
searching for outliers in univariate projections of the data on a combination of random directions
(Stahel-Donoho) and specific directions obtained by maximizing and minimizing the kurtosis
coefficient of the projected data, as proposed by Peña and Prieto (2006). It is shown that this
procedure is fast to apply and provides better results than other procedures proposed in the
literature. Its performance is illustrated by Monte Carlo and by an example, where the algorithm is
able to reveal features of the data that were undetected by previous methods.
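A minimal sketch of the projection idea: Stahel-Donoho outlyingness scores each point by its worst robust z-score over a set of projection directions. For simplicity the sketch below uses random directions only, omitting the kurtosis-maximizing/minimizing directions of Peña and Prieto (2006); all names are illustrative, not the paper's implementation.

```python
import numpy as np

def sd_outlyingness(X, n_dirs=500, seed=0):
    """Stahel-Donoho outlyingness: for each observation, the maximum
    robust z-score (median/MAD-standardized) over projection directions.
    Here the directions are random unit vectors."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(n_dirs, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    P = X @ D.T                                   # (n, n_dirs) projections
    med = np.median(P, axis=0)
    mad = 1.4826 * np.median(np.abs(P - med), axis=0)
    return np.max(np.abs(P - med) / mad, axis=1)

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
X[:5] += 6.0                                      # planted outliers
out = sd_outlyingness(X)

# Discard the most outlying points, then compute the covariance from the rest
# (a hard-rejection simplification of the usual smooth downweighting).
S_robust = np.cov(X[out < np.quantile(out, 0.9)], rowvar=False)
```

Once the covariance is robustified this way, the abstract's point is that the remaining PLS regression steps can be left unchanged.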
Fitting multiplicative models by robust alternating regressions.
In this paper a robust approach for fitting multiplicative models is presented. Focus is on the factor analysis model, where we will estimate factor loadings and scores by a robust alternating regression algorithm. The approach is highly robust, and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not predetermined by outliers, which can be retrieved from the residual plot. Also provided is an accompanying robust R²-plot to determine the appropriate number of factors. The approach is illustrated by real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.
Keywords: Alternating regression; Approximation; Biplot; Covariance; Dispersion matrices; Effects; Estimator; Exploratory data analysis; Factor analysis; Factors; FANOVA; Least-squares; Matrix; Median polish; Model; Models; Outliers; Principal components; Robustness; Structure; Two-way table; Variables; Yield
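A hypothetical sketch of robust alternating regression for a rank-one multiplicative model X ≈ f λ': under an L1 loss, each regression-through-the-origin update reduces to a weighted median, which keeps single outlying cells from distorting the fit. This is a simplified stand-in for the paper's algorithm, not a reproduction of it.

```python
import numpy as np

def weighted_median(values, weights):
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def l1_rank1(X, n_iter=30):
    """Rank-1 multiplicative fit X ~ f lam' by alternating L1 regressions
    through the origin; each coordinate update is a weighted median."""
    n, p = X.shape
    f = X[:, 0].copy()  # crude but adequate starting scores
    for _ in range(n_iter):
        lam = np.array([weighted_median(X[:, j] / f, np.abs(f)) for j in range(p)])
        f = np.array([weighted_median(X[i, :] / lam, np.abs(lam)) for i in range(n)])
        f = f / np.linalg.norm(f)  # fix the scale indeterminacy
    return f, lam

rng = np.random.default_rng(3)
f0 = rng.uniform(1.0, 2.0, size=40)
lam0 = rng.uniform(1.0, 2.0, size=6)
X = np.outer(f0, lam0) + 0.1 * rng.normal(size=(40, 6))
X[0, 0] = 50.0  # one gross cell outlier
f, lam = l1_rank1(X)
fit = np.outer(f, lam)
```

The fitted value at the contaminated cell stays near the clean signal rather than chasing the outlier, which is the behavior the robust biplot relies on.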
Standard errors estimation in the presence of high leverage point and heteroscedastic errors in multiple linear regression
In this study, the Robust Heteroscedasticity-Consistent Covariance Matrix (RHCCM) was proposed in order to estimate standard errors of regression coefficients in the presence of high leverage points and heteroscedastic errors in multiple linear regression. RHCCM combines a robust method with the Heteroscedasticity-Consistent Covariance Matrix (HCCM): the robust method eliminates the effect of high leverage points, while the HCCM eliminates the effect of heteroscedastic errors. The performance of RHCCM was assessed through an empirical study and compared with results obtained when the original Heteroscedasticity-Consistent Covariance Matrix was used.
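The combination can be sketched as follows. This is a hypothetical illustration, not the proposed RHCCM: rows flagged as high-leverage (here simply via hat values, a stand-in for the paper's robust leverage diagnostics) are downweighted, and an HC3-style sandwich estimator is then formed from the weighted fit.

```python
import numpy as np

def robust_hccm(X, y, c=2.5):
    """Sketch: downweight high-leverage rows, then compute an HC3-style
    heteroskedasticity-consistent sandwich from the weighted regression."""
    n, p = X.shape
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))   # hat values
    w = np.minimum(1.0, c * p / n / h)               # rows with h_i >> p/n get w_i < 1
    Xw, yw = X * w[:, None], y * w                   # weighted least squares
    beta = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    hw = np.diag(Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T))
    e = yw - Xw @ beta
    omega = (e / (1.0 - hw)) ** 2                    # HC3-style variance proxies
    bread = np.linalg.inv(Xw.T @ Xw)
    V = bread @ (Xw.T @ (Xw * omega[:, None])) @ bread
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
x[0] = 10.0                                          # one high-leverage point
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * np.sqrt(1.0 + x ** 2)  # heteroscedastic
beta, se = robust_hccm(X, y)
```

Without the downweighting step, the single leverage point combined with its large error variance can dominate both the coefficient estimates and the sandwich meat.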
Robust Inference Under Heteroskedasticity via the Hadamard Estimator
Drawing statistical inferences from large datasets in a model-robust way is
an important problem in statistics and data science. In this paper, we propose
methods that are robust to large and unequal noise in different observational
units (i.e., heteroskedasticity) for statistical inference in linear
regression. We leverage the Hadamard estimator, which is unbiased for the
variances of ordinary least-squares regression. This is in contrast to the
popular White's sandwich estimator, which can be substantially biased in high
dimensions. We propose to estimate the signal strength, noise level,
signal-to-noise ratio, and mean squared error via the Hadamard estimator. We
develop a new degrees of freedom adjustment that gives more accurate confidence
intervals than variants of White's sandwich estimator. Moreover, we provide
conditions ensuring the estimator is well-defined, by studying a new random
matrix ensemble in which the entries of a random orthogonal projection matrix
are squared. We also show approximate normality, using the second-order
Poincaré inequality. Our work provides improved statistical theory and methods
for linear regression in high dimensions.
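The Hadamard estimator itself is short to state: with residual-forming matrix M = I - X(X'X)^{-1}X' and residuals e = My, E[e_i^2] = sum_j M_ij^2 sigma_j^2, so the elementwise-squared matrix M∘M links the squared residuals to the unknown variances, and inverting this relation gives unbiased variance estimates. A numpy sketch under these definitions (variable names assumed, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 4
X = rng.normal(size=(n, p))
sigma2 = rng.uniform(0.5, 2.0, size=n)       # heteroskedastic noise variances
y = X @ np.ones(p) + rng.normal(size=n) * np.sqrt(sigma2)

# Residual-forming matrix M; E[e_i^2] = sum_j M_ij^2 * sigma_j^2,
# i.e. (M∘M) sigma^2 = E[e∘e], so solving the linear system with the
# observed squared residuals gives unbiased variance estimates.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
e = M @ y
sigma2_hat = np.linalg.solve(M * M, e ** 2)  # Hadamard variance estimates

# Plug the estimates into the sandwich form for Var(beta_hat).
bread = np.linalg.inv(X.T @ X)
V = bread @ (X.T @ (X * sigma2_hat[:, None])) @ bread
```

Note that individual entries of sigma2_hat can be negative in finite samples; the well-definedness conditions studied in the paper concern exactly when the M∘M system behaves well.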