The Gaussian rank correlation estimator: Robustness properties.
The Gaussian rank correlation equals the usual correlation coefficient computed from the normal scores of the data. Although its influence function is unbounded, it still has attractive robustness properties. In particular, its breakdown point is above 12%. Moreover, the estimator is consistent and asymptotically efficient at the normal distribution. The correlation matrix based on the Gaussian rank correlation is always positive semidefinite and easy to compute, even in high dimensions. A simulation study confirms the good efficiency and robustness properties of the proposed estimator relative to the popular Kendall and Spearman correlation measures. In an empirical application, we show how it can be used for multivariate outlier detection based on robust principal component analysis.
Keywords: Breakdown; Correlation; Efficiency; Robustness; Van der Waerden
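The abstract defines the estimator constructively: replace each observation by its normal (van der Waerden) score and take the ordinary Pearson correlation of the scores. A minimal stdlib-Python sketch of that definition (function names are my own; ties are broken by sort order rather than by midranks):

```python
from statistics import NormalDist

def gaussian_rank_corr(x, y):
    """Pearson correlation of the van der Waerden (normal) scores.

    Each value is replaced by Phi^{-1}(rank / (n + 1)); the usual
    correlation of these scores is the Gaussian rank correlation.
    Illustrative sketch: ties are broken by sort order, not midranks.
    """
    n = len(x)
    inv = NormalDist().inv_cdf

    def scores(v):
        # Rank the observations, then map rank r to Phi^{-1}(r / (n + 1)).
        order = sorted(range(n), key=lambda i: v[i])
        s = [0.0] * n
        for rank, i in enumerate(order, start=1):
            s[i] = inv(rank / (n + 1))
        return s

    sx, sy = scores(x), scores(y)
    mx, my = sum(sx) / n, sum(sy) / n
    num = sum((a - mx) * (b - my) for a, b in zip(sx, sy))
    den = (sum((a - mx) ** 2 for a in sx)
           * sum((b - my) ** 2 for b in sy)) ** 0.5
    return num / den
```

Because the scores depend on the data only through ranks, a single outlier can shift its own score only to the largest normal score, which is the intuition behind the positive breakdown point noted in the abstract.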
Robust canonical correlations: a comparative study.
Several approaches for robust canonical correlation analysis are presented and discussed. The first method is based on the definition of canonical correlation analysis as the search for linear combinations of two sets of variables having maximal (robust) correlation. A second method is based on alternating robust regressions. These methods are discussed in detail and compared with the more traditional approach to robust canonical correlation via covariance matrix estimates. A simulation study compares the performance of the different estimators under several kinds of sampling schemes. Robustness is studied as well by means of breakdown plots.
Keywords: Alternating regression; Canonical correlations; Correlation measures; Projection-pursuit; Robust covariance estimation; Robust regression; Robustness
Tyler shape depth
In many problems from multivariate analysis, the parameter of interest is a
shape matrix, that is, a normalized version of the corresponding scatter or
dispersion matrix. In this paper, we propose a depth concept for shape matrices
that involves data points only through their directions from the center of the
distribution. We use the terminology Tyler shape depth since the resulting
estimator of shape, namely the deepest shape matrix, is the median-based
counterpart of the M-estimator of shape of Tyler (1987). Beyond estimation,
shape depth, like its Tyler antecedent, also allows hypothesis testing on
shape. Its main benefit, however, lies in the ranking of shape matrices it
provides, whose practical relevance is illustrated in principal component
analysis and in shape-based outlier detection. We study the invariance,
quasi-concavity and continuity properties of Tyler shape depth, the topological
and boundedness properties of the corresponding depth regions, and the existence
of a deepest shape matrix, and we prove Fisher consistency in the elliptical case.
Finally, we derive a Glivenko-Cantelli-type result and establish almost sure
consistency of the deepest shape matrix estimator.
Comment: 28 pages, 5 figures
Robustness versus efficiency for nonparametric correlation measures.
Nonparametric correlation measures such as the Kendall and Spearman correlation are widely used in the behavioral sciences. These measures are often said to be robust, in the sense of being resistant to outlying observations. In this note we formally study their robustness by means of their influence functions. Since the robustness of an estimator often comes at the price of a loss in precision, we compute efficiencies at the normal model. A comparison with robust correlation measures derived from robust covariance matrices is made. We conclude that both the Spearman and Kendall correlation measures combine good robustness properties with high efficiency.
Keywords: asymptotic variance; correlation; gross-error sensitivity; influence function; Kendall correlation; robustness; Spearman correlation
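For concreteness, the Kendall correlation studied here counts concordant versus discordant pairs of observations. A minimal O(n^2) Python sketch (my own naming; no tie correction, i.e. the tau-a variant):

```python
def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / number of pairs.

    A pair (i, j) is concordant when x and y order it the same way.
    Sketch without tie correction (tau-a); tied pairs count as neither.
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Each observation takes part in only n - 1 of the n(n - 1)/2 pairs, so moving a single point changes tau by at most O(1/n); this bounded sensitivity is what the note quantifies via influence functions.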
Fast robust correlation for high-dimensional data
The product moment covariance is a cornerstone of multivariate data analysis,
from which one can derive correlations, principal components, Mahalanobis
distances and many other results. Unfortunately the product moment covariance
and the corresponding Pearson correlation are very susceptible to outliers
(anomalies) in the data. Several robust measures of covariance have been
developed, but few are suitable for the ultrahigh dimensional data that are
becoming more prevalent nowadays. For that one needs methods whose computation
scales well with the dimension, are guaranteed to yield a positive semidefinite
covariance matrix, and are sufficiently robust to outliers as well as
sufficiently accurate in the statistical sense of low variability. We construct
such methods using data transformations. The resulting approach is simple, fast
and widely applicable. We study its robustness by deriving influence functions
and breakdown values, and computing the mean squared error on contaminated
data. Using these results we select a method that performs well overall. This
also allows us to construct a faster version of the DetectDeviatingCells method
(Rousseeuw and Van den Bossche, 2018) to detect cellwise outliers, that can
deal with much higher dimensions. The approach is illustrated on genomic data
with 12,000 variables and color video data with 920,000 dimensions.
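The transform-then-correlate idea scales well because each variable is transformed on its own, and the result is an ordinary Pearson correlation of transformed data, hence automatically positive semidefinite as a matrix. A hedged sketch of that general idea, using robust standardization (median/MAD) followed by hard clipping; the paper's actual transformation ("wrapping") is smoother than the clipping used here, and all names below are my own:

```python
from statistics import median

def clip_transform(v, c=2.0):
    """Robustly standardize with median/MAD, then clip at +-c.

    Outliers are pulled to the boundary +-c and cannot dominate the
    subsequent correlation. Illustrative stand-in for a smoother
    bounded transformation such as wrapping.
    """
    m = median(v)
    s = 1.4826 * median(abs(x - m) for x in v)  # MAD, Gaussian-consistent
    return [max(-c, min(c, (x - m) / s)) for x in v]

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a)
           * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def robust_corr(x, y, c=2.0):
    """Pearson correlation of the clipped, robustly standardized data."""
    return pearson(clip_transform(x, c), clip_transform(y, c))
```

On clean linear data the transform preserves the ordering and the estimate agrees with Pearson; with a gross outlier pair, the plain Pearson correlation can flip sign while the transformed estimate stays on the side of the clean bulk of the data.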
M-estimation of multivariate regressions
Includes bibliographical references (p.19-20)