
    The Gaussian rank correlation estimator: Robustness properties.

    The Gaussian rank correlation equals the usual correlation coefficient computed from the normal scores of the data. Although its influence function is unbounded, it still has attractive robustness properties. In particular, its breakdown point is above 12%. Moreover, the estimator is consistent and asymptotically efficient at the normal distribution. The correlation matrix based on the Gaussian rank correlation is always positive semidefinite and very easy to compute, even in high dimensions. A simulation study confirms the good efficiency and robustness properties of the proposed estimator relative to the popular Kendall and Spearman correlation measures. In an empirical application, we show how it can be used for multivariate outlier detection based on robust principal component analysis.
    Keywords: Breakdown; Correlation; Efficiency; Robustness; Van der Waerden
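    The definition above — the Pearson correlation of the normal scores — is easy to sketch in code. A minimal illustration (using the common van der Waerden convention of mapping ranks through the standard normal quantile function; the function name is ours, not from the paper):

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussian_rank_corr(x, y):
    """Pearson correlation of the normal (van der Waerden) scores.

    Each observation is replaced by Phi^{-1}(rank / (n + 1)), and the
    ordinary correlation coefficient of these scores is returned.
    """
    n = len(x)
    zx = norm.ppf(rankdata(x) / (n + 1))
    zy = norm.ppf(rankdata(y) / (n + 1))
    return np.corrcoef(zx, zy)[0, 1]
```

    Because only the ranks enter, a single gross outlier can move the estimate by only a bounded amount, which is the intuition behind the positive breakdown point.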

    Robust canonical correlations: a comparative study.

    Several approaches for robust canonical correlation analysis will be presented and discussed. A first method is based on the definition of canonical correlation analysis as looking for linear combinations of two sets of variables having maximal (robust) correlation. A second method is based on alternating robust regressions. These methods are discussed in detail and compared with the more traditional approach to robust canonical correlation via covariance matrix estimates. A simulation study compares the performance of the different estimators under several kinds of sampling schemes. Robustness is studied as well by breakdown plots.
    Keywords: Alternating regression; Canonical correlations; Correlation measures; Projection-pursuit; Robust covariance estimation; Robust regression; Robustness
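    The "traditional approach" mentioned above plugs a (robust) estimate of the joint covariance matrix into the classical canonical-correlation equations. A minimal sketch of that plug-in step, assuming the joint covariance of (X, Y) is already available (the function name and interface are illustrative):

```python
import numpy as np

def cca_from_cov(S, p):
    """Canonical correlations from a joint covariance matrix S of (X, Y),
    where X holds the first p variables.

    The canonical correlations are the square roots of the eigenvalues of
    S11^{-1} S12 S22^{-1} S21; substituting a robust covariance estimate
    for S yields robust canonical correlations.
    """
    S11, S12 = S[:p, :p], S[:p, p:]
    S21, S22 = S[p:, :p], S[p:, p:]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    eig = np.clip(np.linalg.eigvals(M).real, 0.0, 1.0)
    return np.sort(np.sqrt(eig))[::-1]
```

    The alternating-regression method of the paper avoids forming a full covariance estimate at all; the sketch above only covers the covariance-based route.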

    Tyler shape depth

    In many problems from multivariate analysis, the parameter of interest is a shape matrix, that is, a normalized version of the corresponding scatter or dispersion matrix. In this paper, we propose a depth concept for shape matrices that involves data points only through their directions from the center of the distribution. We use the terminology Tyler shape depth since the resulting estimator of shape, namely the deepest shape matrix, is the median-based counterpart of the M-estimator of shape of Tyler (1987). Beyond estimation, shape depth, like its Tyler antecedent, also allows hypothesis testing on shape. Its main benefit, however, lies in the ranking of shape matrices it provides, whose practical relevance is illustrated in principal component analysis and in shape-based outlier detection. We study the invariance, quasi-concavity and continuity properties of Tyler shape depth, the topological and boundedness properties of the corresponding depth regions, and the existence of a deepest shape matrix, and we prove Fisher consistency in the elliptical case. Finally, we derive a Glivenko-Cantelli-type result and establish almost sure consistency of the deepest shape matrix estimator.
    Comment: 28 pages, 5 figures
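    Tyler's (1987) M-estimator of shape, of which the deepest shape matrix is the median-based counterpart, has a simple fixed-point characterization that is worth seeing concretely. A minimal sketch, assuming the data are already centered and adopting the trace-p normalization as one common shape convention (neither choice is prescribed by this paper):

```python
import numpy as np

def tyler_shape(X, n_iter=200, tol=1e-9):
    """Tyler's M-estimator of shape via fixed-point iteration.

    X is an (n, p) data matrix, assumed already centered.  Iterates
    V <- (p/n) * sum_i x_i x_i' / (x_i' V^{-1} x_i), normalized to
    trace p.  Only the directions x_i / ||x_i|| matter, which is why
    the estimator is distribution-free over elliptical families.
    """
    n, p = X.shape
    V = np.eye(p)
    for _ in range(n_iter):
        d = np.einsum("ij,jk,ik->i", X, np.linalg.inv(V), X)  # x_i' V^{-1} x_i
        V_new = (p / n) * (X.T * (1.0 / d)) @ X
        V_new *= p / np.trace(V_new)  # fix the trace-p normalization
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```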

    Robustness versus efficiency for nonparametric correlation measures.

    Nonparametric correlation measures such as the Kendall and Spearman correlations are widely used in the behavioral sciences. These measures are often said to be robust, in the sense of being resistant to outlying observations. In this note we formally study their robustness by means of their influence functions. Since robustness of an estimator often comes at the price of a loss in precision, we compute efficiencies at the normal model. A comparison with robust correlation measures derived from robust covariance matrices is made. We conclude that both the Spearman and Kendall correlation measures combine good robustness properties with high efficiency.
    Keywords: asymptotic variance; correlation; gross-error sensitivity; influence function; Kendall correlation; robustness; Spearman correlation
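    The resistance to outlying observations discussed above is easy to demonstrate numerically: replacing a single observation by a gross outlier changes the rank-based measures by at most O(1/n), while the Pearson correlation collapses. A small simulated illustration (the sample size and outlier value are arbitrary choices):

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.2 * rng.normal(size=200)   # strongly correlated pair

x_bad = x.copy()
x_bad[0] = 1000.0                    # one gross outlier

tau, _ = kendalltau(x_bad, y)        # barely moves: ranks change little
rho, _ = spearmanr(x_bad, y)         # barely moves
r, _ = pearsonr(x_bad, y)            # destroyed by the single outlier
```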

    Fast robust correlation for high-dimensional data

    The product moment covariance is a cornerstone of multivariate data analysis, from which one can derive correlations, principal components, Mahalanobis distances and many other results. Unfortunately, the product moment covariance and the corresponding Pearson correlation are very susceptible to outliers (anomalies) in the data. Several robust measures of covariance have been developed, but few are suitable for the ultrahigh dimensional data that are becoming more prevalent nowadays. For this one needs methods whose computation scales well with the dimension, that are guaranteed to yield a positive semidefinite covariance matrix, and that are sufficiently robust to outliers as well as sufficiently accurate in the statistical sense of low variability. We construct such methods using data transformations. The resulting approach is simple, fast and widely applicable. We study its robustness by deriving influence functions and breakdown values, and by computing the mean squared error on contaminated data. Using these results we select a method that performs well overall. This also allows us to construct a faster version of the DetectDeviatingCells method (Rousseeuw and Van den Bossche, 2018) to detect cellwise outliers, which can deal with much higher dimensions. The approach is illustrated on genomic data with 12,000 variables and color video data with 920,000 dimensions.
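    The data-transformation idea can be sketched in a few lines: robustly standardize each variable, pass it through a bounded transformation, and compute the plain product-moment correlation of the transformed data, which is positive semidefinite by construction. The sketch below uses simple Huber-type clipping as a stand-in for the wrapping transformation studied in the paper, with an arbitrary cutoff c = 2; it illustrates the mechanism, not the paper's tuned method:

```python
import numpy as np

def clipped_corr(X, c=2.0):
    """Robust correlation via a bounded data transformation.

    Each column is robustly standardized (median and MAD), clipped to
    [-c, c], and the ordinary Pearson correlation matrix of the
    transformed data is returned.  As a product-moment matrix it is
    always positive semidefinite, and the cost is O(n p) plus one
    correlation matrix, so it scales to high dimensions.
    """
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)  # consistent at N(0,1)
    Z = np.clip((X - med) / mad, -c, c)                # bound outlying cells
    return np.corrcoef(Z, rowvar=False)
```

    Because outlying cells are bounded before any cross products are formed, a single wild cell perturbs only one transformed value instead of an entire row of the covariance matrix.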

    M-estimation of multivariate regressions

    Includes bibliographical references (p.19-20)