BSA - exact algorithm computing LTS estimate
The main result of this paper is a new exact algorithm computing the estimate
given by the Least Trimmed Squares (LTS). The algorithm works under very weak
assumptions. To prove that, we study the respective objective function using
basic techniques of analysis and linear algebra. Comment: 18 pages, 1 figure
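For readers unfamiliar with the LTS objective, the following is a minimal sketch and not the BSA algorithm of the paper. It assumes simple regression (one predictor plus intercept) and uses exhaustive search over all subsets of size h, which is exact but only feasible for very small samples. The function names (`ols_line`, `lts_objective`, `lts_brute_force`) are hypothetical and chosen for illustration.

```python
from itertools import combinations


def ols_line(points):
    """Ordinary least-squares intercept and slope for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # If all x values coincide, any slope fits equally well; use 0.
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def lts_objective(points, intercept, slope, h):
    """LTS objective: the sum of the h smallest squared residuals."""
    squared = sorted((y - intercept - slope * x) ** 2 for x, y in points)
    return sum(squared[:h])


def lts_brute_force(points, h):
    """Exact LTS fit by trying the OLS fit of every subset of size h.

    The LTS minimizer is the OLS fit of some h-subset, so taking the
    best candidate over all subsets gives the exact optimum.
    Returns (objective value, intercept, slope).
    """
    best = None
    for subset in combinations(points, h):
        intercept, slope = ols_line(subset)
        value = lts_objective(points, intercept, slope, h)
        if best is None or value < best[0]:
            best = (value, intercept, slope)
    return best


# Five points on the line y = 1 + 2x plus two gross outliers (made-up data).
data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 100), (6, -50)]
value, intercept, slope = lts_brute_force(data, h=5)
```

The number of subsets grows combinatorially with the sample size, which is why exact LTS algorithms need a smarter search than this enumeration.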
On the maximal halfspace depth of permutation-invariant distributions on the simplex
We compute the maximal halfspace depth for a class of permutation-invariant
distributions on the probability simplex. The derivations are based on
stochastic ordering results that so far have only been shown to be relevant for the
Behrens-Fisher problem. Comment: 14 pages, 3 figures
Fast robust correlation for high-dimensional data
The product moment covariance is a cornerstone of multivariate data analysis,
from which one can derive correlations, principal components, Mahalanobis
distances and many other results. Unfortunately the product moment covariance
and the corresponding Pearson correlation are very susceptible to outliers
(anomalies) in the data. Several robust measures of covariance have been
developed, but few are suitable for the ultrahigh dimensional data that are
becoming more prevalent nowadays. For that one needs methods whose computation
scales well with the dimension, that are guaranteed to yield a positive semidefinite
covariance matrix, and that are sufficiently robust to outliers as well as
sufficiently accurate in the statistical sense of low variability. We construct
such methods using data transformations. The resulting approach is simple, fast
and widely applicable. We study its robustness by deriving influence functions
and breakdown values, and computing the mean squared error on contaminated
data. Using these results we select a method that performs well overall. This
also allows us to construct a faster version of the DetectDeviatingCells method
(Rousseeuw and Van den Bossche, 2018) to detect cellwise outliers, that can
deal with much higher dimensions. The approach is illustrated on genomic data
with 12,000 variables and color video data with 920,000 dimensions.
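The idea of obtaining a robust correlation by transforming the data first can be sketched as follows. The abstract does not state which transformation the authors select, so this example substitutes a simple stand-in: each variable is standardized by its median and MAD and then clipped to [-c, c], after which the ordinary Pearson correlation is computed. The function names and the cutoff c = 2.0 are assumptions made for illustration only.

```python
from math import sqrt
from statistics import median


def standardize_and_clip(values, c=2.0):
    """Robustly standardize by median and MAD, then clip to [-c, c]."""
    med = median(values)
    # 1.4826 makes the MAD consistent with the standard deviation
    # under a normal distribution.
    mad = 1.4826 * median([abs(v - med) for v in values])
    scale = mad if mad > 0 else 1.0  # fallback for degenerate data
    return [max(-c, min(c, (v - med) / scale)) for v in values]


def pearson(u, v):
    """Classical product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def transformed_correlation(x, y, c=2.0):
    """Pearson correlation of the transformed variables."""
    return pearson(standardize_and_clip(x, c), standardize_and_clip(y, c))


# Made-up data: y = 2x except for one gross outlier in the last position.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, -1000]
classical = pearson(x, y)
robust = transformed_correlation(x, y)
```

On this toy data the single outlier flips the sign of the classical correlation, while the transformed version stays positive. Because each variable is transformed separately and the result is an ordinary Pearson correlation of the transformed data, the cost grows linearly with the number of variables for the transformation step, and the resulting correlation matrix is positive semidefinite by construction.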
Discussion of "The power of monitoring"
This is an invited comment on the discussion paper "The power of monitoring:
how to make the most of a contaminated multivariate sample" by A. Cerioli, M.
Riani, A. Atkinson and A. Corbellini that will appear in the journal
Statistical Methods & Applications.
Clustering in an Object-Oriented Environment
This paper describes the incorporation of seven stand-alone clustering programs into S-PLUS, where they can now be used in a much more flexible way. The original Fortran programs carried out new cluster analysis algorithms introduced in the book of Kaufman and Rousseeuw (1990). These clustering methods were designed to be robust and to accept dissimilarity data as well as objects-by-variables data. Moreover, they each provide a graphical display and a quality index reflecting the strength of the clustering. The powerful graphics of S-PLUS made it possible to improve these graphical representations considerably. The integration of the clustering algorithms was performed according to the object-oriented principle supported by S-PLUS. The new functions have a uniform interface, and are compatible with existing S-PLUS functions. We will describe the basic idea and the use of each clustering method, together with its graphical features. Each function is briefly illustrated with an example.
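To make concrete what clustering from dissimilarity data can look like, here is a minimal sketch that picks k representative objects (medoids) using only a dissimilarity matrix. The abstract does not list the seven programs or their algorithms, so this is an illustrative stand-in, not a reproduction of any of them; it uses exhaustive search, which is only practical for tiny inputs, and the function name `k_medoids_exhaustive` is hypothetical.

```python
from itertools import combinations


def k_medoids_exhaustive(dissim, k):
    """Choose k medoids minimizing the total dissimilarity to the nearest medoid.

    `dissim` is a square matrix (list of lists) of pairwise dissimilarities;
    no coordinates of the objects are needed.
    Returns (medoid indices, label per object, total cost).
    """
    n = len(dissim)
    best_cost, best_medoids = None, None
    for medoids in combinations(range(n), k):
        cost = sum(min(dissim[i][m] for m in medoids) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Each object is labeled with the index of its nearest medoid.
    labels = [min(best_medoids, key=lambda m: dissim[i][m]) for i in range(n)]
    return best_medoids, labels, best_cost


# Made-up example: six objects on a line, given only by their pairwise distances.
positions = [0, 1, 2, 10, 11, 12]
dissim = [[abs(a - b) for b in positions] for a in positions]
medoids, labels, cost = k_medoids_exhaustive(dissim, k=2)
```

Working from the matrix alone is what allows such a method to accept dissimilarity data as well as objects-by-variables data: in the latter case the matrix is simply computed first.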