BSA - exact algorithm computing LTS estimate
The main result of this paper is a new exact algorithm computing the estimate
given by the Least Trimmed Squares (LTS). The algorithm works under very weak
assumptions. To prove this, we study the corresponding objective function
using basic techniques of analysis and linear algebra.
Comment: 18 pages, 1 figure
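To make the objective concrete: the LTS estimate minimizes the sum of the h
smallest squared residuals. The Python sketch below is a brute-force baseline
over all h-subsets (illustrative only, and not the BSA algorithm of the
paper); it relies on the fact that the exact LTS fit coincides with the
ordinary least squares fit on the best h-subset.

```python
# Brute-force exact LTS for tiny data sets (illustrative baseline only;
# NOT the paper's BSA algorithm). The exact LTS fit equals the OLS fit
# on the h-subset with the smallest internal sum of squared residuals.
from itertools import combinations
import numpy as np

def lts_bruteforce(X, y, h):
    best_obj, best_beta = np.inf, None
    for idx in combinations(range(len(y)), h):
        Xs, ys = X[list(idx)], y[list(idx)]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)  # OLS on the subset
        obj = np.sum((ys - Xs @ beta) ** 2)             # subset sum of squares
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

This enumeration is exponential in the sample size, which is precisely why
dedicated exact algorithms such as the one in this paper are of interest.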
Multivariate and functional classification using depth and distance
We construct classifiers for multivariate and functional data. Our approach
is based on a kind of distance between data points and classes. The distance
measure needs to be robust to outliers and invariant to linear transformations
of the data. For this purpose we can use the bagdistance which is based on
halfspace depth. It satisfies most of the properties of a norm but is able to
reflect asymmetry when the class is skewed. Alternatively we can compute a
measure of outlyingness based on the skew-adjusted projection depth. In either
case we propose the DistSpace transform which maps each data point to the
vector of its distances to all classes, followed by k-nearest neighbor (kNN)
classification of the transformed data points. This combines invariance and
robustness with the simplicity and wide applicability of kNN. The proposal is
compared with other methods in experiments with real and simulated data.
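A minimal Python sketch of the DistSpace pipeline, with a crude median/MAD
distance standing in for the bagdistance or the projection-depth
outlyingness (the helper names below are hypothetical, not from the paper):

```python
# DistSpace sketch: map each point to its vector of distances to all
# classes, then classify with kNN in that low-dimensional space. The
# median/MAD distance here is a simple stand-in for the bagdistance.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def dist_space(X, X_train, y_train):
    cols = []
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        med = np.median(Xc, axis=0)
        mad = np.median(np.abs(Xc - med), axis=0) + 1e-12  # avoid zero scale
        cols.append(np.linalg.norm((X - med) / mad, axis=1))
    return np.column_stack(cols)        # n x (number of classes)

def distspace_knn(X_train, y_train, X_test, k=5):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(dist_space(X_train, X_train, y_train), y_train)
    return knn.predict(dist_space(X_test, X_train, y_train))
```

Whatever distance is plugged in, the transformed space has only as many
coordinates as there are classes, which is what keeps kNN simple and fast.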
Location adjustment for the minimum volume ellipsoid estimator.
Estimating multivariate location and scatter with both affine equivariance and positive breakdown has always been difficult. A well-known estimator which satisfies both properties is the Minimum Volume Ellipsoid estimator (MVE). Computing the exact MVE is often not feasible, so one usually resorts to an approximate algorithm. In the regression setup, algorithms for positive-breakdown estimators like Least Median of Squares typically recompute the intercept at each step to improve the result. This approach is called intercept adjustment. In this paper we show that a similar technique, called location adjustment, can be applied to the MVE. For this purpose we use the Minimum Volume Ball (MVB) in order to lower the MVE objective function. An exact algorithm for calculating the MVB is presented. As an alternative to MVB location adjustment we propose L-1 location adjustment, which does not necessarily lower the MVE objective function but yields more efficient estimates for the location part. Simulations compare the two types of location adjustment. We also obtain the maxbias curves of both L-1 and the MVB in the multivariate setting, revealing the superiority of L-1.
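The L-1 location adjustment rests on the spatial (L1) median, the point
minimizing the sum of Euclidean distances to the data. A minimal sketch of
the classical Weiszfeld iteration for computing it (a standard algorithm,
not the authors' implementation):

```python
# Weiszfeld iteration for the spatial (L1) median: the point minimizing
# the sum of Euclidean distances to the rows of X.
import numpy as np

def l1_median(X, tol=1e-8, max_iter=500):
    mu = np.median(X, axis=0)              # coordinatewise median as start
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X - mu, axis=1), tol)  # guard /0
        w = 1.0 / d
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```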
Anomaly Detection by Robust Statistics
Real data often contain anomalous cases, also known as outliers. These may
spoil the resulting analysis but they may also contain valuable information. In
either case, the ability to detect such anomalies is essential. A useful tool
for this purpose is robust statistics, which aims to detect the outliers by
first fitting the majority of the data and then flagging data points that
deviate from it. We present an overview of several robust methods and the
resulting graphical outlier detection tools. We discuss robust procedures for
univariate, low-dimensional, and high-dimensional data, such as estimating
location and scatter, linear regression, principal component analysis,
classification, clustering, and functional data analysis. The challenging
new topic of cellwise outliers is also introduced.
Comment: To appear in WIREs Data Mining and Knowledge Discovery
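In one dimension the fit-then-flag recipe is easily made concrete: estimate
the majority fit with the median and the MAD, then flag points that deviate
too much from it (the cutoff of 3 below is a common but arbitrary choice):

```python
# Robust outlier flagging in one dimension: fit the majority with
# median/MAD, then flag points with a large robust z-score.
import numpy as np

def flag_outliers(x, cutoff=3.0):
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # consistent at the Gaussian
    return np.abs((x - med) / mad) > cutoff    # True marks an anomaly

x = np.array([9.8, 10.1, 9.9, 10.0, 10.2, 25.0])
print(flag_outliers(x))  # only the value 25.0 is flagged
```

The same logic underlies the multivariate and high-dimensional procedures
surveyed in the paper, with robust scatter, regression, or PCA fits taking
the place of the median and MAD.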
On the maximal halfspace depth of permutation-invariant distributions on the simplex
We compute the maximal halfspace depth for a class of permutation-invariant
distributions on the probability simplex. The derivations are based on
stochastic ordering results that so far had only been shown to be relevant
for the Behrens-Fisher problem.
Comment: 14 pages, 3 figures
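For readers new to the notion: the halfspace depth of a point is the
smallest fraction of the sample (or probability mass) contained in any
closed halfspace through it. A Monte Carlo sketch approximating it over
random directions (this yields an upper bound on the exact depth and is
unrelated to the paper's closed-form derivations):

```python
# Approximate halfspace (Tukey) depth of x w.r.t. the sample X by
# minimizing the one-dimensional depth over random projection directions.
import numpy as np

def halfspace_depth(x, X, n_dir=1000, seed=None):
    rng = np.random.default_rng(seed)
    depth = 1.0
    for _ in range(n_dir):
        u = rng.standard_normal(X.shape[1])
        u /= np.linalg.norm(u)
        proj, t = X @ u, x @ u
        depth = min(depth, min(np.mean(proj >= t), np.mean(proj <= t)))
    return depth
```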
Fast robust correlation for high-dimensional data
The product moment covariance is a cornerstone of multivariate data analysis,
from which one can derive correlations, principal components, Mahalanobis
distances and many other results. Unfortunately the product moment covariance
and the corresponding Pearson correlation are very susceptible to outliers
(anomalies) in the data. Several robust measures of covariance have been
developed, but few are suitable for the ultrahigh dimensional data that are
becoming more prevalent nowadays. For that one needs methods that scale well
with the dimension, are guaranteed to yield a positive semidefinite
covariance matrix, and are sufficiently robust to outliers as well as
sufficiently accurate in the statistical sense of low variability. We construct
such methods using data transformations. The resulting approach is simple, fast
and widely applicable. We study its robustness by deriving influence functions
and breakdown values, and computing the mean squared error on contaminated
data. Using these results we select a method that performs well overall. This
also allows us to construct a faster version of the DetectDeviatingCells
method (Rousseeuw and Van den Bossche, 2018) for detecting cellwise outliers,
one that can deal with much higher dimensions. The approach is illustrated on
genomic data with 12,000 variables and on color video data with 920,000
dimensions.
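The transformation idea behind the approach can be sketched compactly:
robustly standardize every variable, apply a bounded function to cap
outlying cells, and take the plain product-moment correlation of the
transformed data. The clipping used below is a simple stand-in for the
transformation proposed in the paper:

```python
# Robust correlation via a data transformation: median/MAD standardize
# each column, cap extreme cells with a bounded function, then compute
# the ordinary Pearson correlation of the transformed data. The result
# is automatically positive semidefinite.
import numpy as np

def robust_corr(X, c=3.0):
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad[mad == 0] = 1.0                   # guard degenerate columns
    Z = np.clip((X - med) / mad, -c, c)   # bounded transform caps outliers
    return np.corrcoef(Z, rowvar=False)
```

Because only per-column medians and one matrix product are needed, the cost
scales with the dimension like the classical covariance, which is what makes
ultrahigh-dimensional data feasible.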
Discussion of "The power of monitoring"
This is an invited comment on the discussion paper "The power of monitoring:
how to make the most of a contaminated multivariate sample" by A. Cerioli, M.
Riani, A. Atkinson, and A. Corbellini, which will appear in the journal
Statistical Methods & Applications.
