BSA - exact algorithm computing LTS estimate
The main result of this paper is a new exact algorithm computing the estimate
given by the Least Trimmed Squares (LTS). The algorithm works under very weak
assumptions. To prove this, we study the corresponding objective function
using basic techniques of analysis and linear algebra.
Comment: 18 pages, 1 figure
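To make the objective concrete: the LTS estimate minimizes the sum of the h
smallest squared residuals. The Python sketch below is a brute-force baseline
over all h-subsets (illustrative only, and not the BSA algorithm of the
paper); it relies on the fact that the exact LTS fit coincides with the
ordinary least squares fit on the best h-subset.

```python
# Brute-force exact LTS for tiny data sets (illustrative baseline only;
# NOT the paper's BSA algorithm). The exact LTS fit equals the OLS fit
# on the h-subset with the smallest internal sum of squared residuals.
from itertools import combinations
import numpy as np

def lts_bruteforce(X, y, h):
    best_obj, best_beta = np.inf, None
    for idx in combinations(range(len(y)), h):
        Xs, ys = X[list(idx)], y[list(idx)]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)  # OLS on the subset
        obj = np.sum((ys - Xs @ beta) ** 2)             # subset sum of squares
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

This enumeration is exponential in the sample size, which is precisely why
dedicated exact algorithms such as the one in this paper are of interest.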
Multivariate and functional classification using depth and distance
We construct classifiers for multivariate and functional data. Our approach
is based on a kind of distance between data points and classes. The distance
measure needs to be robust to outliers and invariant to linear transformations
of the data. For this purpose we can use the bagdistance which is based on
halfspace depth. It satisfies most of the properties of a norm but is able to
reflect asymmetry when the class is skewed. Alternatively we can compute a
measure of outlyingness based on the skew-adjusted projection depth. In either
case we propose the DistSpace transform which maps each data point to the
vector of its distances to all classes, followed by k-nearest neighbor (kNN)
classification of the transformed data points. This combines invariance and
robustness with the simplicity and wide applicability of kNN. The proposal is
compared with other methods in experiments with real and simulated data.
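A minimal Python sketch of the DistSpace pipeline, with a crude median/MAD
distance standing in for the bagdistance or the projection-depth
outlyingness (the helper names below are hypothetical, not from the paper):

```python
# DistSpace sketch: map each point to its vector of distances to all
# classes, then classify with kNN in that low-dimensional space. The
# median/MAD distance here is a simple stand-in for the bagdistance.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def dist_space(X, X_train, y_train):
    cols = []
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        med = np.median(Xc, axis=0)
        mad = np.median(np.abs(Xc - med), axis=0) + 1e-12  # avoid zero scale
        cols.append(np.linalg.norm((X - med) / mad, axis=1))
    return np.column_stack(cols)        # n x (number of classes)

def distspace_knn(X_train, y_train, X_test, k=5):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(dist_space(X_train, X_train, y_train), y_train)
    return knn.predict(dist_space(X_test, X_train, y_train))
```

Whatever distance is plugged in, the transformed space has only as many
coordinates as there are classes, which is what keeps kNN simple and fast.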
Location adjustment for the minimum volume ellipsoid estimator.
Estimating multivariate location and scatter with both affine equivariance and positive breakdown has always been difficult. A well-known estimator which satisfies both properties is the Minimum Volume Ellipsoid estimator (MVE). Computing the exact MVE is often not feasible, so one usually resorts to an approximate algorithm. In the regression setup, algorithms for positive-breakdown estimators like Least Median of Squares typically recompute the intercept at each step to improve the result. This approach is called intercept adjustment. In this paper we show that a similar technique, called location adjustment, can be applied to the MVE. For this purpose we use the Minimum Volume Ball (MVB) in order to lower the MVE objective function. An exact algorithm for calculating the MVB is presented. As an alternative to MVB location adjustment we propose L-1 location adjustment, which does not necessarily lower the MVE objective function but yields more efficient estimates for the location part. Simulations compare the two types of location adjustment. We also obtain the maxbias curves of both L-1 and the MVB in the multivariate setting, revealing the superiority of L-1.
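The L-1 location adjustment rests on the spatial (L1) median, the point
minimizing the sum of Euclidean distances to the data. A minimal sketch of
the classical Weiszfeld iteration for computing it (a standard algorithm,
not the authors' implementation):

```python
# Weiszfeld iteration for the spatial (L1) median: the point minimizing
# the sum of Euclidean distances to the rows of X.
import numpy as np

def l1_median(X, tol=1e-8, max_iter=500):
    mu = np.median(X, axis=0)              # coordinatewise median as start
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X - mu, axis=1), tol)  # guard /0
        w = 1.0 / d
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```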
Anomaly Detection by Robust Statistics
Real data often contain anomalous cases, also known as outliers. These may
spoil the resulting analysis but they may also contain valuable information. In
either case, the ability to detect such anomalies is essential. A useful tool
for this purpose is robust statistics, which aims to detect the outliers by
first fitting the majority of the data and then flagging data points that
deviate from it. We present an overview of several robust methods and the
resulting graphical outlier detection tools. We discuss robust procedures for
univariate, low-dimensional, and high-dimensional data, such as estimating
location and scatter, linear regression, principal component analysis,
classification, clustering, and functional data analysis. The challenging
new topic of cellwise outliers is also introduced.
Comment: To appear in WIREs Data Mining and Knowledge Discovery
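In one dimension the fit-then-flag recipe is easily made concrete: estimate
the majority fit with the median and the MAD, then flag points that deviate
too much from it (the cutoff of 3 below is a common but arbitrary choice):

```python
# Robust outlier flagging in one dimension: fit the majority with
# median/MAD, then flag points with a large robust z-score.
import numpy as np

def flag_outliers(x, cutoff=3.0):
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # consistent at the Gaussian
    return np.abs((x - med) / mad) > cutoff    # True marks an anomaly

x = np.array([9.8, 10.1, 9.9, 10.0, 10.2, 25.0])
print(flag_outliers(x))  # only the value 25.0 is flagged
```

The same logic underlies the multivariate and high-dimensional procedures
surveyed in the paper, with robust scatter, regression, or PCA fits taking
the place of the median and MAD.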
On the maximal halfspace depth of permutation-invariant distributions on the simplex
We compute the maximal halfspace depth for a class of permutation-invariant
distributions on the probability simplex. The derivations are based on
stochastic ordering results that so far had only been shown to be relevant
for the Behrens-Fisher problem.
Comment: 14 pages, 3 figures
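For readers new to the notion: the halfspace depth of a point is the
smallest fraction of the sample (or probability mass) contained in any
closed halfspace through it. A Monte Carlo sketch approximating it over
random directions (this yields an upper bound on the exact depth and is
unrelated to the paper's closed-form derivations):

```python
# Approximate halfspace (Tukey) depth of x w.r.t. the sample X by
# minimizing the one-dimensional depth over random projection directions.
import numpy as np

def halfspace_depth(x, X, n_dir=1000, seed=None):
    rng = np.random.default_rng(seed)
    depth = 1.0
    for _ in range(n_dir):
        u = rng.standard_normal(X.shape[1])
        u /= np.linalg.norm(u)
        proj, t = X @ u, x @ u
        depth = min(depth, min(np.mean(proj >= t), np.mean(proj <= t)))
    return depth
```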
Fast robust correlation for high-dimensional data
The product moment covariance is a cornerstone of multivariate data analysis,
from which one can derive correlations, principal components, Mahalanobis
distances and many other results. Unfortunately the product moment covariance
and the corresponding Pearson correlation are very susceptible to outliers
(anomalies) in the data. Several robust measures of covariance have been
developed, but few are suitable for the ultrahigh dimensional data that are
becoming more prevalent nowadays. For that one needs methods that scale well
with the dimension, are guaranteed to yield a positive semidefinite
covariance matrix, and are sufficiently robust to outliers as well as
sufficiently accurate in the statistical sense of low variability. We construct
such methods using data transformations. The resulting approach is simple, fast
and widely applicable. We study its robustness by deriving influence functions
and breakdown values, and computing the mean squared error on contaminated
data. Using these results we select a method that performs well overall. This
also allows us to construct a faster version of the DetectDeviatingCells
method (Rousseeuw and Van den Bossche, 2018) for detecting cellwise outliers,
one that can deal with much higher dimensions. The approach is illustrated on
genomic data with 12,000 variables and on color video data with 920,000
dimensions.
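The transformation idea behind the approach can be sketched compactly:
robustly standardize every variable, apply a bounded function to cap
outlying cells, and take the plain product-moment correlation of the
transformed data. The clipping used below is a simple stand-in for the
transformation proposed in the paper:

```python
# Robust correlation via a data transformation: median/MAD standardize
# each column, cap extreme cells with a bounded function, then compute
# the ordinary Pearson correlation of the transformed data. The result
# is automatically positive semidefinite.
import numpy as np

def robust_corr(X, c=3.0):
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad[mad == 0] = 1.0                   # guard degenerate columns
    Z = np.clip((X - med) / mad, -c, c)   # bounded transform caps outliers
    return np.corrcoef(Z, rowvar=False)
```

Because only per-column medians and one matrix product are needed, the cost
scales with the dimension like the classical covariance, which is what makes
ultrahigh-dimensional data feasible.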
Discussion of "The power of monitoring"
This is an invited comment on the discussion paper "The power of monitoring:
how to make the most of a contaminated multivariate sample" by A. Cerioli, M.
Riani, A. Atkinson, and A. Corbellini, which will appear in the journal
Statistical Methods & Applications.
