3,722 research outputs found

BSA - exact algorithm computing LTS estimate

Author: Agulló
Hawkins
Hawkins
Hawkins
Hofmann
Hofmann
Hössjer
Karel Klouda
Rousseeuw
Rousseeuw
Rousseeuw
Sanderson
Víšek
Víšek
Publication venue: 'Elsevier BV'
Publication date: 08/01/2010

The main result of this paper is a new exact algorithm for computing the Least Trimmed Squares (LTS) estimate. The algorithm works under very weak assumptions. To prove this, we study the corresponding objective function using basic techniques of analysis and linear algebra. Comment: 18 pages, 1 figure
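
For orientation, the LTS estimate is the regression fit that minimizes the sum of the h smallest squared residuals. The sketch below is not the BSA algorithm from the paper. It is a naive brute-force search over all subsets of size h, restricted to a straight-line fit. It relies on the fact that the LTS minimum equals the smallest ordinary least squares residual sum over all such subsets. The function names `ols_line` and `lts_line_brute_force` are hypothetical, chosen for this illustration only.

```python
from itertools import combinations


def ols_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # If all x values coincide, any slope fits equally well; use 0.
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def lts_line_brute_force(xs, ys, h):
    """Exact LTS line fit by enumerating every subset of h observations.

    Illustrative only: the cost grows like C(n, h), so this is usable
    for tiny samples and is NOT the algorithm proposed in the paper.
    """
    best = None
    for subset in combinations(range(len(xs)), h):
        sub_x = [xs[i] for i in subset]
        sub_y = [ys[i] for i in subset]
        a, b, rss = ols_line(sub_x, sub_y)
        if best is None or rss < best[2]:
            best = (a, b, rss)
    return best
```

With eight points, six of which lie on a line and two of which are gross outliers, calling `lts_line_brute_force(xs, ys, 6)` recovers the line through the six clean points, because that subset has zero residual sum.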

arXiv.org e-Print Archive

Crossref

On the maximal halfspace depth of permutation-invariant distributions on the simplex

Author: Bélisle
Davy Paindaveine
Donoho
Eaton
Germain Van Bever
Hájek
Lawton
Marshall
Olshen
Rousseeuw
Rousseeuw
Publication date: 01/01/2017

We compute the maximal halfspace depth for a class of permutation-invariant distributions on the probability simplex. The derivations are based on stochastic ordering results that so far had only been shown to be relevant for the Behrens-Fisher problem. Comment: 14 pages, 3 figures

arXiv.org e-Print Archive

Crossref

DI-fusion

Fast robust correlation for high-dimensional data

Author: Raymaekers Jakob
Rousseeuw Peter J.
Publication venue: 'Informa UK Limited'
Publication date: 20/10/2019

The product moment covariance is a cornerstone of multivariate data analysis, from which one can derive correlations, principal components, Mahalanobis distances and many other results. Unfortunately, the product moment covariance and the corresponding Pearson correlation are very susceptible to outliers (anomalies) in the data. Several robust measures of covariance have been developed, but few are suitable for the ultrahigh dimensional data that are becoming more prevalent nowadays. For such data one needs methods whose computation scales well with the dimension, that are guaranteed to yield a positive semidefinite covariance matrix, and that are sufficiently robust to outliers as well as sufficiently accurate in the statistical sense of low variability. We construct such methods using data transformations. The resulting approach is simple, fast and widely applicable. We study its robustness by deriving influence functions and breakdown values, and by computing the mean squared error on contaminated data. Using these results we select a method that performs well overall. This also allows us to construct a faster version of the DetectDeviatingCells method (Rousseeuw and Van den Bossche, 2018) for detecting cellwise outliers, one that can deal with much higher dimensions. The approach is illustrated on genomic data with 12,000 variables and color video data with 920,000 dimensions.
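
The general idea of a transformation-based robust correlation can be sketched as follows: transform each variable separately so that extreme values are bounded, then take the ordinary Pearson correlation of the transformed values. Because the result is still a Pearson correlation, a matrix built this way stays positive semidefinite. The abstract does not say which transformation the authors select. The clipping transformation, the cutoff `c=1.5`, and the function names below are assumptions made for illustration.

```python
from statistics import median


def clip_transform(values, c=1.5):
    """Robustly standardize with median and MAD, then clip to [-c, c].

    Assumed illustrative transformation, not necessarily the one
    selected in the paper.
    """
    med = median(values)
    # 1.4826 makes the MAD consistent with the standard deviation
    # under a normal distribution.
    mad = 1.4826 * median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the variable is too degenerate to scale")
    return [max(-c, min(c, (v - med) / mad)) for v in values]


def pearson(u, v):
    """Classical product moment (Pearson) correlation of two sequences."""
    n = len(u)
    mu = sum(u) / n
    mv = sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def robust_corr(x, y, c=1.5):
    """Pearson correlation computed on the clipped, standardized data."""
    return pearson(clip_transform(x, c), clip_transform(y, c))
```

On data that follow an increasing linear trend except for one gross outlier at the largest x value, `pearson` turns negative, while `robust_corr` remains clearly positive because the outlier's influence is bounded by the clipping step. Each variable is transformed independently, so the cost grows linearly with the number of variables before the correlation step.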

arXiv.org e-Print Archive

Lirias

Discussion of "The power of monitoring"

Author: Raymaekers Jakob
Rousseeuw Peter J.
Vranckx Iwein
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/03/2018

This is an invited comment on the discussion paper "The power of monitoring: how to make the most of a contaminated multivariate sample" by A. Cerioli, M. Riani, A. Atkinson and A. Corbellini, which will appear in the journal Statistical Methods & Applications.

arXiv.org e-Print Archive

Maastricht University Research Portal

Clustering in an Object-Oriented Environment

Author: Anja Struyf
Mia Hubert
Peter Rousseeuw

This paper describes the incorporation of seven stand-alone clustering programs into S-PLUS, where they can now be used in a much more flexible way. The original Fortran programs carried out new cluster analysis algorithms introduced in the book of Kaufman and Rousseeuw (1990). These clustering methods were designed to be robust and to accept dissimilarity data as well as objects-by-variables data. Moreover, they each provide a graphical display and a quality index reflecting the strength of the clustering. The powerful graphics of S-PLUS made it possible to improve these graphical representations considerably. The integration of the clustering algorithms was performed according to the object-oriented principle supported by S-PLUS. The new functions have a uniform interface, and are compatible with existing S-PLUS functions. We will describe the basic idea and the use of each clustering method, together with its graphical features. Each function is briefly illustrated with an example.

Research Papers in Economics