6,776 research outputs found
A Fast Algorithm for Robust Regression with Penalised Trimmed Squares
The presence of groups of high-leverage outliers makes linear
regression a difficult problem due to the masking effect. The available
high-breakdown estimators based on Least Trimmed Squares (LTS) often do not
succeed in detecting masked high-leverage outliers in finite samples.
An alternative to the LTS estimator, called the Penalised Trimmed Squares (PTS)
estimator, was introduced by the authors in \cite{ZiouAv:05,ZiAvPi:07} and
appears to be less sensitive to the masking problem. This estimator is defined
by a Quadratic Mixed Integer Programming (QMIP) problem whose objective
function includes a penalty cost for each observation, serving as an upper
bound on the residual error for any feasible regression line. Since PTS does
not require presetting the number of outliers to delete from the data set, it
achieves better efficiency than other estimators. However, due to the high
computational complexity of the resulting QMIP problem, computing exact
solutions for moderately large regression problems is infeasible.
In this paper we further establish the theoretical properties of the PTS
estimator, such as a high breakdown point and efficiency, and propose an
approximate algorithm called Fast-PTS to compute the PTS estimator for large
data sets efficiently. Extensive computational experiments on sets of benchmark
instances with varying degrees of outlier contamination indicate that the
proposed algorithm performs well in identifying groups of high-leverage
outliers in reasonable computational time.
Comment: 27 pages
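To make the penalty idea concrete, here is a minimal numerical sketch (a simplified reading of the capped-residual objective, not the authors' QMIP formulation or the Fast-PTS algorithm; the function name and penalty value are illustrative assumptions): for a fixed candidate coefficient vector, each observation contributes its squared residual capped at a penalty cost, so a point whose residual exceeds the cap is deleted at a fixed price.

```python
import numpy as np

# Sketch of a penalised-trimming objective: each observation contributes
# min(residual^2, penalty), so deleting a point costs at most its penalty.
def pts_objective(beta, X, y, penalty):
    r2 = (y - X @ beta) ** 2
    return float(np.sum(np.minimum(r2, penalty)))

# Toy data: y = 1 + 2x with one gross outlier at the last point.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = 1.0 + 2.0 * np.arange(6.0)
y[5] = 100.0

beta_true = np.array([1.0, 2.0])                  # fits the five clean points exactly
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # dragged toward the outlier

# The true line pays only the capped cost of the outlier, so it scores
# lower than the contaminated least-squares fit.
cost_true = pts_objective(beta_true, X, y, penalty=4.0)
cost_ls = pts_objective(beta_ls, X, y, penalty=4.0)
```

Minimising this capped objective over the coefficients is the hard combinatorial part that the QMIP formulation solves exactly and Fast-PTS approximates; note that no outlier count has to be fixed in advance.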
SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression
This paper deals with the problem of finding the globally optimal subset of h
elements from a larger set of n elements in d space dimensions so as to
minimize a quadratic criterion, with a special emphasis on applications to
computing the Least Trimmed Squares Estimator (LTSE) for robust regression. The
computation of the LTSE is a challenging subset selection problem involving a
nonlinear program with continuous and binary variables, linked in a highly
nonlinear fashion. The selection of a globally optimal subset using the branch
and bound (BB) algorithm is limited to problems in very low dimension,
typically d < 5, as the complexity of the problem increases exponentially with d.
We introduce a bold pruning strategy in the BB algorithm that results in a
significant reduction in computing time, at the price of a negligible loss of
accuracy. The novelty of our algorithm is that the bounds at the nodes of the
BB tree come from pseudo-convexifications derived using a linearization
technique with approximate bounds for the nonlinear terms. The approximate
bounds are computed by solving an auxiliary semidefinite optimization problem.
We show through a computational study that our algorithm performs well on a
wide set of the most difficult instances of the LTSE problem.
Comment: 12 pages, 3 figures, 2 tables
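For intuition about the underlying subset-selection problem, tiny instances can be solved by brute force (a sketch of the combinatorial problem itself, not the paper's BB algorithm or its SOCP-based bounds): enumerate every h-subset, fit ordinary least squares on each, and keep the subset with the smallest residual sum of squares.

```python
import numpy as np
from itertools import combinations

def exact_lts(X, y, h):
    """Globally optimal LTS fit by exhaustive enumeration of h-subsets.
    Feasible only for tiny n; branch and bound prunes this search tree."""
    best_rss, best_beta, best_idx = np.inf, None, None
    for idx in combinations(range(len(y)), h):
        idx = list(idx)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        rss = float(np.sum((y[idx] - X[idx] @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_beta, best_idx = rss, beta, idx
    return best_rss, best_beta, best_idx

# Toy data: a clean line plus one gross outlier; h = n - 1 trims it away.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = 1.0 + 2.0 * np.arange(6.0)
y[5] = 100.0
rss, beta, idx = exact_lts(X, y, h=5)
```

The loop visits C(n, h) subsets, which is exactly the exponential growth that pruning and relaxation bounds are designed to tame.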
Robust Sparse Canonical Correlation Analysis
Canonical correlation analysis (CCA) is a multivariate statistical method
which describes the associations between two sets of variables. The objective
is to find linear combinations of the variables in each data set having maximal
correlation. This paper discusses a method for Robust Sparse CCA. Sparse
estimation produces canonical vectors with some of their elements estimated as
exactly zero. As such, their interpretability is improved. We also robustify
the method such that it can cope with outliers in the data. To estimate the
canonical vectors, we convert the CCA problem into an alternating regression
framework, and use the sparse Least Trimmed Squares estimator. We illustrate
the good performance of the Robust Sparse CCA method in several simulation
studies and two real data examples.
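The alternating-regression idea can be sketched as follows (a plain least-squares version on made-up data; the paper's method substitutes the sparse Least Trimmed Squares estimator for each regression step to gain sparsity and robustness):

```python
import numpy as np

def cca_alternating(X, Y, n_iter=100, seed=0):
    """First canonical pair via alternating least-squares regressions:
    each step regresses one set's scores on the other set's variables."""
    X = X - X.mean(axis=0)                  # CCA works on centred data
    Y = Y - Y.mean(axis=0)
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(Y.shape[1])
    for _ in range(n_iter):
        a, *_ = np.linalg.lstsq(X, Y @ b, rcond=None)
        a /= np.linalg.norm(X @ a)          # normalise the canonical variate
        b, *_ = np.linalg.lstsq(Y, X @ a, rcond=None)
        b /= np.linalg.norm(Y @ b)
    return a, b

# Two data sets sharing one latent variable z, plus pure-noise columns.
rng = np.random.default_rng(1)
z = rng.standard_normal(500)
X = np.column_stack([z + 0.1 * rng.standard_normal(500), rng.standard_normal(500)])
Y = np.column_stack([z + 0.1 * rng.standard_normal(500), rng.standard_normal(500)])
a, b = cca_alternating(X, Y)
r = np.corrcoef(X @ a, Y @ b)[0, 1]        # estimated canonical correlation
```

Replacing each `lstsq` call with a sparse LTS regression is what yields canonical vectors that are both interpretable (exact zeros) and resistant to outliers.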
Evolutionary algorithms for robust methods
A drawback of robust statistical techniques is the increased computational effort often needed compared to non-robust methods. Robust estimators possessing the exact fit property, for example, are NP-hard to compute. This means that, under the widely believed assumption that the computational complexity classes NP and P are not equal, there is no hope of computing exact solutions for large high-dimensional data sets. To tackle this problem, search heuristics are used to compute NP-hard estimators in high dimensions. Here, an evolutionary algorithm that is applicable to different robust estimators is presented. Further, variants of this evolutionary algorithm for selected estimators, most prominently least trimmed squares and least median of squares, are introduced and shown to outperform existing popular search heuristics in difficult data situations. The results increase the applicability of robust methods and underline the usefulness of evolutionary computation for computational statistics.
Keywords: evolutionary algorithms, robust regression, least trimmed squares (LTS), least median of squares (LMS), least quantile of squares (LQS), least quartile difference (LQD)
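As a hedged illustration of the approach (a toy genetic search over trimming subsets, not the paper's algorithm or its operators), a chromosome can encode which h observations the LTS fit keeps, with fitness equal to the trimmed residual sum of squares:

```python
import numpy as np

def evolve_lts(X, y, h, pop_size=20, generations=100, seed=0):
    """Toy evolutionary search for the LTS fit: a chromosome is an
    h-subset of observation indices; mutation swaps one index."""
    rng = np.random.default_rng(seed)
    n = len(y)

    def fitness(idx):
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        return float(np.sum((y[idx] - X[idx] @ beta) ** 2))

    pop = [rng.choice(n, size=h, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                  # selection: keep the fittest half
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent.copy()
            pos = rng.integers(h)              # mutation: swap one index out...
            outside = np.setdiff1d(np.arange(n), child)
            child[pos] = rng.choice(outside)   # ...for one currently excluded
            children.append(child)
        pop = survivors + children
    best = min(pop, key=fitness)
    beta, *_ = np.linalg.lstsq(X[best], y[best], rcond=None)
    return beta, np.sort(best)

# Line y = 1 + 2x with two gross outliers; keep h = 8 of n = 10 points.
X = np.column_stack([np.ones(10), np.arange(10.0)])
y = 1.0 + 2.0 * np.arange(10.0)
y[8], y[9] = 100.0, -50.0
beta, kept = evolve_lts(X, y, h=8)
```

Keeping the survivors unchanged (elitism) guarantees the best subset found is never lost, which matters because the trimmed objective has many poor local optima.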
Outlier Detection Using Nonconvex Penalized Regression
This paper studies the outlier detection problem from the point of view of
penalized regressions. Our regression model adds one mean shift parameter for
each of the data points. We then apply a regularization favoring a sparse
vector of mean shift parameters. The usual penalty yields a convex
criterion, but we find that it fails to deliver a robust estimator. The
penalty corresponds to soft thresholding. We introduce a thresholding (denoted
by ) based iterative procedure for outlier detection (-IPOD). A
version based on hard thresholding correctly identifies outliers on some hard
test problems. We find that -IPOD is much faster than iteratively
reweighted least squares for large data because each iteration costs at most
(and sometimes much less) avoiding an least squares estimate.
We describe the connection between -IPOD and -estimators. Our
proposed method has one tuning parameter with which to both identify outliers
and estimate regression coefficients. A data-dependent choice can be made based
on BIC. The tuned -IPOD shows outstanding performance in identifying
outliers in various situations in comparison to other existing approaches. This
methodology extends to high-dimensional modeling with , if both the
coefficient vector and the outlier pattern are sparse
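The mean-shift iteration admits a compact sketch (a naive version on synthetic data; the names and the full least-squares refit are illustrative assumptions, since the paper obtains its per-iteration cost with more careful linear algebra): alternate an OLS fit on the outlier-corrected response with thresholding of the residuals, then read outliers off the nonzero mean-shift entries.

```python
import numpy as np

def hard_threshold(r, lam):
    # Hard-thresholding rule: keep entries with |r| > lam, zero the rest.
    return np.where(np.abs(r) > lam, r, 0.0)

def theta_ipod(X, y, lam, n_iter=50):
    """Alternate: fit beta by OLS on the shifted response y - gamma,
    then re-estimate the mean-shift vector gamma by thresholding."""
    gamma = np.zeros_like(y)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        gamma = hard_threshold(y - X @ beta, lam)
    return beta, gamma

# Line y = 1 + 2x with a single mean-shifted observation at x = 4.
X = np.column_stack([np.ones(10), np.arange(10.0)])
y = 1.0 + 2.0 * np.arange(10.0)
y[4] += 50.0
beta, gamma = theta_ipod(X, y, lam=10.0)   # nonzero gamma flags outliers
```

The single tuning parameter here is the threshold, which per the abstract can be chosen in a data-dependent way via BIC.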
BSA - exact algorithm computing LTS estimate
The main result of this paper is a new exact algorithm computing the estimate
given by the Least Trimmed Squares (LTS). The algorithm works under very weak
assumptions. To prove that, we study the respective objective function using
basic techniques of analysis and linear algebra.
Comment: 18 pages, 1 figure