
    Sparse least trimmed squares regression.

    Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. Both the simulation study and the real data example show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
    Breakdown point; Outliers; Penalized regression; Robust regression; Trimming

    Sparse least trimmed squares regression for analyzing high-dimensional large data sets

    Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
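The estimator described in this abstract can be sketched concretely. The following is a naive illustration, not the paper's fast algorithm: it alternates concentration steps (keep the h observations with the smallest squared residuals under the current fit) with a lasso fit on the trimmed subset, solved here by plain iterative soft thresholding. The function names, the ISTA solver, and the iteration counts are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft thresholding, the proximal operator of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_lts(X, y, h, lam, n_csteps=25, n_ista=200):
    """Naive sparse LTS sketch: alternate C-steps (keep the h observations
    with the smallest squared residuals) with a lasso fit on the trimmed
    subset, solved by plain ISTA. Illustrative only; the paper proposes
    a faster dedicated algorithm."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_csteps):
        # C-step: retain the h observations best fit by the current beta.
        keep = np.argsort((y - X @ beta) ** 2)[:h]
        Xh, yh = X[keep], y[keep]
        step = 1.0 / np.linalg.norm(Xh, 2) ** 2  # safe ISTA step size
        for _ in range(n_ista):
            grad = Xh.T @ (Xh @ beta - yh)
            beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

Trimming the n - h worst-fitting observations provides the robustness to outliers and leverage points; the L1 penalty provides the sparsity.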

    Outlier Detection Using Nonconvex Penalized Regression

    This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1 penalty corresponds to soft thresholding. We introduce a thresholding (denoted by Θ) based iterative procedure for outlier detection (Θ-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that Θ-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most O(np) (and sometimes much less), avoiding an O(np^2) least squares estimate. We describe the connection between Θ-IPOD and M-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned Θ-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with p ≫ n, if both the coefficient vector and the outlier pattern are sparse.
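The alternating scheme described in this abstract, one mean-shift parameter per observation updated by thresholding the residuals, can be sketched as follows. This is a hedged illustration, not the authors' implementation; factoring X once makes each subsequent iteration cost O(np), consistent with the cost claim above. Function names and defaults are assumptions.

```python
import numpy as np

def soft(z, lam):
    # Soft thresholding: shrinks toward zero (corresponds to the L1 penalty).
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard(z, lam):
    # Hard thresholding: keeps large entries untouched, zeroes the rest.
    return z * (np.abs(z) > lam)

def theta_ipod(X, y, lam, theta=hard, n_iter=50):
    """Sketch of a Theta-IPOD-style iteration for the mean-shift model
    y = X @ beta + gamma + noise: alternate a least squares fit on
    y - gamma with thresholding of the residuals to update gamma."""
    Q, R = np.linalg.qr(X)   # factor once; each later iteration is cheap
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        beta = np.linalg.solve(R, Q.T @ (y - gamma))
        gamma = theta(y - X @ beta, lam)
    return beta, gamma
```

Nonzero entries of gamma flag the suspected outliers. Passing soft instead of hard reproduces the L1-penalized behavior that the abstract reports is not robust.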

    Least angle regression for time series forecasting with many predictors.

    Least Angle Regression (LARS) is a variable selection method with proven performance for cross-sectional data. In this paper, it is extended to time series forecasting with many predictors. The new method builds parsimonious forecast models, taking the time series dynamics into account. It is a flexible method that allows for ranking the different predictors according to their predictive content. The time series LARS shows good forecast performance, as illustrated in a simulation study and two real data applications, where it is compared with the standard LARS algorithm and forecasting using diffusion indices.
    Macro-econometrics; model selection; penalized regression; variable ranking
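The idea of ranking lagged predictors by their predictive content can be illustrated with a small sketch. Since full LARS is lengthy to reproduce here, greedy forward selection on residual correlations stands in for the LARS entry order; the helper names and that stand-in are assumptions, not the paper's method.

```python
import numpy as np

def lagged_design(predictors, target, n_lags):
    """Build a forecasting design: lags 1..n_lags of every candidate
    predictor explain the target at time t (no look-ahead)."""
    T = len(target)
    cols, names = [], []
    for name, s in predictors.items():
        for lag in range(1, n_lags + 1):
            cols.append(s[n_lags - lag : T - lag])
            names.append(f"{name}_lag{lag}")
    return np.column_stack(cols), target[n_lags:], names

def forward_order(X, y, k):
    """Rank k columns by greedy forward selection on residual
    correlation, a crude stand-in for the LARS entry order."""
    Xs = (X - X.mean(0)) / X.std(0)
    yc = y - y.mean()
    r, active = yc.copy(), []
    for _ in range(k):
        corr = np.abs(Xs.T @ r)
        corr[active] = -np.inf        # never reselect an active column
        active.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Xs[:, active], yc, rcond=None)
        r = yc - Xs[:, active] @ coef  # refit and update the residual
    return active
```

On simulated data where the target is driven by the first lag of one series, that lag should enter the ranking first, which is the sense in which entry order ranks predictive content.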

    Robust model selection and outlier detection in linear regressions

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2006. Includes bibliographical references (p. 191-196). In this thesis, we study the problems of robust model selection and outlier detection in linear regression. The results of data analysis based on linear regressions are highly sensitive to model choice and the existence of outliers in the data. This thesis aims to help researchers to choose the correct model when their data could be contaminated with outliers, to detect possible outliers in their data, and to study the impact that such outliers have on their analysis. First, we discuss the problem of robust model selection. Many methods for performing model selection were designed with the standard error model ... and least squares estimation in mind. These methods often perform poorly on real world data, which can include outliers. Robust model selection methods aim to protect us from outliers and capture the model that represents the bulk of the data. We review the currently available model selection algorithms (both non-robust and robust) and present five new algorithms. Our algorithms aim to improve upon the currently available algorithms, both in terms of accuracy and computational feasibility. We demonstrate the improved accuracy of our algorithms via a simulation study and a study on a real world data set. Finally, we discuss the problem of outlier detection. In addition to model selection, outliers can adversely influence many other outcomes of regression-based data analysis. We describe a new outlier diagnostic tool, which we call diagnostic data traces. This tool can be used to detect outliers and study their influence on a variety of regression statistics. We demonstrate our tool on several data sets, which are considered benchmarks in the field of outlier detection. By Lauren McCann, Ph.D.

    Vol. 16, No. 1 (Full Issue)
