
    Sparse least trimmed squares regression.

    Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. Both the simulation study and the real data example show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
    Breakdown point; Outliers; Penalized regression; Robust regression; Trimming

    Sparse least trimmed squares regression for analyzing high-dimensional large data sets

    Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
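The estimator described in this abstract can be sketched concretely. The following is a naive illustration, not the paper's fast algorithm: it alternates concentration steps (keep the h observations with the smallest squared residuals under the current fit) with a lasso fit on the trimmed subset, solved here by plain iterative soft thresholding. The function names, the ISTA solver, and the iteration counts are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft thresholding, the proximal operator of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_lts(X, y, h, lam, n_csteps=25, n_ista=200):
    """Naive sparse LTS sketch: alternate C-steps (keep the h observations
    with the smallest squared residuals) with a lasso fit on the trimmed
    subset, solved by plain ISTA. Illustrative only; the paper proposes
    a faster dedicated algorithm."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_csteps):
        # C-step: retain the h observations best fit by the current beta.
        keep = np.argsort((y - X @ beta) ** 2)[:h]
        Xh, yh = X[keep], y[keep]
        step = 1.0 / np.linalg.norm(Xh, 2) ** 2  # safe ISTA step size
        for _ in range(n_ista):
            grad = Xh.T @ (Xh @ beta - yh)
            beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

Trimming the n - h worst-fitting observations provides the robustness to outliers and leverage points; the L1 penalty provides the sparsity.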

    Outlier Detection Using Nonconvex Penalized Regression

    This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1 penalty corresponds to soft thresholding. We introduce a thresholding (denoted by Θ) based iterative procedure for outlier detection (Θ-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that Θ-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most O(np) (and sometimes much less), avoiding an O(np^2) least squares estimate. We describe the connection between Θ-IPOD and M-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned Θ-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with p ≫ n, if both the coefficient vector and the outlier pattern are sparse.
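The alternating scheme described in this abstract, one mean-shift parameter per observation updated by thresholding the residuals, can be sketched as follows. This is a hedged illustration, not the authors' implementation; factoring X once makes each subsequent iteration cost O(np), consistent with the cost claim above. Function names and defaults are assumptions.

```python
import numpy as np

def soft(z, lam):
    # Soft thresholding: shrinks toward zero (corresponds to the L1 penalty).
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard(z, lam):
    # Hard thresholding: keeps large entries untouched, zeroes the rest.
    return z * (np.abs(z) > lam)

def theta_ipod(X, y, lam, theta=hard, n_iter=50):
    """Sketch of a Theta-IPOD-style iteration for the mean-shift model
    y = X @ beta + gamma + noise: alternate a least squares fit on
    y - gamma with thresholding of the residuals to update gamma."""
    Q, R = np.linalg.qr(X)   # factor once; each later iteration is cheap
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        beta = np.linalg.solve(R, Q.T @ (y - gamma))
        gamma = theta(y - X @ beta, lam)
    return beta, gamma
```

Nonzero entries of gamma flag the suspected outliers. Passing soft instead of hard reproduces the L1-penalized behavior that the abstract reports is not robust.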

    Least angle regression for time series forecasting with many predictors.

    Least Angle Regression (LARS) is a variable selection method with proven performance for cross-sectional data. In this paper, it is extended to time series forecasting with many predictors. The new method builds parsimonious forecast models, taking the time series dynamics into account. It is a flexible method that allows for ranking the different predictors according to their predictive content. The time series LARS shows good forecast performance, as illustrated in a simulation study and two real data applications, where it is compared with the standard LARS algorithm and forecasting using diffusion indices.
    Macro-econometrics; model selection; penalized regression; variable ranking
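The idea of ranking lagged predictors by their predictive content can be illustrated with a small sketch. Since full LARS is lengthy to reproduce here, greedy forward selection on residual correlations stands in for the LARS entry order; the helper names and that stand-in are assumptions, not the paper's method.

```python
import numpy as np

def lagged_design(predictors, target, n_lags):
    """Build a forecasting design: lags 1..n_lags of every candidate
    predictor explain the target at time t (no look-ahead)."""
    T = len(target)
    cols, names = [], []
    for name, s in predictors.items():
        for lag in range(1, n_lags + 1):
            cols.append(s[n_lags - lag : T - lag])
            names.append(f"{name}_lag{lag}")
    return np.column_stack(cols), target[n_lags:], names

def forward_order(X, y, k):
    """Rank k columns by greedy forward selection on residual
    correlation, a crude stand-in for the LARS entry order."""
    Xs = (X - X.mean(0)) / X.std(0)
    yc = y - y.mean()
    r, active = yc.copy(), []
    for _ in range(k):
        corr = np.abs(Xs.T @ r)
        corr[active] = -np.inf        # never reselect an active column
        active.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Xs[:, active], yc, rcond=None)
        r = yc - Xs[:, active] @ coef  # refit and update the residual
    return active
```

On simulated data where the target is driven by the first lag of one series, that lag should enter the ranking first, which is the sense in which entry order ranks predictive content.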

    Robust model selection and outlier detection in linear regressions

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2006. Includes bibliographical references (p. 191-196). In this thesis, we study the problems of robust model selection and outlier detection in linear regression. The results of data analysis based on linear regressions are highly sensitive to model choice and the existence of outliers in the data. This thesis aims to help researchers to choose the correct model when their data could be contaminated with outliers, to detect possible outliers in their data, and to study the impact that such outliers have on their analysis. First, we discuss the problem of robust model selection. Many methods for performing model selection were designed with the standard error model ... and least squares estimation in mind. These methods often perform poorly on real world data, which can include outliers. Robust model selection methods aim to protect us from outliers and capture the model that represents the bulk of the data. We review the currently available model selection algorithms (both non-robust and robust) and present five new algorithms. Our algorithms aim to improve upon the currently available algorithms, both in terms of accuracy and computational feasibility. We demonstrate the improved accuracy of our algorithms via a simulation study and a study on a real world data set. Finally, we discuss the problem of outlier detection. In addition to model selection, outliers can adversely influence many other outcomes of regression-based data analysis. We describe a new outlier diagnostic tool, which we call diagnostic data traces. This tool can be used to detect outliers and study their influence on a variety of regression statistics. We demonstrate our tool on several data sets, which are considered benchmarks in the field of outlier detection. By Lauren McCann, Ph.D.

    Vol. 16, No. 1 (Full Issue)
