
    Robust continuum regression.

    Several applications of continuum regression (CR) to non-contaminated data have shown that a significant improvement in predictive power can be obtained compared to the three standard techniques it encompasses: ordinary least squares (OLS), principal component regression (PCR) and partial least squares (PLS). For contaminated data, continuum regression may yield aberrant estimates due to its non-robustness with respect to outliers. For data originating from a distribution that differs significantly from the normal distribution, continuum regression may also yield very inefficient estimates. In this paper, robust continuum regression (RCR) is proposed. To construct the estimator, an algorithm based on projection pursuit (PP) is developed. The robustness and good efficiency properties of RCR are shown by means of a simulation study. An application to an X-ray fluorescence analysis of hydrometallurgical samples illustrates the method's applicability in practice.
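    The classical (non-robust) continuum criterion that RCR robustifies can be sketched numerically. The sketch below assumes the standard parameterization T_gamma(w) = cov(Xw, y)^2 * var(Xw)^(gamma - 1), maximized over unit vectors w: gamma = 0 recovers the OLS direction, gamma = 1 the first PLS direction, and gamma -> infinity the first PCR direction. The function name, grid-search strategy and data are illustrative, not the paper's projection-pursuit algorithm:

    ```python
    import numpy as np

    def cr_direction(X, y, gamma, n_angles=2000):
        """Grid-search the first continuum-regression direction in 2-D.

        Maximizes T(w) = cov(Xw, y)^2 * var(Xw)^(gamma - 1) over unit
        vectors w: gamma = 0 ~ OLS, gamma = 1 ~ PLS, gamma -> inf ~ PCR.
        """
        Xc = X - X.mean(axis=0)          # center predictors
        yc = y - y.mean()                # center response
        best_w, best_crit = None, -np.inf
        for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
            w = np.array([np.cos(theta), np.sin(theta)])
            t = Xc @ w                   # scores along direction w
            crit = (t @ yc) ** 2 * (t @ t) ** (gamma - 1.0)
            if crit > best_crit:
                best_crit, best_w = crit, w
        return best_w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.3], [0.3, 0.5]])
    y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=200)

    # For gamma = 1 the maximizer is the classical first PLS direction,
    # proportional to Xc' yc.
    w_pls = cr_direction(X, y, gamma=1.0)
    ref = (X - X.mean(axis=0)).T @ (y - y.mean())
    ref = ref / np.linalg.norm(ref)
    ```

    RCR replaces the covariance and variance in this criterion with robust counterparts and searches directions by projection pursuit rather than a grid.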

    Significance Regression: Robust Regression for Collinear Data

    This paper examines robust linear multivariable regression from collinear data. A brief review of M-estimators discusses the strengths of this approach for tolerating outliers and/or perturbations in the error distributions. The review reveals that M-estimation may be unreliable if the data exhibit collinearity. Next, significance regression (SR) is discussed. SR is a successful method for treating collinearity but is not robust. A new significance regression algorithm for the weighted-least-squares error criterion (SR-WLS) is developed. Using the weights computed via M-estimation with the SR-WLS algorithm yields an effective method that robustly mollifies collinearity problems. Numerical examples illustrate the main points
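    The M-estimation half of this scheme can be sketched as iteratively reweighted least squares (IRLS) with Huber weights. This illustrates only the robust-weighting component that the paper feeds into SR-WLS, not the significance-regression step itself; the names, tuning constant and data are illustrative:

    ```python
    import numpy as np

    def huber_weights(r, c=1.345):
        """Huber weight function: 1 inside [-c, c], c/|r| outside."""
        a = np.abs(r)
        w = np.ones_like(a)
        w[a > c] = c / a[a > c]
        return w

    def irls_huber(X, y, n_iter=50):
        """M-estimation via iteratively reweighted least squares."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
        for _ in range(n_iter):
            r = y - X @ beta
            # robust residual scale from the MAD (consistent for Gaussians)
            s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
            w = huber_weights(r / s)
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        return beta

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    beta_true = np.array([1.0, 2.0, -1.0])
    y = X @ beta_true + 0.1 * rng.normal(size=200)
    y[:20] += 50.0                       # 10% gross outliers in the response

    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    beta_rob = irls_huber(X, y)
    ```

    The per-observation weights computed at convergence are the kind of M-estimation weights the SR-WLS algorithm consumes in place of uniform weighting.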

    Robust regression with imprecise data

    We consider the problem of regression analysis with imprecise data. By imprecise data we mean imprecise observations of precise quantities in the form of sets of values. In this paper, we explore a recently introduced likelihood-based approach to regression with such data. The approach is very general, since it covers all kinds of imprecise data (i.e. not only intervals) and it is not restricted to linear regression. Its result consists of a set of functions, reflecting the entire uncertainty of the regression problem. Here we study in particular a robust special case of the likelihood-based imprecise regression, which can be interpreted as a generalization of the method of least median of squares. Moreover, we apply it to data from a social survey, and compare it with other approaches to regression with imprecise data. It turns out that the likelihood-based approach is the most generally applicable one and is the only approach accounting for multiple sources of uncertainty at the same time
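    The least-median-of-squares method that this robust special case generalizes can be sketched with the usual random elemental-subset search (an approximation, since the exact LMS minimizer is combinatorial); names and data are illustrative:

    ```python
    import numpy as np

    def lms_fit(X, y, n_trials=500, seed=0):
        """Approximate least median of squares via random elemental fits.

        Repeatedly fits an exact solution through p random points and
        keeps the fit whose median squared residual is smallest.
        """
        rng = np.random.default_rng(seed)
        n, p = X.shape
        best_beta, best_med = None, np.inf
        for _ in range(n_trials):
            idx = rng.choice(n, size=p, replace=False)
            try:
                beta = np.linalg.solve(X[idx], y[idx])
            except np.linalg.LinAlgError:
                continue                  # degenerate subset, skip it
            med = np.median((y - X @ beta) ** 2)
            if med < best_med:
                best_med, best_beta = med, beta
        return best_beta

    rng = np.random.default_rng(4)
    x = rng.uniform(-5.0, 5.0, size=200)
    X = np.column_stack([np.ones(200), x])    # intercept + slope
    y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=200)
    y[:60] += 30.0                            # 30% shifted outliers

    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    beta_lms = lms_fit(X, y)
    ```

    Because the median ignores the largest half of the squared residuals, up to (almost) half the observations can be contaminated without dragging the fit away, which is the property the likelihood-based imprecise generalization inherits.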

    Robust Regression via Hard Thresholding

    We study the problem of Robust Least Squares Regression (RLSR), where several response variables can be adversarially corrupted. More specifically, for a data matrix X \in R^{p x n} and an underlying model w*, the response vector is generated as y = X'w* + b, where b \in R^n is a corruption vector supported on at most C.n coordinates. Existing exact recovery results for RLSR focus solely on L1-penalty-based convex formulations and impose relatively strict model assumptions, such as requiring the corruptions b to be selected independently of X. In this work, we study a simple hard-thresholding algorithm called TORRENT which, under mild conditions on X, can recover w* exactly even if b corrupts the response variables in an adversarial manner, i.e. both the support and the entries of b are selected adversarially after observing X and w*. Our results hold under deterministic assumptions which are satisfied if X is sampled from any sub-Gaussian distribution. Moreover, unlike existing results that apply only to a fixed w* generated independently of X, our results are universal and hold for any w* \in R^p. Next, we propose gradient-descent-based extensions of TORRENT that scale efficiently to large problems, such as high-dimensional sparse recovery, and prove similar recovery guarantees for these extensions. Empirically we find that TORRENT, and even more so its extensions, offer significantly faster recovery than the state-of-the-art L1 solvers. For instance, even on moderate-sized datasets (with p = 50K) with around 40% corrupted responses, a variant of our proposed method called TORRENT-HYB is more than 20x faster than the best L1 solver.
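    A fully-corrective variant of the hard-thresholding idea can be sketched in a few lines: alternate an exact least-squares fit on a working "clean" set with a thresholding step that keeps only the points of smallest residual. This is a simplified sketch of the template the abstract describes, with illustrative names and data, not the authors' implementation:

    ```python
    import numpy as np

    def torrent_fc(X, y, corruption_frac, n_iter=30):
        """Hard-thresholding robust least squares (fully-corrective sketch).

        Alternates (1) exact least squares on the current active set and
        (2) keeping the (1 - corruption_frac) fraction of points with the
        smallest absolute residuals as the next active set.
        """
        n = len(y)
        keep = int(np.ceil((1.0 - corruption_frac) * n))
        active = np.arange(n)                 # start with all points
        for _ in range(n_iter):
            w = np.linalg.lstsq(X[active], y[active], rcond=None)[0]
            residuals = np.abs(y - X @ w)
            active = np.argsort(residuals)[:keep]
        return w

    rng = np.random.default_rng(2)
    n, p = 500, 5
    X = rng.normal(size=(n, p))               # sub-Gaussian design
    w_star = rng.normal(size=p)
    y = X @ w_star + 0.01 * rng.normal(size=n)
    y[:100] -= 10.0                           # corrupt 20% of the responses

    w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w_hat = torrent_fc(X, y, corruption_frac=0.2)
    ```

    Once the active set excludes the corrupted responses, the inner least-squares step recovers w* up to the noise level, while plain OLS stays biased by the corruptions.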

    Robust linear least squares regression

    We consider the problem of robustly predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. For the ridge estimator and the ordinary least squares estimator, and their variants, we provide new risk bounds of order d/n without the logarithmic factor found in some standard results, where n is the size of the training data. We also provide a new estimator with better deviations in the presence of heavy-tailed noise. It is based on truncating differences of losses in a min-max framework and satisfies a d/n risk bound both in expectation and in deviations. The key common surprising factor of these results is the absence of an exponential moment condition on the output distribution while achieving exponential deviations. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Experimental results strongly back up our truncated min-max estimator. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/11-AOS918. arXiv admin note: significant text overlap with arXiv:0902.173
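    The truncation idea behind the heavy-tail estimator can be illustrated with a Catoni-style influence function, which grows only logarithmically so that no single heavy-tailed observation dominates. This is a sketch of the truncation principle only, applied to mean estimation for simplicity, not the paper's min-max regression estimator; the tuning constant alpha and the names are illustrative:

    ```python
    import numpy as np

    def psi(x):
        """Catoni-style soft truncation: sign(x) * log(1 + |x| + x^2 / 2).

        Odd and non-decreasing; behaves like x near 0 but only
        logarithmically for large |x|, which is what allows exponential
        deviation bounds without exponential moment conditions.
        """
        return np.sign(x) * np.log1p(np.abs(x) + 0.5 * np.square(x))

    def truncated_mean(x, alpha=0.1, n_iter=50):
        """Solve sum_i psi(alpha * (x_i - m)) = 0 for m by fixed point."""
        m = np.median(x)                       # robust starting value
        for _ in range(n_iter):
            m = m + psi(alpha * (x - m)).sum() / (alpha * len(x))
        return m

    rng = np.random.default_rng(3)
    clean = rng.normal(size=1000)
    m_clean = truncated_mean(clean)            # ~ the sample mean here

    heavy = rng.standard_t(df=2.1, size=1000)  # heavy tails, mean 0 exists
    m_heavy = truncated_mean(heavy)            # stays stable under the tails
    ```

    On well-behaved data the truncated estimate coincides with the classical one to high accuracy; the protection only activates when individual observations are extreme.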