Robust continuum regression.
Several applications of continuum regression (CR) to non-contaminated data have shown that a significant improvement in predictive power can be obtained compared to the three standard techniques which it encompasses (ordinary least squares (OLS), principal component regression (PCR) and partial least squares (PLS)). For contaminated data, continuum regression may yield aberrant estimates due to its non-robustness with respect to outliers. Also, for data originating from a distribution which differs significantly from the normal distribution, continuum regression may yield very inefficient estimates. In the current paper, robust continuum regression (RCR) is proposed. To construct the estimator, an algorithm based on projection pursuit (PP) is proposed. The robustness and good efficiency properties of RCR are shown by means of a simulation study. An application to an X-ray fluorescence analysis of hydrometallurgical samples illustrates the method's applicability in practice.
Keywords: Regression; Applications; Data; Ordinary least squares; Least-squares; Squares; Partial least squares; Yield; Outliers; Distribution; Estimator; Projection-pursuit; Robustness; Efficiency; Simulation; Studies
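For orientation, the way continuum regression encompasses OLS, PLS and PCR can be written as a single criterion. The form below is the classical continuum regression criterion as it is commonly stated, not something quoted from the abstract; the symbols $w$, $\delta$, $X$ and $y$ are notation assumed here for illustration.

```latex
% Classical continuum regression direction (assumed notation):
% X is the centred predictor matrix, y the centred response,
% w a unit-norm weight vector and 0 <= delta < 1 the continuum parameter.
\[
  w_\delta \;=\; \arg\max_{\|w\| = 1}\;
  \operatorname{Cov}(Xw,\, y)^{2}\,
  \operatorname{Var}(Xw)^{\frac{\delta}{1-\delta} - 1}
\]
% delta = 0    : maximises squared correlation     -> OLS
% delta = 1/2  : maximises squared covariance      -> PLS
% delta -> 1   : the variance term dominates       -> PCR
```

Because the criterion is built from a variance and a covariance, outliers in either can distort the resulting direction, which is the non-robustness the abstract refers to.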
Significance Regression: Robust Regression for Collinear Data
This paper examines robust linear multivariable regression from collinear data. A brief review of M-estimators discusses the strengths of this approach for tolerating outliers and/or perturbations in the error distributions. The review reveals that M-estimation may be unreliable if the data exhibit collinearity. Next, significance regression (SR) is discussed. SR is a successful method for treating collinearity but is not robust. A new significance regression algorithm for the weighted-least-squares error criterion (SR-WLS) is developed. Using the weights computed via M-estimation with the SR-WLS algorithm yields an effective method that robustly mollifies collinearity problems. Numerical examples illustrate the main points.
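To make the idea of "weights computed via M-estimation" concrete, here is a minimal sketch of Huber M-estimation by iteratively reweighted least squares. It is an illustration only: the significance regression step (SR-WLS) from the paper is not reproduced, plain weighted least squares stands in for it, and the function names `huber_weights` and `irls_huber`, the tuning constant `c = 1.345`, and the synthetic data are all assumptions made here.

```python
import numpy as np

def huber_weights(resid, c=1.345):
    """Huber weights: 1 for small scaled residuals, c/|u| for large ones."""
    # Robust scale estimate via the median absolute deviation (MAD).
    scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
    scale = max(scale, 1e-12)
    u = np.abs(resid) / scale
    return np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))

def irls_huber(X, y, n_iter=50, tol=1e-8):
    """Huber M-estimate by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    weights = np.ones(len(y))
    for _ in range(n_iter):
        weights = huber_weights(y - X @ beta)
        sw = np.sqrt(weights)
        # Weighted least squares step (the paper would use SR-WLS here).
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.linalg.norm(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta, weights

# Synthetic example (hypothetical data): a line with ten gross outliers.
rng = np.random.default_rng(0)
n = 100
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)
outlier_idx = np.arange(10)
y[outlier_idx] += 20.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber, weights = irls_huber(X, y)
print("OLS error:  ", np.linalg.norm(beta_ols - beta_true))
print("Huber error:", np.linalg.norm(beta_huber - beta_true))
```

The weights returned by `irls_huber` are the kind of quantity that could be passed to any weighted-least-squares routine, which is the role the abstract assigns to them in combination with SR-WLS.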
Robust continuum regression.
Several applications of continuum regression to non-contaminated data have shown that a significant improvement in predictive power can be obtained compared to the three standard techniques which it encompasses (ordinary least squares, principal component regression and partial least squares). For contaminated data, continuum regression may yield aberrant estimates due to its non-robustness with respect to outliers. Also, for data originating from a distribution which differs significantly from the normal distribution, continuum regression may yield very inefficient estimates. In the current paper, robust continuum regression (RCR) is proposed. To construct the estimator, an algorithm based on projection pursuit is proposed. The robustness and good efficiency properties of RCR are shown by means of a simulation study. An application to an X-ray fluorescence analysis of hydrometallurgical samples illustrates the method's applicability in practice.
Keywords: Advantages; Applications; Calibration; Continuum regression (CR); Data; Distribution; Efficiency; Estimator; Least-squares; M-estimators; Methods; Model; Optimal; Ordinary least squares; Outliers; Partial least squares; Precision; Prediction; Projection-pursuit; Regression; Research; Robust continuum regression (RCR); Robust multivariate calibration; Robust regression; Robustness; Simulation; Squares; Studies; Variables; Yield
Robust regression with imprecise data
We consider the problem of regression analysis with imprecise data. By imprecise data we mean imprecise observations of precise quantities in the form of sets of values. In this paper, we explore a recently introduced likelihood-based approach to regression with such data. The approach is very general, since it covers all kinds of imprecise data (i.e., not only intervals) and it is not restricted to linear regression. Its result consists of a set of functions, reflecting the entire uncertainty of the regression problem. Here we study in particular a robust special case of the likelihood-based imprecise regression, which can be interpreted as a generalization of the method of least median of squares. Moreover, we apply it to data from a social survey, and compare it with other approaches to regression with imprecise data. It turns out that the likelihood-based approach is the most generally applicable one and is the only approach accounting for multiple sources of uncertainty at the same time.
Robust Regression via Hard Thresholding
We study the problem of Robust Least Squares Regression (RLSR) where several
response variables can be adversarially corrupted. More specifically, for a
data matrix X \in R^{p x n} and an underlying model w*, the response vector is
generated as y = X'w* + b where b \in R^n is the corruption vector supported
over at most C.n coordinates. Existing exact recovery results for RLSR focus
solely on L1-penalty based convex formulations and impose relatively strict
model assumptions such as requiring the corruptions b to be selected
independently of X.
In this work, we study a simple hard-thresholding algorithm called TORRENT
which, under mild conditions on X, can recover w* exactly even if b corrupts
the response variables in an adversarial manner, i.e. both the support and
entries of b are selected adversarially after observing X and w*. Our results
hold under deterministic assumptions which are satisfied if X is sampled from
any sub-Gaussian distribution. Finally unlike existing results that apply only
to a fixed w*, generated independently of X, our results are universal and hold
for any w* \in R^p.
Next, we propose gradient descent-based extensions of TORRENT that can scale
efficiently to large scale problems, such as high dimensional sparse recovery
and prove similar recovery guarantees for these extensions. Empirically we find
TORRENT, and more so its extensions, offering significantly faster recovery
than the state-of-the-art L1 solvers. For instance, even on moderate-sized
datasets (with p = 50K) with around 40% corrupted responses, a variant of our
proposed method called TORRENT-HYB is more than 20x faster than the best L1
solver.
Comment: 24 pages, 3 figures
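The hard-thresholding idea described in this abstract can be sketched as an alternation between a least squares fit on an "active" set of points and re-selecting the points with the smallest residuals. The code below is a minimal illustration in that spirit, not the authors' implementation: the function name `hard_threshold_regression`, the thresholding fraction `beta`, the stopping rule, and the synthetic data are assumptions made here. Note also that the code stores samples as rows (an n x p matrix), whereas the abstract writes X as p x n.

```python
import numpy as np

def hard_threshold_regression(X, y, beta, max_iter=50, tol=1e-10):
    """Alternate least squares on an active set with hard thresholding.

    X: (n, p) design matrix, y: (n,) responses,
    beta: assumed upper bound on the fraction of corrupted responses.
    """
    n = X.shape[0]
    keep = n - int(np.ceil(beta * n))  # number of points retained per round
    active = np.arange(n)              # start by trusting every point
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Least squares fit restricted to the current active set.
        w = np.linalg.lstsq(X[active], y[active], rcond=None)[0]
        resid = np.abs(y - X @ w)
        # Hard thresholding: keep the points with the smallest residuals.
        active = np.argsort(resid)[:keep]
        if np.linalg.norm(resid[active]) <= tol:
            break
    return w, np.sort(active)

# Synthetic example (hypothetical data): noiseless responses, 10% corrupted.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true
corrupted = rng.choice(n, size=20, replace=False)
y[corrupted] += 10.0

w_hat, active = hard_threshold_regression(X, y, beta=0.2)
print("recovery error:", np.linalg.norm(w_hat - w_true))
```

Choosing `beta` larger than the true corruption fraction discards some clean points but keeps the active set free of corrupted ones, which is why the sketch uses 0.2 for data with 10% corruption.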
Robust linear least squares regression
We consider the problem of robustly predicting as well as the best linear
combination of given functions in least squares regression, and variants of
this problem including constraints on the parameters of the linear combination.
For the ridge estimator and the ordinary least squares estimator, and their
variants, we provide new risk bounds of order d/n without logarithmic factor
unlike some standard results, where n is the size of the training data. We
also provide a new estimator with better deviations in the presence of
heavy-tailed noise. It is based on truncating differences of losses in a
min--max framework and satisfies a risk bound both in expectation and in
deviations. The key common surprising factor of these results is the absence of
exponential moment condition on the output distribution while achieving
exponential deviations. All risk bounds are obtained through a PAC-Bayesian
analysis on truncated differences of losses. Experimental results strongly back
up our truncated min--max estimator.
Comment: Published at http://dx.doi.org/10.1214/11-AOS918 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org). arXiv admin note: significant text
overlap with arXiv:0902.173