Robust Regression via Hard Thresholding
We study the problem of Robust Least Squares Regression (RLSR), in which
several response variables can be adversarially corrupted. More specifically,
for a data matrix X \in R^{p x n} and an underlying model w*, the response
vector is generated as y = X'w* + b, where b \in R^n is a corruption vector
supported on at most C·n coordinates. Existing exact recovery results for
RLSR focus solely on L1-penalty-based convex formulations and impose
relatively strict model assumptions, such as requiring the corruptions b to
be selected independently of X.
In this work, we study a simple hard-thresholding algorithm called TORRENT
which, under mild conditions on X, can recover w* exactly even if b corrupts
the response variables in an adversarial manner, i.e. both the support and
the entries of b are selected adversarially after observing X and w*. Our
results hold under deterministic assumptions which are satisfied if X is
sampled from any sub-Gaussian distribution. Finally, unlike existing results
that apply only to a fixed w* generated independently of X, our results are
universal and hold for any w* \in R^p.
Next, we propose gradient descent-based extensions of TORRENT that scale
efficiently to large problems, such as high-dimensional sparse recovery, and
prove similar recovery guarantees for these extensions. Empirically, we find
that TORRENT, and even more so its extensions, offers significantly faster
recovery than state-of-the-art L1 solvers. For instance, even on
moderate-sized datasets (with p = 50K) with around 40% corrupted responses, a
variant of our proposed method called TORRENT-HYB is more than 20x faster
than the best L1 solver.
Comment: 24 pages, 3 figures
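The fully corrective variant of such a hard-thresholding scheme alternates two steps: an ordinary least squares fit restricted to a trusted subset of points, and a hard-thresholding step that re-selects the trusted subset as the points with smallest residuals. A minimal sketch under that reading (not the authors' implementation; rows of X are samples here, whereas the abstract writes the data matrix as p x n, and the function name and parameterisation are ours):

```python
import numpy as np

def torrent_fc(X, y, beta, n_iter=50):
    # Fully corrective hard-thresholding sketch: alternate a least
    # squares fit on the trusted set with re-selection of the
    # (1 - beta) * n points of smallest residual. `beta` is an assumed
    # upper bound on the fraction of corrupted responses.
    n, p = X.shape
    k = int((1.0 - beta) * n)
    S = np.arange(n)                 # start by trusting every point
    w = np.zeros(p)
    for _ in range(n_iter):
        w, *_ = np.linalg.lstsq(X[S], y[S], rcond=None)
        r = np.abs(y - X @ w)
        S = np.argsort(r)[:k]        # hard thresholding on residuals
    return w
```

On noise-free data with a corruption fraction below beta, the loop typically locks onto the clean points within a few iterations, after which the restricted least squares fit returns w* exactly.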
Partial robust M-regression.
Partial Least Squares (PLS) is a standard statistical method in chemometrics. It can be considered an incomplete, or 'partial', version of the Least Squares estimator of regression, applicable when high or perfect multicollinearity is present in the predictor variables. The Least Squares estimator is well known to be an optimal estimator for regression, but only when the error terms are normally distributed. In the absence of normality, and in particular when outliers are present in the data set, other, more robust regression estimators have better properties. In this paper a 'partial' version of M-regression estimators is defined. If an appropriate weighting scheme is chosen, partial M-estimators become entirely robust to any type of outlying points, and are called Partial Robust M-estimators. It is shown that partial robust M-regression outperforms existing methods for robust PLS regression in terms of statistical precision and computational speed, while keeping good robustness properties. The method is applied to a data set consisting of EPXMA spectra of archaeological glass vessels. This data set contains several outliers, and the advantages of partial robust M-regression are illustrated. Applying partial robust M-regression yields much smaller prediction errors for noisy calibration samples than PLS. On the other hand, if the data follow a normal model perfectly well, the loss in efficiency to be paid is very small.
Keywords: Advantages; Applications; Calibration; Data; Distribution; Efficiency; Estimator; Least squares; M-estimators; Methods; Model; Optimal; Ordinary least squares; Outliers; Partial least squares; Precision; Prediction; Projection pursuit; Regression; Robust regression; Robustness; Simulation; Spectrometric quantization; Squares; Studies; Variables; Yield
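The weighting idea can be sketched as an iteratively reweighted PLS: fit PLS on row-weighted data, compute residuals, and downweight observations with large standardized residuals. The sketch below uses a basic NIPALS-style PLS1 and Huber weights on MAD-standardized residuals only; the method in the paper also downweights high-leverage observations and uses its own weight function, so this is illustrative rather than a faithful implementation (all function names are ours):

```python
import numpy as np

def pls1(X, y, n_comp):
    # Basic NIPALS-style PLS1 for a single response (assumes X, y centered).
    X = X.copy(); y = y.copy()
    p = X.shape[1]
    W = np.zeros((p, n_comp)); P = np.zeros((p, n_comp)); q = np.zeros(n_comp)
    for k in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, k] = X.T @ t / tt
        q[k] = y @ t / tt
        X -= np.outer(t, P[:, k])    # deflate X and y
        y -= q[k] * t
        W[:, k] = w
    return W @ np.linalg.solve(P.T @ W, q)

def partial_robust_m(X, y, n_comp=2, c=1.345, n_iter=30):
    # Iteratively reweighted PLS: weighted fit via sqrt-weight row scaling,
    # then Huber weights on MAD-standardized residuals.
    w = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta = pls1(X * sw[:, None], y * sw, n_comp)
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale
        u = np.abs(r) / max(s, 1e-12)
        w = np.where(u <= c, 1.0, c / u)                  # Huber weights
    return beta
```

With n_comp equal to the number of predictors (and full-rank X), the inner PLS fit coincides with weighted least squares, so the sketch reduces to a classical M-estimator; fewer components give the 'partial' behaviour the paper studies.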
Robust regression in Stata.
In regression analysis, the presence of outliers in the data set can strongly distort the classical least squares estimator and lead to unreliable results. To deal with this, several robust-to-outliers methods have been proposed in the statistical literature. In Stata, some of these methods are available through the commands rreg and qreg. Unfortunately, these methods resist only some specific types of outliers and turn out to be ineffective under alternative scenarios. In this paper we present more effective robust estimators that we have implemented in Stata. We also present a graphical tool that allows the user to recognize the type of outliers present.
Keywords: S-estimators; MM-estimators; Outliers; Robustness
Robust regression with optimisation heuristics
Linear regression is widely used in finance. While the standard method to obtain parameter estimates, Least Squares, has very appealing theoretical and numerical properties, the obtained estimates are often unstable in the presence of extreme observations, which are rather common in financial time series. One approach to dealing with such extreme observations is the application of robust or resistant estimators, like Least Quantile of Squares estimators. Unfortunately, for many such alternative approaches, estimation is much more difficult than in the Least Squares case, as the objective function is not convex and often has many local optima. We apply different heuristic methods, like Differential Evolution, Particle Swarm and Threshold Accepting, to obtain parameter estimates. Particular emphasis is put on the convergence properties of these techniques for fixed computational resources, and on the techniques' sensitivity to different parameter settings.
Keywords: Optimisation heuristics; Robust regression; Least Median of Squares
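As a concrete illustration, the Least Median of Squares objective (the median of the squared residuals) is non-convex with many local optima, but can be attacked with a population-based heuristic. A sketch using SciPy's Differential Evolution (the data, bounds, and variable names are ours, not from the paper):

```python
import numpy as np
from scipy.optimize import differential_evolution

def lms_objective(beta, X, y):
    # Least Median of Squares: median of the squared residuals
    return np.median((y - X @ beta) ** 2)

# hypothetical data: 30% of the responses are grossly corrupted
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)
y[:30] += 20.0

# Differential Evolution searches a box of candidate coefficients,
# needing only objective values -- no convexity or gradients.
bounds = [(-10.0, 10.0), (-10.0, 10.0)]
res = differential_evolution(lms_objective, bounds, args=(X, y), seed=0)
beta_lms = res.x
```

Because the heuristic only evaluates the objective, the same driver works unchanged for other resistant criteria such as Least Trimmed Squares or general Least Quantile of Squares.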
Robust continuum regression.
Several applications of continuum regression (CR) to non-contaminated data have shown that a significant improvement in predictive power can be obtained compared to the three standard techniques which it encompasses (ordinary least squares (OLS), principal component regression (PCR) and partial least squares (PLS)). For contaminated data continuum regression may yield aberrant estimates due to its non-robustness with respect to outliers. Also for data originating from a distribution which significantly differs from the normal distribution, continuum regression may yield very inefficient estimates. In the current paper, robust continuum regression (RCR) is proposed. To construct the estimator, an algorithm based on projection pursuit (PP) is proposed. The robustness and good efficiency properties of RCR are shown by means of a simulation study. An application to an X-ray fluorescence analysis of hydrometallurgical samples illustrates the method's applicability in practice.
Keywords: Regression; Applications; Data; Ordinary least squares; Least squares; Squares; Partial least squares; Yield; Outliers; Distribution; Estimator; Projection pursuit; Robustness; Efficiency; Simulation; Studies
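The projection pursuit idea can be caricatured as follows: instead of maximizing the continuum regression criterion cov(Xa, y)^2 * var(Xa)^(delta/(1-delta) - 1) over directions a with classical moments, score candidate directions with robust (here: trimmed) moment estimates. This random-search sketch is far cruder than the RCR algorithm in the paper, which searches directions much more cleverly; all function names and the trimming scheme are ours:

```python
import numpy as np

def trimmed_var(u, alpha=0.1):
    # variance after dropping the alpha fraction of points
    # farthest from the median
    d = np.abs(u - np.median(u))
    return np.var(u[d <= np.quantile(d, 1 - alpha)])

def trimmed_cov(u, v):
    # robust covariance via the identity cov(u,v) = (var(u+v) - var(u-v)) / 4
    return (trimmed_var(u + v) - trimmed_var(u - v)) / 4.0

def robust_cr_direction(X, y, delta=0.5, n_dirs=2000, seed=0):
    # Score random unit directions by a trimmed CR criterion;
    # delta = 0 mimics OLS, 0.5 mimics PLS, delta -> 1 mimics PCR.
    rng = np.random.default_rng(seed)
    gamma = delta / (1.0 - delta) - 1.0
    best_score, best_a = -np.inf, None
    for _ in range(n_dirs):
        a = rng.normal(size=X.shape[1])
        a /= np.linalg.norm(a)
        t = X @ a
        score = trimmed_cov(t, y) ** 2 * trimmed_var(t) ** gamma
        if score > best_score:
            best_score, best_a = score, a
    return best_a
```

Once a robust direction (or a sequence of deflated directions) is found, y is regressed robustly on the resulting components, just as CR regresses on its latent components.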
Robust and Sparse Regression via γ-divergence
In high-dimensional data, many sparse regression methods have been proposed.
However, they may not be robust against outliers. Recently, the use of the
density power weight has been studied for robust parameter estimation, and
the corresponding divergences have been discussed. One such divergence is the
γ-divergence, and the robust estimator based on the γ-divergence is known to
have strong robustness. In this paper, we consider robust and sparse
regression based on the γ-divergence. We extend the γ-divergence to the
regression problem and show that it retains strong robustness under heavy
contamination, even when the outliers are heterogeneous. The loss function is
constructed from an empirical estimate of the γ-divergence with sparse
regularization, and the parameter estimate is defined as the minimizer of
this loss function. To obtain the robust and sparse estimate, we propose an
efficient update algorithm which has a monotone decreasing property of the
loss function. In particular, we discuss the linear regression problem with
L1 regularization in detail. In numerical experiments and real data analyses,
we see that the proposed method outperforms previous robust and sparse
methods.
Comment: 25 pages
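Under a γ-divergence-type loss, each observation enters through a factor like exp(-γ r_i^2 / (2σ^2)), so points with huge residuals receive exponentially small weight. A proximal-gradient sketch for the L1-regularized linear case (the scale σ is held fixed for simplicity; the paper's update algorithm and scale handling differ, and all names here are ours):

```python
import numpy as np

def soft(z, t):
    # soft-thresholding operator (prox of the L1 norm)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def gamma_sparse_reg(X, y, gam=0.5, lam=0.05, sigma=1.0,
                     step=0.2, n_iter=2000):
    # Proximal gradient on
    #   L(beta) = -(1/gam) * log( mean_i exp(-gam*r_i^2/(2*sigma^2)) )
    #             + lam * ||beta||_1
    # The smooth part's gradient is -(1/sigma^2) * X.T @ (w * r), where
    # the softmax weights w_i make gross outliers nearly invisible.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = y - X @ beta
        e = np.exp(-gam * r**2 / (2.0 * sigma**2))
        w = e / e.sum()                      # outliers get weight ~ 0
        grad = -(X.T @ (w * r)) / sigma**2
        beta = soft(beta - step * grad, step * lam)
    return beta
```

The soft-thresholding step produces exact zeros, giving sparsity, while the softmax weighting supplies the robustness; with gam set to 0 the weights become uniform and the iteration collapses to an ordinary lasso-style proximal gradient.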
