58 research outputs found
A Fast Algorithm for Robust Regression with Penalised Trimmed Squares
The presence of groups containing high-leverage outliers makes linear
regression a difficult problem due to the masking effect. The available
high-breakdown estimators based on Least Trimmed Squares often do not succeed
in detecting masked high-leverage outliers in finite samples.
An alternative to the LTS estimator, called the Penalised Trimmed Squares (PTS)
estimator, was introduced by the authors in \cite{ZiouAv:05,ZiAvPi:07} and
appears to be less sensitive to the masking problem. This estimator is defined
by a Quadratic Mixed Integer Programming (QMIP) problem whose objective
function includes a penalty cost for each observation, serving as an upper
bound on the residual error for any feasible regression line. Since PTS does
not require presetting the number of outliers to delete from the data set, it
is more efficient than other estimators. However, due to the high computational
complexity of the resulting QMIP problem, computing exact solutions for
moderately large regression problems is infeasible.
In this paper we further establish the theoretical properties of the PTS
estimator, such as its high breakdown point and efficiency, and propose an
approximate algorithm called Fast-PTS to compute the PTS estimator efficiently
for large data sets. Extensive computational experiments on sets of benchmark
instances with varying degrees of outlier contamination indicate that the
proposed algorithm performs well in identifying groups of high-leverage
outliers in reasonable computational time.
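The role of the per-observation penalty can be illustrated with a small sketch. This is an assumed simplification of the PTS objective, not the authors' QMIP formulation: each observation contributes either its squared residual or, if that exceeds its penalty, the penalty itself (the cost of "deleting" it), so the penalty bounds the residual error any observation can contribute.

```python
import numpy as np

def pts_objective(beta, X, y, penalties):
    """Evaluate a PTS-style objective for a candidate regression line.
    An observation whose squared residual exceeds its penalty is treated
    as deleted at the cost of that penalty (illustrative sketch only)."""
    r2 = (y - X @ beta) ** 2
    return float(np.sum(np.minimum(r2, penalties)))

# Toy data with one gross outlier in the last response value
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([0.1, 1.0, 2.1, 2.9, 100.0])
beta = np.array([0.0, 1.0])          # candidate line y = x
penalties = np.full(5, 4.0)          # deletion cost per observation
print(pts_objective(beta, X, y, penalties))   # outlier capped at 4.0
```

Because the outlier's contribution is capped at its penalty rather than its huge squared residual, a good fit to the clean points is not distorted, which is the intuition behind the estimator's resistance to masking.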
Robust Modal Filtering and Control of the X-56A Model with Simulated Fiber Optic Sensor Failures
The X-56A aircraft is a remotely piloted aircraft with flutter modes intentionally designed into the flight envelope. The X-56A program must demonstrate flight control while suppressing all unstable modes. A previous X-56A model study demonstrated a distributed-sensing-based active shape and active flutter suppression controller. The controller relies on an estimator which is sensitive to bias. This estimator is improved herein, and a real-time robust estimator is derived and demonstrated on 1530 fiber optic sensors. It is shown in simulation that the estimator can automatically reject 230 worst-case fiber optic sensor failures simultaneously. These sensor failures include locations with high leverage (or importance). To reduce the impact of leverage outliers, concentration based on a Mahalanobis trim criterion is introduced. A redescending M-estimator with Tukey bisquare weights is used to improve location and dispersion estimates within each concentration step in the presence of asymmetry (or leverage). A dynamic simulation is used to compare the concentrated robust estimator to a state-of-the-art real-time robust multivariate estimator. The estimators support a previously derived mu-optimal shape controller. It is found that, during the failure scenario, the concentrated modal estimator keeps the system stable.
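The redescending Tukey bisquare weights mentioned above have a standard closed form, sketched below. The tuning constant c = 4.685 (the usual choice for ~95% efficiency at the normal model) and the assumption that residuals are pre-scaled by a robust scale estimate are conventions, not details taken from this paper.

```python
import numpy as np

def tukey_bisquare_weights(r, c=4.685):
    """Tukey bisquare weight w(r) = (1 - (r/c)^2)^2 for |r| <= c, else 0.
    Redescending: gross outliers receive exactly zero weight, so they
    cannot bias the location/dispersion estimates they feed into."""
    u = np.asarray(r, dtype=float) / c
    w = (1.0 - u ** 2) ** 2
    w[np.abs(u) > 1.0] = 0.0
    return w

scaled_residuals = np.array([0.0, 1.0, 4.685, 10.0])
print(tukey_bisquare_weights(scaled_residuals))
# weight 1 at zero residual, decreasing to exactly 0 at |r| >= c
```

The "redescending" property (weights hitting zero, not just shrinking) is what lets a failed sensor be rejected outright rather than merely downweighted.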
Robust model selection with LARS based on S-estimators
We consider the problem of selecting a parsimonious subset of explanatory variables from a potentially large collection of covariates. We are concerned with the case when data quality may be unreliable (e.g. there might be outliers among the observations). When the number of available covariates is moderately large, fitting all possible subsets is not a feasible option. Sequential methods like forward or backward selection are generally “greedy” and may fail to include important predictors when these are correlated. To avoid this problem, Efron et al. (2004) proposed the Least Angle Regression (LARS) algorithm to produce an ordered list of the available covariates (sequencing) according to their relevance. We introduce outlier-robust versions of the LARS algorithm based on S-estimators for regression (Rousseeuw and Yohai, 1984). This algorithm is computationally efficient and suitable even when the number of variables exceeds the sample size. Simulation studies show that it is also robust to the presence of outliers in the data and compares favourably to previous proposals in the literature.
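A minimal sketch of the idea behind robustifying a sequencing step follows. This is an assumed simplification, not the paper's algorithm: LARS enters first the covariate most correlated with the response, so computing that correlation on bisquare-downweighted data keeps gross outliers from corrupting the ordering.

```python
import numpy as np

rng = np.random.default_rng(1)

# Contaminated data: y depends only on the first of three covariates,
# plus a few gross outliers in the response.
n = 100
X = rng.standard_normal((n, 3))
y = 3.0 * X[:, 0] + rng.standard_normal(n)
y[:5] += 25.0                                   # 5% gross outliers

# Crude robust weights from an initial median/MAD fit (illustrative only)
res = y - np.median(y)
scale = 1.4826 * np.median(np.abs(res))
u = res / (6.0 * scale)
w = np.where(np.abs(u) < 1.0, (1 - u**2) ** 2, 0.0)  # bisquare weights

# Reweight the data, then compute the first LARS selection: the
# covariate with the largest absolute correlation with the response.
sw = np.sqrt(w)
Xw, yw = X * sw[:, None], y * sw
Xc = Xw - Xw.mean(axis=0)
yc = yw - yw.mean()
corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
first = int(np.argmax(corr))
print(first)   # the truly relevant covariate is sequenced first
```

The paper's S-estimator-based approach is more refined than this median/MAD reweighting, but the mechanism is the same: downweight suspect observations before each correlation-driven sequencing decision.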
Robust model selection using fast and robust bootstrap
Robust model selection procedures control the undue influence that outliers can have on the selection criteria by using both robust point estimators and a bounded loss function when measuring either the goodness-of-fit or the expected prediction error of each model. Furthermore, to avoid favoring over-fitting models, these two measures can be combined with a penalty term for the size of the model. The expected prediction error conditional on the observed data may be estimated using the bootstrap. However, bootstrapping robust estimators becomes extremely time consuming on moderate to high dimensional data sets. It is shown that the expected prediction error can be estimated using a very fast and robust bootstrap method, and that this approach yields a consistent model selection method that is computationally feasible even for a relatively large number of covariates. Moreover, as opposed to other bootstrap methods, this proposal avoids the numerical problems associated with the small bootstrap samples required to obtain consistent model selection criteria. The finite-sample performance of the fast and robust bootstrap model selection method is investigated through a simulation study, while its feasibility and good performance on moderately large regression models are illustrated on several real data examples.
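The speed-up behind a fast robust bootstrap can be caricatured as follows. This sketch is an assumed simplification of the general idea (the full method also applies a correction to the recomputed coefficients): fit the robust estimator once on the full sample, freeze its robustness weights, and on each bootstrap sample recompute only a cheap weighted least-squares step instead of re-running the expensive robust fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_ls(X, y, w):
    """Weighted least-squares solve: the cheap re-estimation step."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Contaminated regression data (5% shifted responses)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.standard_normal(n)
y[:10] += 15.0

# One full-sample robust-ish fit: MAD-scaled hard-rejection weights
# (a crude stand-in for the robust estimator's weights)
res = y - X @ weighted_ls(X, y, np.ones(n))
scale = 1.4826 * np.median(np.abs(res - np.median(res)))
w = (np.abs(res / scale) <= 2.5).astype(float)
beta_full = weighted_ls(X, y, w)

# Bootstrap: resample cases, reuse the FIXED weights -> each replicate
# costs one small linear solve instead of a full robust fit.
boot = np.empty((500, p))
for b in range(500):
    idx = rng.integers(0, n, n)
    boot[b] = weighted_ls(X[idx], y[idx], w[idx])

se = boot.std(axis=0)        # bootstrap standard errors of coefficients
print(beta_full.round(2), se.round(3))
```

Each replicate is a single p-by-p solve, which is why the approach stays feasible when the number of covariates or bootstrap replicates grows.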