High-dimensional, robust, heteroscedastic variable selection with the adaptive LASSO, and applications to random coefficient regression
In this thesis, theoretical results are developed for the adaptive LASSO in high-dimensional, sparse linear regression models with potentially heavy-tailed and heteroscedastic errors. The empirical pseudo-Huber loss is used as the loss function, and the main focus is the sign-consistency of the resulting estimator. Simulations illustrate the favorable numerical performance of the proposed methodology compared to the ordinary adaptive LASSO. These results are then applied to the linear random coefficient regression model, more precisely to the means, variances, and covariances of the coefficients. Furthermore, sufficient conditions for the identifiability of the first and second moments, as well as asymptotic results for a fixed number of coefficients, are given.
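The ingredients named in the abstract (a robust pseudo-Huber loss combined with adaptive L1 weights from a pilot estimate) can be sketched with a generic proximal-gradient routine. This is a minimal illustration, not the thesis's actual procedure: the ridge pilot estimator, the step size, and all tuning parameters are assumptions made here for the sketch.

```python
import numpy as np

def pseudo_huber_grad(r, delta):
    # derivative of the pseudo-Huber loss delta^2*(sqrt(1+(r/delta)^2)-1)
    return r / np.sqrt(1.0 + (r / delta) ** 2)

def adaptive_lasso_huber(X, y, lam=0.1, delta=1.0, gamma=1.0, n_iter=500):
    """Proximal-gradient sketch: pseudo-Huber loss + adaptive L1 penalty."""
    n, p = X.shape
    # pilot estimator (ridge, an illustrative choice) supplies the
    # adaptive weights |b0_j|^{-gamma}
    b0 = np.linalg.solve(X.T @ X + 0.1 * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(b0) ** gamma + 1e-8)
    # step = 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz
    # constant (psi' is at most 1 for the pseudo-Huber loss)
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        g = -X.T @ pseudo_huber_grad(y - X @ b, delta) / n
        b = b - step * g
        # soft-thresholding with coordinate-specific adaptive weights
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam * w, 0.0)
    return b
```

The adaptive weights penalize coordinates whose pilot estimate is near zero much more heavily, which is what drives the sign-consistency studied in the thesis; the bounded pseudo-Huber gradient is what limits the influence of heavy-tailed errors.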
Robust Regression over Averaged Uncertainty
We propose a new formulation of robust regression by integrating over all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution of the ordinary least-squares regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and mean-squared-error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets: ellipsoidal, box, diamond, and budget, and provide closed-form formulations of the penalty term as a function of the sample size, the feature size, and the perturbation protection strength. We then show, on synthetic datasets with different levels of perturbation, a consistent out-of-sample improvement of the averaged formulation over the existing worst-case formulation. Importantly, the improvement grows as the perturbation level increases, confirming our method's advantage in high-noise environments. We report similar out-of-sample improvements on real-world regression problems from the UCI datasets.
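The averaging-recovers-ridge claim can be illustrated numerically with a deliberately simplified setup: here the design matrix is perturbed by iid Gaussian noise rather than drawn from the paper's four uncertainty sets, so this is only a sketch of the mechanism. Averaging the perturbed least-squares loss over many perturbation draws yields the unperturbed loss plus an L2 penalty on the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
b = rng.standard_normal(p)      # an arbitrary fixed coefficient vector
sigma = 0.1                     # perturbation scale (assumed, for illustration)

# Monte Carlo average of the least-squares loss under random
# perturbations Delta of the design matrix X
mc = np.mean([
    np.sum((y - (X + sigma * rng.standard_normal((n, p))) @ b) ** 2)
    for _ in range(20000)
])

# analytic average: E||y - (X+Delta)b||^2 = ||y - Xb||^2 + n*sigma^2*||b||^2,
# i.e. the ordinary loss plus a ridge penalty on b
ridge_obj = np.sum((y - X @ b) ** 2) + n * sigma ** 2 * np.sum(b ** 2)
```

The cross term between the residual and the perturbation averages to zero, so only the quadratic term survives and it is exactly a ridge penalty whose strength depends on the sample size and the perturbation scale, mirroring the closed-form penalty terms described in the abstract.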
Robust Orthogonal Complement Principal Component Analysis
Recently, the robustification of principal component analysis has attracted considerable attention from statisticians, engineers, and computer scientists. In this work we study the type of outliers that are not necessarily apparent in the original observation space but can seriously affect the estimation of the principal subspace. Based on a mathematical formulation of such transformed outliers, a novel robust orthogonal complement principal component analysis (ROC-PCA) is proposed. The framework combines the popular sparsity-enforcing and low-rank regularization techniques to deal with both row-wise and element-wise outliers. A non-asymptotic oracle inequality guarantees the accuracy and high-breakdown performance of ROC-PCA in finite samples. To tackle the computational challenges, an efficient algorithm is developed on the basis of Stiefel manifold optimization and iterative thresholding. Furthermore, a batch variant is proposed to significantly reduce the cost in ultra-high dimensions. The paper also points out a pitfall of the common practice of SVD reduction in robust PCA. Experiments demonstrate the effectiveness and efficiency of ROC-PCA on both synthetic and real data.
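The failure mode motivating this line of work (outliers of ordinary magnitude that sit in the orthogonal complement of the principal subspace, so no single point looks extreme, yet together they tip the estimated subspace) can be reproduced in a toy 2-D sketch. The data sizes and outlier placement below are invented for illustration and have nothing to do with the ROC-PCA algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# inliers: 200 points lying near the x-axis (true principal direction e1)
inliers = np.column_stack([rng.standard_normal(200),
                           0.01 * rng.standard_normal(200)])

def top_pc(X):
    # leading eigenvector of the (uncentered) Gram matrix X^T X
    vals, vecs = np.linalg.eigh(X.T @ X)
    return vecs[:, -1]

v_clean = top_pc(inliers)   # essentially e1

# outliers: each has norm 2, comparable to the inliers' spread, but all
# sit in the orthogonal complement of the true subspace (the y-axis)
outliers = np.tile([0.0, 2.0], (100, 1))
v_contam = top_pc(np.vstack([inliers, outliers]))   # tips toward e2
```

No individual outlier stands out by its norm, but their energy accumulates entirely in the orthogonal complement until classical PCA reports the wrong subspace; this is exactly the transformed-outlier regime that ROC-PCA is designed to resist.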
- …