
    High-dimensional, robust, heteroscedastic variable selection with the adaptive LASSO, and applications to random coefficient regression

    In this thesis, theoretical results are developed for the adaptive LASSO in high-dimensional, sparse linear regression models with potentially heavy-tailed and heteroscedastic errors. The empirical pseudo-Huber loss is used as the loss function, and the main focus is sign consistency of the resulting estimator. Simulations illustrate the favorable numerical performance of the proposed methodology in comparison to the ordinary adaptive LASSO. These results are then applied to the linear random coefficient regression model, more precisely to the means, variances, and covariances of the coefficients. Furthermore, sufficient conditions for the identifiability of the first and second moments, as well as asymptotic results for a fixed number of coefficients, are given.
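    The estimator described above is straightforward to prototype. Below is a minimal sketch, not the thesis's implementation: a proximal-gradient solver for the pseudo-Huber loss with a weighted L1 (adaptive LASSO) penalty, where the adaptive weights come from an initial ridge fit. All names and tuning values (`delta`, `lam`, `gamma`, the step size) are illustrative assumptions.

```python
import numpy as np

def pseudo_huber(r, delta):
    """Pseudo-Huber loss: quadratic near zero, linear in the tails."""
    return delta**2 * (np.sqrt(1.0 + (r / delta)**2) - 1.0)

def pseudo_huber_psi(r, delta):
    """Derivative of the pseudo-Huber loss with respect to the residual."""
    return r / np.sqrt(1.0 + (r / delta)**2)

def adaptive_lasso_pseudo_huber(X, y, lam=0.1, gamma=1.0, delta=1.0,
                                step=1e-3, n_iter=5000):
    """Proximal gradient descent on the penalized pseudo-Huber objective.

    Adaptive weights w_j = 1 / (|beta_init_j|**gamma + eps) come from an
    initial ridge fit, a common choice for the adaptive LASSO (my
    assumption here, not necessarily the initial estimator of the thesis).
    """
    n, p = X.shape
    beta_init = np.linalg.solve(X.T @ X + 1e-2 * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(beta_init)**gamma + 1e-8)
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = X @ beta - y
        grad = X.T @ pseudo_huber_psi(r, delta) / n
        beta = beta - step * grad
        # Soft thresholding: the proximal map of the weighted L1 penalty.
        thresh = step * lam * w
        beta = np.sign(beta) * np.maximum(np.abs(beta) - thresh, 0.0)
    return beta
```

    Because the pseudo-Huber loss is smooth everywhere, plain proximal gradient steps suffice for a sketch like this; the sparsity comes entirely from the soft-thresholding step.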

    Robust Regression over Averaged Uncertainty

    We propose a new formulation of robust regression that integrates over all realizations of the uncertainty set, taking an averaged rather than worst-case approach to the ordinary least-squares regression problem. We show that this formulation surprisingly recovers ridge regression, establishing the missing link between robust optimization and mean-squared-error approaches to existing regression problems. We first prove the equivalence for four uncertainty sets: ellipsoidal, box, diamond, and budget, and provide closed-form expressions for the penalty term as a function of the sample size, the number of features, and the perturbation protection strength. On synthetic datasets with different levels of perturbation, the averaged formulation consistently improves on the existing worst-case formulation in out-of-sample performance; importantly, the improvement grows as the perturbation level increases, confirming the method's advantage in high-noise environments. We report similar out-of-sample improvements on real-world regression problems from UCI datasets.
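    The claimed equivalence is easy to check numerically for the box uncertainty set. The snippet below is my own illustration, not code from the paper: it minimizes the least-squares objective averaged over random perturbations of X and compares the solution with ridge regression under a matching penalty. The penalty value `lam = n * rho**2 / 3` follows from the variance of a uniform perturbation and is an assumption of this sketch, not a formula quoted from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho = 200, 5, 0.3            # sample size, features, box radius
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Averaged robust solution: minimize the Monte Carlo mean of
# ||y - (X + D) b||^2 over perturbations D with iid entries
# uniform on [-rho, rho] (a box uncertainty set).
M = 2000
A = np.zeros((p, p))
b = np.zeros(p)
for _ in range(M):
    D = rng.uniform(-rho, rho, size=(n, p))
    Xp = X + D
    A += Xp.T @ Xp / M
    b += Xp.T @ y / M
beta_avg = np.linalg.solve(A, b)

# Ridge with the matching penalty: E[D.T @ D] = n * Var(U[-rho, rho]) * I
# = (n * rho**2 / 3) * I, so the averaged objective is a ridge objective.
lam = n * rho**2 / 3
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.max(np.abs(beta_avg - beta_ridge)))  # small for large M
```

    The gap between the two solutions shrinks at the usual Monte Carlo rate as M grows, illustrating the averaging-recovers-ridge claim for this particular uncertainty set.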

    Robust Orthogonal Complement Principal Component Analysis

    Recently, the robustification of principal component analysis has attracted considerable attention from statisticians, engineers, and computer scientists. In this work we study outliers that are not necessarily apparent in the original observation space but can seriously affect estimation of the principal subspace. Based on a mathematical formulation of such transformed outliers, a novel robust orthogonal complement principal component analysis (ROC-PCA) is proposed. The framework combines the popular sparsity-enforcing and low-rank regularization techniques to handle both row-wise and element-wise outliers. A non-asymptotic oracle inequality guarantees the accuracy and high breakdown performance of ROC-PCA in finite samples. To tackle the computational challenges, an efficient algorithm is developed based on Stiefel manifold optimization and iterative thresholding, and a batch variant is proposed to significantly reduce the cost in ultra-high dimensions. The paper also points out a pitfall in the common practice of SVD reduction in robust PCA. Experiments show the effectiveness and efficiency of ROC-PCA on both synthetic and real data.
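    ROC-PCA itself relies on Stiefel manifold optimization, which is beyond a short example, but its two regularization ingredients can be sketched in a few lines. The following is a minimal illustration of the generic low-rank-plus-sparse alternation that such robust PCA methods build on, not the ROC-PCA algorithm: a rank-r SVD fit alternating with element-wise soft thresholding of an outlier matrix. The parameters `r` and `lam` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(A, t):
    """Element-wise soft thresholding: the prox of the L1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def lowrank_plus_sparse(Y, r=2, lam=0.1, n_iter=100):
    """Decompose Y ~ L + S with rank(L) <= r and a sparse outlier part S."""
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        # Low-rank step: best rank-r approximation of Y - S via the SVD.
        U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]
        # Outlier step: soft-threshold the residual element-wise.
        S = soft_threshold(Y - L, lam)
    return L, S
```

    ROC-PCA works in the orthogonal complement of the fitted subspace and adds row-wise penalties; this sketch only conveys the iterative-thresholding core shared by that family of methods.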