154 research outputs found

    Unconventional Regression for High-Dimensional Data Analysis

    University of Minnesota Ph.D. dissertation. June 2017. Major: Statistics. Advisor: Hui Zou. 1 computer file (PDF); xiv, 161 pages.

    Massive and complex data present new challenges that conventional sparse penalized mean regressions, such as penalized least squares, cannot fully solve. For example, in high-dimensional data, non-constant variance (heteroscedasticity) is commonly present but often receives little attention in penalized mean regressions. Heavy-tailedness is also frequently encountered in many high-dimensional scientific data. To resolve these issues, unconventional sparse regressions such as penalized quantile regression and penalized asymmetric least squares are the appropriate tools because they can infer the complete picture of the entire probability distribution.

    Asymmetric least squares regression has wide applications in statistics, econometrics, and finance. It is also an important tool for analyzing heteroscedasticity and is computationally friendlier than quantile regression. Existing work on asymmetric least squares considers only the traditional low-dimension, large-sample setting. We systematically study Sparse Asymmetric LEast Squares (SALES) under high dimensionality and fully explore its theoretical and numerical properties. SALES may fail to tell which variables are important for the mean function and which are important for the scale/variance function, especially when some variables are important for both. To that end, we further propose a COupled Sparse Asymmetric LEast Squares (COSALES) regression for calibrated heteroscedasticity analysis.

    Penalized quantile regression has been shown to enjoy very good theoretical properties in the literature, but its computational issues have not yet been fully resolved. We introduce fast alternating direction method of multipliers (ADMM) algorithms for computing penalized quantile regression with the lasso, adaptive lasso, and folded concave penalties. The convergence properties of the proposed algorithms are established, and numerical experiments demonstrate their computational efficiency and accuracy.

    To efficiently estimate coefficients in high-dimensional linear models without prior knowledge of the error distributions, sparse penalized composite quantile regression (CQR) provides protection against significant efficiency decay regardless of the error distribution. We consider both lasso and folded concave penalized CQR and establish their theoretical properties under ultrahigh dimensionality. A unified efficient numerical algorithm based on ADMM is also proposed to solve the penalized CQR. Numerical studies demonstrate the superior performance of penalized CQR over penalized least squares under many error distributions.
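    As a point of reference for the asymmetric least squares loss underlying SALES, the following is a minimal sketch of the expectile (asymmetric squared) loss; the function name and example values are illustrative and are not taken from the dissertation.

    ```python
    import numpy as np

    def asymmetric_squared_loss(residual, tau):
        """Asymmetric least squares (expectile) loss at level tau in (0, 1).

        Negative residuals are weighted by (1 - tau) and non-negative residuals
        by tau, so tau = 0.5 recovers ordinary least squares up to a constant.
        """
        residual = np.asarray(residual, dtype=float)
        weight = np.where(residual < 0, 1.0 - tau, tau)
        return weight * residual ** 2

    # Example: at tau = 0.9 positive residuals are penalized more heavily,
    # which is what lets different tau values trace out the scale function.
    r = np.array([-2.0, -0.5, 0.5, 2.0])
    print(asymmetric_squared_loss(r, tau=0.9))
    ```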

    High-Dimensional Composite Quantile Regression: Optimal Statistical Guarantees and Fast Algorithms

    The composite quantile regression (CQR) was introduced by Zou and Yuan [Ann. Statist. 36 (2008) 1108-1126] as a robust regression method for linear models with heavy-tailed errors that still achieves high efficiency. Its penalized counterpart for high-dimensional sparse models was recently studied in Gu and Zou [IEEE Trans. Inf. Theory 66 (2020) 7132-7154], along with a specialized optimization algorithm based on the alternating direction method of multipliers (ADMM). Compared to the various first-order algorithms for penalized least squares, ADMM-based algorithms are not well adapted to large-scale problems. To overcome this computational hardness, in this paper we apply a convolution-smoothing technique to CQR, complemented with iteratively reweighted $\ell_1$-regularization. The smoothed composite loss function is convex, twice continuously differentiable, and locally strongly convex with high probability. We propose a gradient-based algorithm for penalized smoothed CQR via a variant of the majorize-minimization principle, which gains substantial computational efficiency over ADMM. Theoretically, we show that the iteratively reweighted $\ell_1$-penalized smoothed CQR estimator achieves a near-minimax optimal convergence rate under heavy-tailed errors without any moment constraint, and further achieves a near-oracle convergence rate under a weaker minimum signal strength condition than needed in Gu and Zou (2020). Numerical studies demonstrate that the proposed method exhibits significant computational advantages without compromising statistical performance compared to two state-of-the-art methods that achieve robustness and high efficiency simultaneously.

    Comment: 42 pages, 7 figures.
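    To make the convolution-smoothing idea concrete, below is a minimal sketch of a Gaussian-kernel-smoothed check loss and its composite version; the kernel choice, function names, and bandwidth handling are assumptions made for illustration, not the authors' implementation.

    ```python
    import numpy as np
    from scipy.stats import norm

    def check_loss(u, tau):
        """Standard quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
        return u * (tau - (u < 0))

    def smoothed_check_loss(u, tau, h):
        """Check loss convolved with a Gaussian kernel of bandwidth h.

        The convolution has the closed form u * (tau - Phi(-u/h)) + h * phi(u/h),
        which is convex and twice continuously differentiable in u.
        """
        return u * (tau - norm.cdf(-u / h)) + h * norm.pdf(u / h)

    def composite_smoothed_loss(residuals, taus, h):
        """Composite loss: average the smoothed check loss over K quantile levels."""
        return np.mean([smoothed_check_loss(residuals, t, h).mean() for t in taus])

    # Example: K = 9 equally spaced quantile levels on heavy-tailed residuals.
    taus = np.arange(1, 10) / 10.0
    res = np.random.default_rng(0).standard_t(df=2, size=200)
    print(composite_smoothed_loss(res, taus, h=0.5))
    ```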

    Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex Penalties

    This paper investigates quantile regression in the presence of non-convex and non-smooth sparse penalties, such as the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD). The non-smooth and non-convex nature of these problems often leads to convergence difficulties for many algorithms. While iterative techniques like coordinate descent and local linear approximation can facilitate convergence, the process is often slow. This sluggish pace is primarily due to the need to run these approximation techniques until full convergence at each step, a requirement we term a secondary convergence iteration. To accelerate convergence, we employ the alternating direction method of multipliers (ADMM) and introduce a novel single-loop smoothing ADMM algorithm with an increasing penalty parameter, named SIAD, specifically tailored for sparse-penalized quantile regression. We first delve into the convergence properties of the proposed SIAD algorithm and establish the necessary conditions for convergence. Theoretically, we confirm a convergence rate of $o(k^{-1/4})$ for the sub-gradient bound of the augmented Lagrangian. Subsequently, we provide numerical results to showcase the effectiveness of the SIAD algorithm. Our findings highlight that the SIAD method outperforms existing approaches, providing a faster and more stable solution for sparse-penalized quantile regression.
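    For readers unfamiliar with the non-convex penalties mentioned above, the sketch below shows the MCP penalty and its proximal operator (firm thresholding); this is standard background rather than the SIAD algorithm itself, and the parameter defaults are illustrative.

    ```python
    import numpy as np

    def mcp_penalty(beta, lam, gamma=3.0):
        """Minimax concave penalty (MCP), evaluated elementwise.

        For |b| <= gamma * lam the penalty is lam * |b| - b^2 / (2 * gamma);
        beyond that it stays flat at gamma * lam^2 / 2, so large coefficients
        are not shrunk at all (unlike the lasso).
        """
        b = np.abs(beta)
        return np.where(b <= gamma * lam,
                        lam * b - b ** 2 / (2.0 * gamma),
                        0.5 * gamma * lam ** 2)

    def mcp_prox(z, lam, gamma=3.0):
        """Proximal operator of MCP with unit step (firm thresholding), gamma > 1."""
        z = np.asarray(z, dtype=float)
        soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
        return np.where(np.abs(z) <= gamma * lam, soft / (1.0 - 1.0 / gamma), z)

    # Example: small inputs are thresholded to zero, large ones pass through.
    print(mcp_prox(np.array([0.05, 0.5, 5.0]), lam=0.2))
    ```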

    Distributed Quantile Regression Analysis and a Group Variable Selection Method

    This dissertation develops novel methodologies for distributed quantile regression analysis of big data by utilizing a distributed optimization algorithm called the alternating direction method of multipliers (ADMM). Specifically, we first write the penalized quantile regression in a specific form that can be solved by the ADMM and propose numerical algorithms for solving the ADMM subproblems; this yields the distributed QR-ADMM algorithm. Then, to further reduce the computational time, we formulate the penalized quantile regression in another equivalent ADMM form in which all the subproblems have exact closed-form solutions and hence avoid iterative numerical methods; this yields the single-loop QPADM algorithm, which further improves on the computational efficiency of QR-ADMM. Both QR-ADMM and QPADM enjoy flexible parallelization by enabling data splitting across both the sample space and the feature space, which makes them especially appealing when both the sample size n and the feature dimension p are large. Besides the QR-ADMM and QPADM algorithms for penalized quantile regression, we also develop a group variable selection method by approximating the Bayesian information criterion. Unlike existing penalization methods for feature selection, our proposed gMIC algorithm is free of parameter tuning and hence enjoys greater computational efficiency. Although the current version of gMIC focuses on the generalized linear model, it can be naturally extended to quantile regression for feature selection. We provide theoretical analysis for our proposed methods: specifically, we conduct numerical convergence analysis for the QR-ADMM and QPADM algorithms, and establish asymptotic theory and the oracle property of feature selection for the gMIC method. All our methods are evaluated with simulation studies and real data analysis.
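    As a rough illustration of why ADMM subproblems for penalized quantile regression can admit exact closed forms, below is a minimal sketch of the two proximal operators typically involved, the check-loss proximal map and soft thresholding; the names and step-size convention are assumptions for illustration and do not reproduce the QPADM algorithm.

    ```python
    import numpy as np

    def prox_check(v, tau, sigma):
        """Proximal operator of the quantile check loss rho_tau with step sigma.

        Solves argmin_u rho_tau(u) + (1 / (2 * sigma)) * (u - v)^2 elementwise:
        v is shifted down by sigma * tau when large, up by sigma * (1 - tau)
        when very negative, and set to zero in between.
        """
        v = np.asarray(v, dtype=float)
        return v - np.clip(v, -sigma * (1.0 - tau), sigma * tau)

    def soft_threshold(v, kappa):
        """Proximal operator of the lasso penalty (soft thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

    # Both maps are elementwise and closed-form, which is what allows a
    # single-loop ADMM iteration without inner numerical solvers.
    print(prox_check(np.array([-1.0, 0.1, 1.0]), tau=0.7, sigma=0.5))
    print(soft_threshold(np.array([-1.0, 0.1, 1.0]), kappa=0.3))
    ```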