Distributed Quantile Regression Analysis and a Group Variable Selection Method

Abstract

This dissertation develops novel methodologies for distributed quantile regression analysis for big data by utilizing a distributed optimization algorithm called the alternating direction method of multipliers (ADMM). Specifically, we first write the penalized quantile regression into a specific form that can be solved by the ADMM and propose numerical algorithms for solving the ADMM subproblems. This results in the distributed QR-ADMM algorithm. Then, to further reduce the computational time, we formulate the penalized quantile regression into another equivalent ADMM form in which all the subproblems have exact closed-form solutions and hence avoid iterative numerical methods. This results in the single-loop QPADM algorithm that further improve on the computational efficiency of the QR-ADMM. Both QR-ADMM and QPADM enjoy flexible parallelization by enabling data splitting across both sample space and feature space, which make them especially appealing for the case when both sample size n and feature dimension p are large. Besides the QR-ADMM and QPADM algorithms for penalized quantile regression, we also develop a group variable selection method by approximating the Bayesian information criterion. Unlike existing penalization methods for feature selection, our proposed gMIC algorithm is free of parameter tuning and hence enjoys greater computational efficiency. Although the current version of gMIC focuses on the generalized linear model, it can be naturally extended to the quantile regression for feature selection. We provide theoretical analysis for our proposed methods. Specifically, we conduct numerical convergence analysis for the QR-ADMM and QPADM algorithms, and provide asymptotical theories and oracle property of feature selection for the gMIC method. All our methods are evaluated with simulation studies and real data analysis

    Similar works