    On Iterative Hard Thresholding Methods for High-dimensional M-Estimation

    The use of M-estimators in generalized linear regression models in high dimensional settings requires risk minimization with hard $L_0$ constraints. Of the known methods, the class of projected gradient descent methods (also known as iterative hard thresholding (IHT)) is known to offer the fastest and most scalable solutions. However, the current state of the art is only able to analyze these methods in extremely restrictive settings which do not hold in high dimensional statistical models. In this work we bridge this gap by providing the first analysis for IHT-style methods in the high dimensional statistical setting. Our bounds are tight and match known minimax lower bounds. Our results rely on a general analysis framework that enables us to analyze several popular hard thresholding style algorithms (such as HTP, CoSaMP, SP) in the high dimensional regression setting. We also extend our analysis to a large family of "fully corrective methods" that includes two-stage and partial hard-thresholding algorithms. We show that our results hold for the problem of sparse regression, as well as low-rank matrix recovery. Comment: 20 pages, 3 figures, To appear in the proceedings of the 28th Annual Conference on Neural Information Processing Systems, NIPS 201
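As a rough illustration of the projected gradient descent (IHT) idea described in this abstract, the following minimal sketch applies it to sparse least squares rather than to a general M-estimator. The function names (`hard_threshold`, `iht_least_squares`), the step size rule, and the synthetic data are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -s)[-s:]
    out[keep] = v[keep]
    return out

def iht_least_squares(X, y, s, n_iter=200):
    """Sketch of IHT for min ||y - X b||^2 / (2n) subject to ||b||_0 <= s."""
    n, p = X.shape
    # Assumed step size: 1 / L, where L is the largest eigenvalue of X^T X / n.
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n               # gradient of the squared loss
        b = hard_threshold(b - step * grad, s)     # project onto s-sparse vectors
    return b

# Hypothetical synthetic problem: 3 nonzero coefficients out of 50.
rng = np.random.default_rng(0)
n, p, s = 400, 50, 3
X = rng.standard_normal((n, p))
b_true = np.zeros(p)
b_true[[3, 17, 41]] = [2.0, -2.0, 1.5]
y = X @ b_true + 0.1 * rng.standard_normal(n)

b_hat = iht_least_squares(X, y, s)
print(np.flatnonzero(b_hat), b_hat[np.flatnonzero(b_hat)])
```

The projection step is what distinguishes IHT from plain gradient descent: each iterate is forced back onto the set of s-sparse vectors, which is the hard $L_0$ constraint mentioned in the abstract.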

    Group Iterative Spectrum Thresholding for Super-Resolution Sparse Spectral Selection

    Recently, sparsity-based algorithms have been proposed for super-resolution spectrum estimation. However, to achieve adequately high resolution in real-world signal analysis, the dictionary atoms have to be close to each other in frequency, thereby resulting in a coherent design. The popular convex compressed sensing methods break down in the presence of high coherence and large noise. We propose a new regularization approach to handle model collinearity and obtain parsimonious frequency selection simultaneously. It takes advantage of the pairing structure of sine and cosine atoms in the frequency dictionary. A probabilistic spectrum screening is also developed for fast computation in high dimensions. A data-resampling version of the high-dimensional Bayesian Information Criterion is used to determine the regularization parameters. Experiments show the efficacy and efficiency of the proposed algorithms in challenging situations with small sample size, high frequency resolution, and low signal-to-noise ratio.

    Transformed Schatten-1 Iterative Thresholding Algorithms for Low Rank Matrix Completion

    We study a non-convex low-rank promoting penalty function, the transformed Schatten-1 (TS1), and its applications in matrix completion. The TS1 penalty, as a matrix quasi-norm defined on its singular values, interpolates the rank and the nuclear norm through a nonnegative parameter $a$. We consider the unconstrained TS1 regularized low-rank matrix recovery problem and develop a fixed point representation for its global minimizer. The TS1 thresholding functions are in closed analytical form for all parameter values. The TS1 threshold values differ in the subcritical (supercritical) parameter regime, where the TS1 threshold functions are continuous (discontinuous). We propose TS1 iterative thresholding algorithms and compare them with some state-of-the-art algorithms on matrix completion test problems. For problems with known rank, a fully adaptive TS1 iterative thresholding algorithm consistently performs the best under different conditions, with the ground truth matrix being multivariate Gaussian at varying covariance. For problems with unknown rank, TS1 algorithms with an additional rank estimation procedure approach the level of IRucL-q, which is an iterative reweighted algorithm, non-convex in nature and best in performance.
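To make the interpolation claim concrete, here is a small sketch that evaluates a TS1-style penalty on the singular values of a matrix. The abstract does not give the formula, so the scalar function $\rho_a(t) = (a+1)t/(a+t)$ used here is an assumption (it is the form commonly used for the transformed L1 penalty), and `ts1_penalty` is a hypothetical helper name.

```python
import numpy as np

def ts1_penalty(M, a):
    """Sum of rho_a(sigma_i) over the singular values of M.

    Assumed form: rho_a(t) = (a + 1) * t / (a + t) for t >= 0 and a > 0.
    """
    sigma = np.linalg.svd(M, compute_uv=False)
    return float(np.sum((a + 1.0) * sigma / (a + sigma)))

# A rank-2 matrix with singular values 3 and 2 (nuclear norm 5).
M = np.diag([3.0, 2.0, 0.0])

near_rank = ts1_penalty(M, 1e-8)     # small a: close to rank(M) = 2
near_nuclear = ts1_penalty(M, 1e8)   # large a: close to the nuclear norm = 5
print(near_rank, near_nuclear)
```

Under this assumed form, small values of `a` make every nonzero singular value count roughly as 1, while large values of `a` make each count roughly as its own magnitude.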

    Outlier Detection Using Nonconvex Penalized Regression

    This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the $n$ data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual $L_1$ penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The $L_1$ penalty corresponds to soft thresholding. We introduce a thresholding (denoted by $\Theta$) based iterative procedure for outlier detection ($\Theta$-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that $\Theta$-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most $O(np)$ (and sometimes much less), avoiding an $O(np^2)$ least squares estimate. We describe the connection between $\Theta$-IPOD and $M$-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned $\Theta$-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with $p \gg n$, if both the coefficient vector and the outlier pattern are sparse.
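The mean shift idea can be sketched as an alternation between a least squares fit on the shift-adjusted response and hard thresholding of the residuals. This is one plausible reading of the abstract, not necessarily the paper's exact update. The function name `ipod_sketch`, the zero initialization, the threshold value, and the synthetic data are all assumptions made for illustration. The QR factorization is computed once so that each iteration only needs $O(np)$ matrix-vector products.

```python
import numpy as np

def hard_threshold(r, lam):
    """Keep entries with |r_i| > lam and set the rest to zero."""
    return np.where(np.abs(r) > lam, r, 0.0)

def ipod_sketch(X, y, lam, n_iter=100, tol=1e-8):
    """Alternate between fitting y - gamma and thresholding the residuals."""
    Q, _ = np.linalg.qr(X)            # reduced QR, computed once up front
    gamma = np.zeros_like(y)          # one mean shift parameter per observation
    for _ in range(n_iter):
        fitted = Q @ (Q.T @ (y - gamma))          # projection in O(np)
        new_gamma = hard_threshold(y - fitted, lam)
        converged = np.max(np.abs(new_gamma - gamma)) < tol
        gamma = new_gamma
        if converged:
            break
    beta = np.linalg.lstsq(X, y - gamma, rcond=None)[0]
    return beta, gamma

# Hypothetical data: a line with intercept 1 and slope 2, plus 5 shifted points.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.standard_normal(n)
y[:5] += 10.0                          # inject outliers at indices 0..4

beta, gamma = ipod_sketch(X, y, lam=3.0)
print(beta, np.flatnonzero(gamma))
```

Observations with a nonzero entry in `gamma` are the ones flagged as outliers. Starting from `gamma = 0` is a simplification that works for this easy example and may be inadequate when outliers sit at high-leverage points.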

    Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow

    This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements $y_j = (a_j' x)^2 + \epsilon_j$, $j = 1, \ldots, m$, with independent sub-exponential noise $\epsilon_j$. The goals are to understand the effect of the sparsity of $x$ on the estimation precision and to construct a computationally feasible estimator to achieve the optimal rates. Inspired by the Wirtinger Flow [12] proposed for noiseless and non-sparse phase retrieval, a novel thresholded gradient descent algorithm is proposed and it is shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $a_j$'s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $x$. Comment: 28 pages, 4 figures

    Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization

    Hard Thresholding Pursuit (HTP) is an iterative greedy selection procedure for finding sparse solutions of underdetermined linear systems. This method has been shown to have strong theoretical guarantees and impressive numerical performance. In this paper, we generalize HTP from compressive sensing to a generic problem setup of sparsity-constrained convex optimization. The proposed algorithm iterates between a standard gradient descent step and a hard thresholding step with or without debiasing. We prove that our method enjoys strong guarantees analogous to those of HTP in terms of rate of convergence and parameter estimation accuracy. Numerical evidence shows that our method is superior to the state-of-the-art greedy selection methods in sparse logistic regression and sparse precision matrix estimation tasks.
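The gradient step followed by hard thresholding can be sketched for sparse logistic regression, one of the tasks named in this abstract. The sketch below covers only the variant without debiasing. The function name `sparse_logistic_ght`, the step size rule, and the synthetic data are assumptions for the example, not details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -k)[-k:]
    out[keep] = v[keep]
    return out

def sparse_logistic_ght(X, y, k, n_iter=300):
    """Gradient step plus hard thresholding for the mean logistic loss (no debiasing)."""
    n, p = X.shape
    # Assumed step size 1 / L, using the bound L <= ||X||_2^2 / (4 n) for logistic loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / n    # gradient of the mean logistic loss
        w = hard_threshold(w - step * grad, k)   # keep only k coordinates
    return w

# Hypothetical synthetic problem: 3 relevant features out of 20.
rng = np.random.default_rng(0)
n, p, k = 2000, 20, 3
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[[2, 7, 11]] = [2.0, -2.0, 1.5]
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

w_hat = sparse_logistic_ght(X, y, k)
print(np.flatnonzero(w_hat), w_hat[np.flatnonzero(w_hat)])
```

A debiasing variant would additionally re-minimize the loss over the selected support after each thresholding step. That inner solve is omitted here to keep the sketch short.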