
    High-dimensional Sparse Inverse Covariance Estimation using Greedy Methods

    In this paper we consider the task of estimating the non-zero pattern of the sparse inverse covariance matrix of a zero-mean Gaussian random vector from a set of i.i.d. samples. Note that this is also equivalent to recovering the underlying graph structure of a sparse Gaussian Markov Random Field (GMRF). We present two novel greedy approaches to solving this problem. The first estimates the non-zero covariates of the overall inverse covariance matrix using a series of global forward and backward greedy steps. The second estimates the neighborhood of each node in the graph separately, again using greedy forward and backward steps, and combines the intermediate neighborhoods to form an overall estimate. The principal contribution of this paper is a rigorous analysis of sparsistency, i.e., consistency in recovering the sparsity pattern of the inverse covariance matrix. Surprisingly, we show that both the local and global greedy methods learn the full structure of the model with high probability given just O(d log p) samples, which is a significant improvement over the state-of-the-art ℓ1-regularized Gaussian MLE (Graphical Lasso), which requires O(d² log p) samples. Moreover, the restricted eigenvalue and smoothness conditions imposed by our greedy methods are much weaker than the strong irrepresentable conditions required by the ℓ1-regularization based methods. We corroborate our results with extensive simulations and examples, comparing our local and global greedy methods to the ℓ1-regularized Gaussian MLE, as well as the Neighborhood Greedy method to nodewise ℓ1-regularized linear regression (Neighborhood Lasso). Comment: Accepted to AISTATS 2012 for oral presentation.
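    The neighborhood-based method described above can be pictured as a per-node forward-backward greedy regression whose selected neighborhoods are then combined into a graph estimate. The Python sketch below is only an illustration of that idea, not the authors' implementation; the function names, the stopping threshold eps, the backward factor nu, and the OR-rule combination are all assumptions.

    import numpy as np

    def forward_backward_neighborhood(X, j, eps=1e-3, nu=0.5, max_iters=100):
        """Greedy forward-backward estimate of node j's neighborhood (illustrative).

        Forward step: add the covariate giving the largest drop in the
        least-squares loss for predicting column j; stop when the drop
        falls below eps.  Backward step: remove any selected covariate
        whose removal raises the loss by less than nu times that drop.
        The thresholds eps and nu are hypothetical choices.
        """
        n, p = X.shape
        y = X[:, j]
        candidates = [k for k in range(p) if k != j]
        support = []

        def loss(S):
            if not S:
                return float(np.mean(y ** 2))
            A = X[:, S]
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            return float(np.mean((y - A @ beta) ** 2))

        current = loss(support)
        for _ in range(max_iters):
            remaining = [k for k in candidates if k not in support]
            if not remaining:
                break
            # Forward step: best single addition.
            gains = {k: current - loss(support + [k]) for k in remaining}
            k_best, gain = max(gains.items(), key=lambda kv: kv[1])
            if gain < eps:
                break
            support.append(k_best)
            current = loss(support)
            # Backward step: prune covariates that now contribute little.
            for k in list(support):
                rest = [s for s in support if s != k]
                if loss(rest) - current < nu * gain:
                    support = rest
                    current = loss(support)
        return sorted(support)

    def estimate_graph(X, **kwargs):
        """Combine the per-node neighborhoods into an edge set with an OR rule."""
        p = X.shape[1]
        nbrs = [forward_backward_neighborhood(X, j, **kwargs) for j in range(p)]
        return {(min(i, j), max(i, j)) for i in range(p) for j in nbrs[i]}

    The global variant in the paper instead applies the forward and backward steps directly to the entries of the inverse covariance matrix; the sketch only mirrors the neighborhood-based (nodewise) view.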

    Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

    Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can lead to significantly faster BCD methods. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues like how to implement the new rules when using "variable" blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with a sparse dependency between variables; and (iv) consider optimal active manifold identification, which leads to bounds on the "active-set complexity" of BCD methods and to superlinear convergence for certain problems with sparse solutions (and in some cases finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.
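    To make the three algorithmic choices concrete, the following toy Python sketch runs BCD on a least-squares problem with a fixed block partition, a greedy largest-gradient-norm block selection (a Gauss-Southwell-style rule), and an exact block-wise matrix update. It is a minimal sketch under those assumptions and does not reproduce the paper's refined selection rules, variable blocks, or message-passing updates.

    import numpy as np

    def greedy_block_cd_least_squares(A, b, blocks, iters=100):
        """Block coordinate descent for min_x 0.5 * ||Ax - b||^2 (illustrative).

        Block selection: pick the block with the largest gradient norm
        (a Gauss-Southwell-style greedy rule over blocks).
        Block update: exactly minimize over the chosen block with a small
        linear solve, keeping all other coordinates fixed.
        """
        n = A.shape[1]
        x = np.zeros(n)
        for _ in range(iters):
            grad = A.T @ (A @ x - b)
            # Greedy rule: block with the largest gradient norm.
            k = max(range(len(blocks)),
                    key=lambda i: np.linalg.norm(grad[blocks[i]]))
            B = blocks[k]
            A_B = A[:, B]
            # Residual with block B's contribution removed.
            r = b - A @ x + A_B @ x[B]
            # Exact block update (tiny ridge term for numerical stability).
            x[B] = np.linalg.solve(A_B.T @ A_B + 1e-12 * np.eye(len(B)),
                                   A_B.T @ r)
        return x

    # Example: three blocks of two coordinates each on a random problem.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 6))
    b = rng.standard_normal(20)
    blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
    x_hat = greedy_block_cd_least_squares(A, b, blocks)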