1,139 research outputs found

    Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling

    Full text link
    Conditional Random Fields (CRFs) constitute a popular and efficient approach for supervised sequence labelling. CRFs can cope with large description spaces and can integrate some form of structural dependency between labels. In this contribution, we address the issue of efficient feature selection for CRFs based on imposing sparsity through an L1 penalty. We first show how sparsity of the parameter set can be exploited to significantly speed up training and labelling. We then introduce coordinate descent parameter update schemes for CRFs with L1 regularization. We finally provide some empirical comparisons of the proposed approach with state-of-the-art CRF training strategies. In particular, it is shown that the proposed approach is able to take profit of the sparsity to speed up processing and hence potentially handle larger dimensional models

    Optimization with Sparsity-Inducing Penalties

    Get PDF
    Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted â„“2\ell_2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view

    A Fast Active Set Block Coordinate Descent Algorithm for â„“1\ell_1-regularized least squares

    Get PDF
    The problem of finding sparse solutions to underdetermined systems of linear equations arises in several applications (e.g. signal and image processing, compressive sensing, statistical inference). A standard tool for dealing with sparse recovery is the â„“1\ell_1-regularized least-squares approach that has been recently attracting the attention of many researchers. In this paper, we describe an active set estimate (i.e. an estimate of the indices of the zero variables in the optimal solution) for the considered problem that tries to quickly identify as many active variables as possible at a given point, while guaranteeing that some approximate optimality conditions are satisfied. A relevant feature of the estimate is that it gives a significant reduction of the objective function when setting to zero all those variables estimated active. This enables to easily embed it into a given globally converging algorithmic framework. In particular, we include our estimate into a block coordinate descent algorithm for â„“1\ell_1-regularized least squares, analyze the convergence properties of this new active set method, and prove that its basic version converges with linear rate. Finally, we report some numerical results showing the effectiveness of the approach.Comment: 28 pages, 5 figure

    A second derivative SQP method: local convergence

    Get PDF
    In [19], we gave global convergence results for a second-derivative SQP method for minimizing the exact ℓ1-merit function for a fixed value of the penalty parameter. To establish this result, we used the properties of the so-called Cauchy step, which was itself computed from the so-called predictor step. In addition, we allowed for the computation of a variety of (optional) SQP steps that were intended to improve the efficiency of the algorithm. \ud \ud Although we established global convergence of the algorithm, we did not discuss certain aspects that are critical when developing software capable of solving general optimization problems. In particular, we must have strategies for updating the penalty parameter and better techniques for defining the positive-definite matrix Bk used in computing the predictor step. In this paper we address both of these issues. We consider two techniques for defining the positive-definite matrix Bk—a simple diagonal approximation and a more sophisticated limited-memory BFGS update. We also analyze a strategy for updating the penalty paramter based on approximately minimizing the ℓ1-penalty function over a sequence of increasing values of the penalty parameter.\ud \ud Algorithms based on exact penalty functions have certain desirable properties. To be practical, however, these algorithms must be guaranteed to avoid the so-called Maratos effect. We show that a nonmonotone varient of our algorithm avoids this phenomenon and, therefore, results in asymptotically superlinear local convergence; this is verified by preliminary numerical results on the Hock and Shittkowski test set
    • …
    corecore