1,139 research outputs found
Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling
Conditional Random Fields (CRFs) constitute a popular and efficient approach
for supervised sequence labelling. CRFs can cope with large description spaces
and can integrate some form of structural dependency between labels. In this
contribution, we address the issue of efficient feature selection for CRFs
based on imposing sparsity through an L1 penalty. We first show how sparsity of
the parameter set can be exploited to significantly speed up training and
labelling. We then introduce coordinate descent parameter update schemes for
CRFs with L1 regularization. We finally provide some empirical comparisons of
the proposed approach with state-of-the-art CRF training strategies. In
particular, it is shown that the proposed approach is able to take advantage of
the sparsity to speed up processing and hence potentially handle
higher-dimensional models.
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted ℓ2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
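As an illustration of the proximal methods mentioned above, here is a minimal sketch of a proximal gradient iteration for an ℓ1-penalized least-squares problem. It is not taken from the paper: the function names (soft_threshold, ista), the fixed step size, and the choice of a least-squares loss are assumptions made for the example. The step size is assumed to be at most 1/L, where L is the largest eigenvalue of A^T A.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, iters=500):
    """Proximal gradient sketch for 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a list of rows, b a list of floats. `step` is assumed to satisfy
    step <= 1 / L, with L the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of the smooth part: A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth part, then the prox of the l1 penalty
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x
```

With A equal to the identity, the minimizer is simply the soft-thresholded right-hand side, which makes a convenient sanity check: small entries of b are set exactly to zero, which is how the non-smooth penalty produces sparsity.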
A Fast Active Set Block Coordinate Descent Algorithm for ℓ1-regularized least squares
The problem of finding sparse solutions to underdetermined systems of linear
equations arises in several applications (e.g. signal and image processing,
compressive sensing, statistical inference). A standard tool for dealing with
sparse recovery is the ℓ1-regularized least-squares approach, which has
recently attracted the attention of many researchers. In this paper, we
describe an active set estimate (i.e. an estimate of the indices of the zero
variables in the optimal solution) for the considered problem that tries to
quickly identify as many active variables as possible at a given point, while
guaranteeing that some approximate optimality conditions are satisfied. A
relevant feature of the estimate is that it gives a significant reduction of
the objective function when setting to zero all those variables estimated
active. This makes it easy to embed the estimate into a given globally convergent
algorithmic framework. In particular, we include our estimate in a block
coordinate descent algorithm for ℓ1-regularized least squares, analyze
the convergence properties of this new active set method, and prove that its
basic version converges at a linear rate. Finally, we report some numerical
results showing the effectiveness of the approach. Comment: 28 pages, 5 figures
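For orientation, the following is a minimal sketch of plain cyclic coordinate descent for ℓ1-regularized least squares. It is a generic baseline, not the active set method described in the abstract: there is no active set estimate and no block selection, and the function name lasso_cd and its parameters are assumptions made for the example.

```python
def lasso_cd(A, b, lam, sweeps=100):
    """Cyclic coordinate descent for 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a list of rows, b a list of floats. Generic baseline only; it does
    not implement any active set estimate.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(b)  # residual b - A x, kept up to date after every update
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot reduce the residual
            # Correlation of column j with the residual that excludes x[j]
            rho = sum(A[i][j] * r[i] for i in range(m)) + col_sq[j] * x[j]
            # Exact minimizer along coordinate j: soft-threshold rho by lam
            new = max(abs(rho) - lam, 0.0) / col_sq[j]
            if rho < 0:
                new = -new
            delta = new - x[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] -= A[i][j] * delta
                x[j] = new
    return x
```

Each coordinate update either moves a variable or sets it exactly to zero. An active set strategy of the kind the abstract describes aims to identify those zero variables early, so that the remaining updates only touch the variables that are expected to stay nonzero.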
A second derivative SQP method: local convergence
In [19], we gave global convergence results for a second-derivative SQP method for minimizing the exact ℓ1-merit function for a fixed value of the penalty parameter. To establish this result, we used the properties of the so-called Cauchy step, which was itself computed from the so-called predictor step. In addition, we allowed for the computation of a variety of (optional) SQP steps that were intended to improve the efficiency of the algorithm.

Although we established global convergence of the algorithm, we did not discuss certain aspects that are critical when developing software capable of solving general optimization problems. In particular, we must have strategies for updating the penalty parameter and better techniques for defining the positive-definite matrix Bk used in computing the predictor step. In this paper we address both of these issues. We consider two techniques for defining the positive-definite matrix Bk: a simple diagonal approximation and a more sophisticated limited-memory BFGS update. We also analyze a strategy for updating the penalty parameter based on approximately minimizing the ℓ1-penalty function over a sequence of increasing values of the penalty parameter.

Algorithms based on exact penalty functions have certain desirable properties. To be practical, however, these algorithms must be guaranteed to avoid the so-called Maratos effect. We show that a nonmonotone variant of our algorithm avoids this phenomenon and, therefore, results in asymptotically superlinear local convergence; this is verified by preliminary numerical results on the Hock and Schittkowski test set.