660 research outputs found

    DC Proximal Newton for Non-Convex Optimization Problems

    We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference of convex (DC) functions. Our contribution is a new general-purpose proximal Newton algorithm that can handle this setting. The algorithm obtains a descent direction from an approximation of the loss function and then performs a line search to ensure sufficient descent. A theoretical analysis shows that the limit points of the iterates of the proposed algorithm are stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art for a problem with a convex loss function and a non-convex regularizer. We also illustrate the benefit of our algorithm on a high-dimensional transductive learning problem where both the loss function and the regularizer are non-convex.
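
    The two steps named in the abstract (a descent direction from a local model, then a line search for sufficient descent) can be sketched as follows. This is not the paper's algorithm: as a simplifying assumption the Newton-type curvature model is replaced by a scaled identity, so the step is a proximal gradient step. The least-squares loss, the capped-l1 regularizer lam * min(|x|, theta) with its DC split lam * |x| - lam * max(|x| - theta, 0), and all function and parameter names are illustrative choices.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def objective(A, b, x, lam, theta):
    """Least-squares loss plus the capped-l1 penalty lam * min(|x_j|, theta)."""
    resid = [sum(a * c for a, c in zip(row, x)) - bi for row, bi in zip(A, b)]
    loss = 0.5 * sum(r * r for r in resid)
    penalty = lam * sum(min(abs(c), theta) for c in x)
    return loss + penalty


def dc_prox_linesearch(A, b, lam, theta, step, n_iters=100, sigma=1e-4):
    """DC proximal step with backtracking (identity curvature: an assumption)."""
    n, d = len(A), len(A[0])
    x = [0.0] * d
    history = [objective(A, b, x, lam, theta)]
    for _ in range(n_iters):
        resid = [sum(A[i][c] * x[c] for c in range(d)) - b[i] for i in range(n)]
        grad = [sum(A[i][c] * resid[i] for i in range(n)) for c in range(d)]
        # Subgradient of the subtracted convex part lam * max(|x_j| - theta, 0).
        s = [lam * math.copysign(1.0, c) if abs(c) > theta else 0.0 for c in x]
        # Minimize the convex surrogate: linearized loss and linearized
        # concave part, plus the l1 term handled by soft-thresholding.
        z = [soft_threshold(x[c] - step * (grad[c] - s[c]), step * lam)
             for c in range(d)]
        direction = [zc - xc for zc, xc in zip(z, x)]
        dnorm2 = sum(dc * dc for dc in direction)
        if dnorm2 == 0.0:
            break  # x is a fixed point of the surrogate step
        # Backtracking line search enforcing sufficient descent.
        alpha, f_old, accepted = 1.0, history[-1], False
        for _ in range(30):
            cand = [xc + alpha * dc for xc, dc in zip(x, direction)]
            f_new = objective(A, b, cand, lam, theta)
            if f_new <= f_old - sigma * alpha * dnorm2 / step:
                accepted = True
                break
            alpha *= 0.5
        if not accepted:
            break  # no acceptable step found; stop rather than increase F
        x = cand
        history.append(f_new)
    return x, history
```

    Because a step is accepted only when the sufficient-descent test passes, the recorded objective values never increase. A step size at or below the reciprocal of the largest eigenvalue of A^T A should let the full step alpha = 1 pass the test.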

    Block stochastic gradient iteration for convex and nonconvex optimization

    The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions, or solve a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are easier to update individually than together, BCD has a lower per-iteration cost. This paper introduces a method that combines the features of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables. Specifically, a block stochastic gradient (BSG) method is proposed for solving both convex and nonconvex programs. At each iteration, BSG approximates the gradient of the differentiable part of the objective by randomly sampling a small set of data or sampling a few functions from the sum term in the objective, and then, using those samples, it updates all the blocks of variables in either a deterministic or a randomly shuffled order. Its convergence is established in different senses for the convex and nonconvex cases. In the convex case, the proposed method has the same order of convergence rate as the SG method. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. The proposed method was numerically tested on problems including stochastic least squares and logistic regression, which are convex, as well as low-rank tensor recovery and bilinear logistic regression, which are nonconvex.
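
    The per-iteration structure described above (sample a mini-batch, then sweep over all blocks using those samples) can be sketched on a small least-squares problem. This is a minimal illustration under assumptions, not the paper's implementation: the function name, the block partition, the diminishing step-size schedule, and the default parameters are all hypothetical.

```python
import random


def bsg_least_squares(A, b, blocks, n_iters=2000, batch_size=4,
                      step0=0.1, seed=0):
    """Block stochastic gradient sketch for min_x (1/2n) * ||A x - b||^2.

    `blocks` is a list of index lists that partition the coordinates of x.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for k in range(n_iters):
        step = step0 / (1.0 + 0.01 * k)          # assumed diminishing schedule
        batch = rng.sample(range(n), batch_size)  # one sample set per iteration
        order = list(range(len(blocks)))
        rng.shuffle(order)                        # randomly shuffled block order
        for j in order:
            # Residuals on the sampled rows, using the most recent values
            # of the blocks already updated in this sweep.
            resid = [sum(A[i][c] * x[c] for c in range(d)) - b[i]
                     for i in batch]
            for c in blocks[j]:
                g = sum(r * A[i][c] for r, i in zip(resid, batch)) / batch_size
                x[c] -= step * g
    return x
```

    Reusing the same mini-batch for every block within a sweep matches the description that the blocks are updated "using those samples". Replacing the shuffle with a fixed order would give the deterministic variant.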

    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
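
    As one concrete instance of the proximal methods mentioned above, the following sketch applies the basic proximal gradient iteration (often called ISTA) to the l1-regularized least-squares problem 0.5 * ||A x - b||^2 + lam * ||x||_1. The choice of problem, the function names, and the parameters are assumptions made for illustration and are not taken from the paper.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def ista_lasso(A, b, lam, step, n_iters=200):
    """Proximal gradient iterations for 0.5 * ||A x - b||^2 + lam * ||x||_1.

    `step` should be at most 1 / L, where L is the largest eigenvalue
    of A^T A.
    """
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(n_iters):
        resid = [sum(A[i][c] * x[c] for c in range(d)) - b[i]
                 for i in range(n)]
        grad = [sum(A[i][c] * resid[i] for i in range(n)) for c in range(d)]
        # Gradient step on the smooth loss, then the prox of the l1 norm.
        x = [soft_threshold(x[c] - step * grad[c], step * lam)
             for c in range(d)]
    return x
```

    When A is the identity matrix and step is 1, the iteration reaches the closed-form solution, the soft-thresholding of b at level lam, after a single step. This makes a convenient sanity check.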