12 research outputs found

    New Analysis of Linear Convergence of Gradient-type Methods via Unifying Error Bound Conditions

    This paper reveals that a residual measure operator plays a common and central role in many error bound (EB) conditions and in a variety of gradient-type methods. On one hand, by linking this operator with other optimality measures, we define a group of abstract EB conditions and then analyze the interplay between them; on the other hand, by using this operator as an ascent direction, we propose an abstract gradient-type method and then derive EB conditions that are necessary and sufficient for its linear convergence. The former provides a unified framework that not only allows us to find new connections between many existing EB conditions, but also paves the way to constructing new EB conditions. The latter allows us to identify the weakest conditions guaranteeing linear convergence for a number of fundamental algorithms, including the gradient method, the proximal point algorithm, and the forward-backward splitting algorithm. In addition, we show linear convergence for the proximal alternating linearized minimization algorithm under a group of equivalent EB conditions, which are strictly weaker than the traditional strong convexity condition. Moreover, by defining a new EB condition, we show Q-linear convergence of Nesterov's accelerated forward-backward algorithm without strong convexity. Finally, we verify EB conditions for a class of dual objective functions. Comment: 40 pages; incorporating the referee's comments, the presentation has been further improved.
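
    For readers unfamiliar with this family of conditions, here is a minimal sketch of the standard form a residual-based EB condition takes; the notation is ours, not necessarily the paper's.

    ```latex
    % A representative error bound condition built on a residual operator R
    % (standard form; \kappa > 0, and the bound is posed on a level set).
    % Typical instances of the residual:
    %   R(x) = \nabla f(x)                                  (gradient method)
    %   R(x) = x - \mathrm{prox}_{\lambda g}(x - \lambda \nabla f(x))
    %                                           (forward-backward splitting)
    \operatorname{dist}(x, X^\ast) \;\le\; \kappa \,\lVert R(x)\rVert
    \qquad \text{for all } x \text{ with } f(x) \le f^\ast + r .
    ```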

    Inertial Proximal Incremental Aggregated Gradient Method

    In this paper, we introduce an inertial version of the Proximal Incremental Aggregated Gradient method (PIAG) for minimizing the sum of smooth convex component functions and a possibly nonsmooth convex regularization function. Theoretically, we show that the inertial Proximal Incremental Aggregated Gradiend (iPIAG) method enjoys a global linear convergence under a quadratic growth condition, which is strictly weaker than strong convexity, provided that the stepsize is not larger than a constant. Moreover, we present two numerical expreiments which demonstrate that iPIAG outperforms the original PIAG.Comment: 17 pages, 3 figure
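
    To make the update concrete, here is a minimal Python sketch of an inertial, aggregated-gradient step of this flavor; the function names, the cyclic refresh order, and the gradient-table bookkeeping are our own illustration, not the paper's pseudocode.

    ```python
    import numpy as np

    def ipiag(x0, grads, prox, step, beta, epochs):
        """grads: list of callables, grads[i](x) = gradient of component f_i.
        prox: callable, prox(v, step) = proximal map of the regularizer g."""
        n = len(grads)
        x = np.asarray(x0, dtype=float)
        x_prev = x.copy()
        table = [g(x) for g in grads]        # stored component gradients
        for _ in range(epochs):
            for i in range(n):
                table[i] = grads[i](x)       # refresh one component gradient
                agg = sum(table)             # aggregated (partly stale) gradient
                y = x + beta * (x - x_prev)  # inertial extrapolation
                x_prev, x = x, prox(y - step * agg, step)
        return x
    ```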

    Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$

    In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS) algorithm for solving a family of non-smooth problems whose objective is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best-known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any strong convexity assumption. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$, where $\widetilde O(\cdot)$ suppresses a logarithmic factor and $\theta\in(0,1]$ captures the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method, and runs in stages that gradually decrease the smoothing parameter until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piece-wise linear loss function and an $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc.). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and primal-dual style first-order methods. Comment: This is a long version of the paper accepted by NIPS 201
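
    The stage-wise structure is simple to sketch. Below is a hedged Python illustration: accelerated gradient steps on a smoothed objective, with the smoothing parameter halved between stages. `grad_smooth`, the halving factor, and the stepsize rule are our assumptions, not the paper's exact scheme.

    ```python
    import numpy as np

    def hops_sketch(grad_smooth, x0, mu0, L, n_stages, iters_per_stage):
        # grad_smooth(x, mu): gradient of the mu-smoothed objective;
        # Nesterov smoothing makes it roughly (L / mu)-Lipschitz.
        x = np.asarray(x0, dtype=float)
        mu = mu0
        for _ in range(n_stages):
            y, x_prev, t = x.copy(), x.copy(), 1.0
            step = mu / L                    # stepsize ~ inverse smoothness
            for _ in range(iters_per_stage):
                x = y - step * grad_smooth(y, mu)          # gradient step
                t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
                y = x + ((t - 1) / t_next) * (x - x_prev)  # momentum
                x_prev, t = x, t_next
            mu *= 0.5                        # tighten smoothing stage-wise
        return x
    ```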

    Global Complexity Analysis of Inexact Successive Quadratic Approximation Methods for Regularized Optimization under Mild Assumptions

    Successive quadratic approximation (SQA) methods are numerically efficient for minimizing the sum of a smooth function and a convex function. The iteration complexity of inexact SQA methods has been analyzed recently. In this paper, we present an algorithmic framework of inexact SQA methods with four types of line searches, and analyze its global complexity under milder assumptions. First, we show its well-definedness and some decreasing properties. Second, under the quadratic growth condition and a uniform positive lower bound on the stepsizes, we show that the function value sequence and the iterate sequence are linearly convergent. Moreover, we obtain an o(1/k) complexity without the quadratic growth condition, improving existing O(1/k) complexity results. Finally, we show that a local gradient-Lipschitz-continuity condition guarantees a uniform positive lower bound for the stepsizes.
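
    As a point of reference, the generic inexact SQA step has the following standard form; the notation below is ours, chosen to match common usage rather than this paper's exact statement.

    ```latex
    % One SQA step for minimizing F = f + \psi (f smooth, \psi convex):
    % build a quadratic model of f at x^k with curvature H_k \succ 0,
    % solve the regularized subproblem inexactly (to a prescribed accuracy),
    % then choose the next iterate by a line search along the step.
    x^{k+1} \;\approx\; \arg\min_{y}\;
      \nabla f(x^k)^{\top}(y - x^k)
      + \tfrac{1}{2}\,(y - x^k)^{\top} H_k\,(y - x^k)
      + \psi(y).
    ```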

    Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence under Bregman Distance Growth Conditions

    We introduce a unified algorithmic framework, called the proximal-like incremental aggregated gradient (PLIAG) method, for minimizing the sum of a convex function that consists of additive relatively smooth convex components and a proper lower semicontinuous convex regularization function, over an abstract feasible set whose geometry can be captured by the domain of a Legendre function. The PLIAG method includes many existing algorithms in the literature as special cases, such as the proximal gradient method, the Bregman proximal gradient method (also called the NoLips algorithm), the incremental aggregated gradient method, the incremental aggregated proximal method, and the proximal incremental aggregated gradient method. It also includes some novel and interesting iteration schemes. First, we show that the PLIAG method is globally sublinearly convergent without requiring a growth condition, which extends the sublinear convergence result for the proximal gradient algorithm to incremental aggregated first-order methods. Then, by embedding a so-called Bregman distance growth condition into a descent-type lemma to construct a special Lyapunov function, we show that the PLIAG method is globally linearly convergent in terms of both function values and Bregman distances to the optimal solution set, provided that the stepsize is not greater than some positive constant. The convergence results derived in this paper are all established beyond the standard assumptions in the literature (i.e., without requiring strong convexity or Lipschitz gradient continuity of the smooth part of the objective). When specialized to many existing algorithms, our results recover or supplement their convergence results under strictly weaker conditions. Comment: 28 pages.
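
    For orientation, the Bregman-proximal building block that PLIAG generalizes can be written as follows; this is standard NoLips-style notation on our part, and PLIAG replaces the fresh gradient with an aggregated, possibly delayed one.

    ```latex
    % Bregman proximal gradient step with Legendre kernel h and stepsize \lambda,
    % where D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle:
    x^{k+1} \in \arg\min_{x}\;
      \bigl\langle \nabla f(x^k),\, x \bigr\rangle + g(x)
      + \tfrac{1}{\lambda}\, D_h(x, x^k).
    ```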

    Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions

    Error bound conditions (EBC) are properties that characterize the growth of an objective function as a point moves away from the optimal set. They have recently received increasing attention in the field of optimization for developing algorithms with fast convergence. However, studies of EBC in statistical learning are hitherto still limited. The main contributions of this paper are two-fold. First, we develop fast and intermediate rates of empirical risk minimization (ERM) under EBC for risk minimization with Lipschitz continuous and with smooth convex random functions. Second, we establish fast and intermediate rates of an efficient stochastic approximation (SA) algorithm for risk minimization with Lipschitz continuous random functions, which requires only one pass over $n$ samples and adapts to EBC. For both approaches, the convergence rates span a full spectrum between $\widetilde O(1/\sqrt{n})$ and $\widetilde O(1/n)$ depending on the power constant in EBC, and can be even faster than $O(1/n)$ in special cases for ERM. Moreover, these convergence rates are automatically adaptive without using any knowledge of EBC. Overall, this work not only strengthens the understanding of ERM for statistical learning but also brings new fast stochastic algorithms for solving a broad range of statistical learning problems.
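
    To fix ideas, one common way to parameterize an EBC with a power constant reads as follows; this is a standard rendering on our part, not necessarily the exact condition used in the paper.

    ```latex
    % Error bound condition with power constant \theta \in (0, 1]:
    % larger \theta means stronger growth and faster attainable rates;
    % \theta = 1 recovers the familiar quadratic growth condition.
    \operatorname{dist}^2\!\bigl(x, X^\ast\bigr)
      \;\le\; c\,\bigl(F(x) - F^\ast\bigr)^{\theta}
      \qquad \text{for all } x \text{ in a level set of } F .
    ```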

    Faster Subgradient Methods for Functions with H\"olderian Growth

    The purpose of this manuscript is to derive new convergence results for several subgradient methods applied to minimizing nonsmooth convex functions with H\"olderian growth. The growth condition is satisfied in many applications and includes functions with quadratic growth and weakly sharp minima as special cases. To this end, we make three main contributions. First, for a constant and sufficiently small stepsize, we show that the subgradient method achieves linear convergence up to a certain region including the optimal set, with error on the order of the stepsize. Second, if appropriate problem parameters are known, we derive a decaying stepsize which obtains a much faster convergence rate than is suggested by the classical $O(1/\sqrt{k})$ result for the subgradient method. Third, we develop a novel "descending stairs" stepsize which obtains this faster convergence rate and also obtains linear convergence for the special case of weakly sharp functions. We also develop an adaptive variant of the "descending stairs" stepsize which achieves the same convergence rate without requiring an error bound constant that is difficult to estimate in practice. Comment: 50 pages. First revised version (under submission to Math Programming).
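
    The "descending stairs" idea is easy to sketch: constant-stepsize epochs whose stepsize drops by a fixed factor between epochs. The Python below is our illustration of that schedule only; the epoch length, the drop factor, and all names are assumptions, not the paper's pseudocode.

    ```python
    import numpy as np

    def descending_stairs(subgrad, x0, step0, epoch_len, n_epochs):
        """subgrad(x): any subgradient of the objective at x."""
        x = np.asarray(x0, dtype=float)
        step = step0
        for _ in range(n_epochs):
            for _ in range(epoch_len):
                x = x - step * subgrad(x)   # plain subgradient step
            step *= 0.5                     # each "stair" halves the stepsize
        return x
    ```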

    A Variational Approach on Level Sets and Linear Convergence of Variable Bregman Proximal Gradient Method for Nonconvex Optimization Problems

    We develop a new variational approach on level sets aimed at the convergence rate analysis of a variable Bregman proximal gradient (VBPG) method for a broad class of nonsmooth and nonconvex optimization problems. With this new approach, we are able to extend the concepts of Bregman proximal mappings and their corresponding Bregman proximal envelopes and Bregman proximal gap functions to the nonconvex setting. Properties of these mappings and functions are examined. One aim of this work is to provide a solid foundation on which further design and analysis of VBPG for more general nonconvex optimization problems are possible. Another aim is to provide a unified theory on linear convergence of VBPG, with particular interest in proximal gradient methods. Central to our analysis for achieving these goals is an error bound in terms of level sets and subdifferentials (a level-set subdifferential error bound), along with its links to other level-set error bounds. As a consequence, we establish a number of positive results. These newly established results not only enable us to show that any accumulation point of the sequence generated by VBPG is at least a critical point of the limiting subdifferential, or even a critical point of the proximal subdifferential with a fixed Bregman function in each iteration, but also provide a fresh perspective that allows us to explore the interconnections among many known sufficient conditions for linear convergence of various first-order methods. Along the way, we derive a number of verifiable conditions for level-set error bounds to hold, obtain linear convergence of VBPG, and derive necessary and sufficient conditions for linear convergence relative to a level set for nonsmooth and nonconvex optimization problems.
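
    For context, subdifferential-based error bounds restricted to a level set typically take the following shape; the rendering below is ours and is meant only to indicate the flavor of the level-set subdifferential error bound studied here.

    ```latex
    % A subdifferential error bound posed on a level set, with constants
    % c, r > 0, \partial f a (limiting) subdifferential, and X^\ast the
    % solution set:
    \operatorname{dist}(x, X^\ast) \;\le\; c\,
      \operatorname{dist}\bigl(0, \partial f(x)\bigr)
    \qquad \text{whenever } f^\ast < f(x) \le f^\ast + r .
    ```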

    On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

    In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {\em lossy compressed} versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having {\em multiple} gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information. In particular, we show that having $O(\log\frac{1}{\epsilon})$ gradient iterations with constant step size, and $O(\log\frac{1}{\epsilon})$ gossip steps between every pair of these iterations, enables convergence to within $\epsilon$ of the optimal value for smooth non-convex objectives satisfying the Polyak-\L{}ojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work that derives convergence results for nonconvex optimization under arbitrary communication compression.
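
    The interleaving pattern is easy to sketch in Python: one local gradient step per node, followed by several compressed gossip rounds. This is an illustrative sketch under our own naming, not the paper's algorithm; `compress` stands in for any lossy operator such as quantization.

    ```python
    import numpy as np

    def decentralized_step(X, W, grads, step, Q, compress=lambda z: z):
        """One outer iteration. X: (n_nodes, dim) stacked local parameters;
        W: doubly stochastic mixing matrix; grads[i](x): local gradient."""
        G = np.stack([g(x) for g, x in zip(grads, X)])  # local gradients
        X = X - step * G                                # local descent step
        for _ in range(Q):                              # multiple gossip steps
            X = W @ compress(X)   # exchange compressed params, then average
        return X
    ```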

    Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

    The geometric median (\textsc{Gm}) is a classical method in statistics for robustly estimating the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) in high-dimensional optimization problems. In this paper, we show that by applying \textsc{Gm} to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 0.5 for smooth non-convex problems, with non-asymptotic convergence rates comparable to those of SGD with \textsc{Gm}.
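
    A minimal Python sketch of the block-coordinate \textsc{Gm} aggregation idea follows, using the standard Weiszfeld iteration for the geometric median; the block-selection rule and the memory mechanism from the paper are omitted, and all names are our own.

    ```python
    import numpy as np

    def weiszfeld(points, iters=50, eps=1e-8):
        # Geometric median of the rows of `points` (standard Weiszfeld iteration).
        z = points.mean(axis=0)
        for _ in range(iters):
            d = np.maximum(np.linalg.norm(points - z, axis=1), eps)
            w = 1.0 / d
            z = (w[:, None] * points).sum(axis=0) / w.sum()
        return z

    def block_gm_aggregate(grads, block):
        # Robust Gm aggregation on a chosen coordinate block, plain mean
        # elsewhere. grads: (n_workers, dim); block: index array.
        agg = grads.mean(axis=0)
        agg[block] = weiszfeld(grads[:, block])
        return agg
    ```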