780 research outputs found

    Proximal Gradient Method for Nonsmooth Optimization over the Stiefel Manifold

    We consider optimization problems over the Stiefel manifold whose objective function is the sum of a smooth function and a nonsmooth function. Existing methods for solving this kind of problem can be classified into three classes. Algorithms in the first class rely on subgradient information of the objective function and thus tend to converge slowly in practice. Algorithms in the second class are proximal point algorithms, which involve subproblems that can be as difficult as the original problem. Algorithms in the third class are based on operator-splitting techniques, but they usually lack rigorous convergence guarantees. In this paper, we propose a retraction-based proximal gradient method for solving this class of problems. We prove that the proposed method globally converges to a stationary point. Iteration complexity for obtaining an $\epsilon$-stationary solution is also analyzed. Numerical results on solving sparse PCA and compressed modes problems are reported to demonstrate the advantages of the proposed method.
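
    The retraction step named in the abstract above can be illustrated with a minimal sketch. It assumes the QR-based retraction, which is one common choice on the Stiefel manifold; the abstract does not say which retraction the paper uses. Matrices are stored as lists of columns, the function name `qr_retraction` is hypothetical, and `X + V` is assumed to have full column rank. The sketch covers only the retraction, not the proximal subproblem or the convergence analysis.

```python
import math


def qr_retraction(X, V):
    """Map X + V back onto the Stiefel manifold via Gram-Schmidt.

    X and V are lists of columns (each column a list of floats).
    Returns the orthonormal factor Q of X + V as a list of columns.
    Assumes the columns of X + V are linearly independent.
    """
    cols = [[x + v for x, v in zip(xc, vc)] for xc, vc in zip(X, V)]
    Q = []
    for c in cols:
        w = list(c)
        # Remove the components along the columns already orthonormalized.
        for q in Q:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        Q.append([wi / norm for wi in w])
    return Q


# Hypothetical example: two orthonormal columns in R^3 plus a small step.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
V = [[0.0, 0.5, 0.2], [0.1, 0.0, -0.3]]
Q = qr_retraction(X, V)
```

    With a zero step the retraction returns `X` itself, which is the defining property that a retraction fixes points already on the manifold.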

    TMAC: A Toolbox of Modern Async-Parallel, Coordinate, Splitting, and Stochastic Methods

    TMAC is a toolbox written in C++11 that implements algorithms based on a set of modern methods for large-scale optimization. It covers a variety of optimization problems, which can be both smooth and nonsmooth, convex and nonconvex, as well as constrained and unconstrained. The algorithms implemented in TMAC, such as the coordinate update method and operator splitting method, are scalable as they decompose a problem into simple subproblems. These algorithms can run in a multi-threaded fashion, either synchronously or asynchronously, to take advantage of all the cores available. The TMAC architecture mimics how a scientist writes down an optimization algorithm. Therefore, it is easy to obtain a new algorithm by making simple modifications such as adding a new operator or a new splitting, while maintaining the multicore parallelism and other features. The package is available at https://github.com/uclaopt/TMAC

    Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano

    An abstract convergence theorem for a class of generalized descent methods that explicitly models relative errors is proved. The convergence theorem generalizes and unifies several recent abstract convergence theorems. It is applicable to possibly non-smooth and non-convex lower semi-continuous functions that satisfy the Kurdyka-Łojasiewicz (KL) inequality, which comprises a huge class of problems. Most of the recent algorithms that explicitly prove convergence using the KL inequality can be cast into the abstract framework in this paper and, therefore, the generated sequence converges to a stationary point of the objective function. Additional flexibility compared to related approaches is gained by a descent property that is formulated with respect to a function that is allowed to change along the iterations, a generic distance measure, and an explicit/implicit relative error condition with respect to finite linear combinations of distance terms. As an application of the gained flexibility, the convergence of a block coordinate variable metric version of iPiano (an inertial forward-backward splitting algorithm) is proved, which performs favorably on an inpainting problem with a Mumford-Shah-like regularization from image processing.

    A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems

    We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed possibly nonsmooth functions (whose proximal mappings are easy to compute), some of which are further composed with linear maps. Problems of this kind arise naturally in various applications when different regularizers are introduced for inducing simultaneous structures in the solutions. Solving these problems, however, can be challenging because of the coupled nonsmooth functions: the corresponding proximal mapping can be hard to compute so that standard first-order methods such as the proximal gradient algorithm cannot be applied efficiently. In this paper, we propose a successive difference-of-convex approximation method for solving problems of this kind. In this algorithm, we approximate the nonsmooth functions by their Moreau envelopes in each iteration. Making use of the simple observation that Moreau envelopes of nonnegative proper closed functions are continuous difference-of-convex functions, we can then approximately minimize the approximation function by first-order methods with suitable majorization techniques. These first-order methods can be implemented efficiently thanks to the fact that the proximal mapping of each nonsmooth function is easy to compute. Under suitable assumptions, we prove that the sequence generated by our method is bounded and any accumulation point is a stationary point of the objective. We also discuss how our method can be applied to concrete applications such as nonconvex fused regularized optimization problems and simultaneously structured matrix optimization problems, and illustrate the performance numerically for these two specific applications.
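
    The observation about Moreau envelopes in the abstract above can be made concrete with a small sketch. It assumes the simplest nonnegative closed function, f(y) = |y|, chosen here only for illustration and not taken from the paper. For a parameter lam > 0 the envelope min_y |y| + (x - y)^2 / (2 lam) is evaluated through the proximal mapping (soft-thresholding) and then written as g1(x) - g2(x) with g1(x) = x^2 / (2 lam). The function names are hypothetical.

```python
def soft_threshold(x, lam):
    """Proximal mapping of lam * |.| evaluated at the scalar x."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_envelope_abs(x, lam):
    """Moreau envelope of f(y) = |y| with parameter lam > 0.

    The minimizer of |y| + (x - y)^2 / (2 * lam) is the proximal point,
    so the envelope is the objective evaluated there.
    """
    y = soft_threshold(x, lam)
    return abs(y) + (x - y) ** 2 / (2.0 * lam)


def dc_parts(x, lam):
    """Split the envelope as g1(x) - g2(x).

    g1 is the quadratic x^2 / (2 * lam); g2 is what remains, which for
    f = |.| equals max(|x| - lam, 0)^2 / (2 * lam) and is convex.
    """
    g1 = x * x / (2.0 * lam)
    g2 = g1 - moreau_envelope_abs(x, lam)
    return g1, g2
```

    For f = |.| the envelope is the Huber function, so the split can be checked by hand: inside [-lam, lam] the second part vanishes, and outside it grows quadratically.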

    Splitting methods with variable metric for KL functions

    We study the convergence of general abstract descent methods applied to a lower semicontinuous nonconvex function f that satisfies the Kurdyka-Łojasiewicz inequality in a Hilbert space. We prove that any precompact sequence converges to a critical point of f and obtain new convergence rates both for the values and the iterates. The analysis covers alternating versions of the forward-backward method with variable metric and relative errors. As an example, a nonsmooth and nonconvex version of the Levenberg-Marquardt algorithm is detailed.

    Optimization of Inf-Convolution Regularized Nonconvex Composite Problems

    In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal mapping and Moreau envelope. In a convex setting such problems can be solved via alternating minimization of a splitting formulation, where the consensus constraint is penalized with a Legendre function. In contrast, for nonconvex models it is in general unclear whether this approach yields stationary points of the infimal convolution problem. To this end we analytically investigate local regularity properties of the Moreau envelope function under prox-regularity, which allows us to establish the equivalence between stationary points of the splitting model and the original inf-convolution model. We apply our theory to characterize stationary points of the penalty objective, which is minimized by the elastic averaging SGD (EASGD) method for distributed training. Numerically, we demonstrate the practical relevance of the proposed approach on the important task of distributed training of deep neural networks. Comment: Accepted as a Conference Paper to International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Nah

    Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems

    In this paper, we consider a class of possibly nonconvex, nonsmooth and non-Lipschitz optimization problems arising in many contemporary applications such as machine learning, variable selection and image processing. To solve this class of problems, we propose a proximal gradient method with extrapolation and line search (PGels). This method is developed based on a special potential function and successfully incorporates both extrapolation and non-monotone line search, which are two simple and efficient accelerating techniques for the proximal gradient method. Thanks to the line search, this method allows more flexibility in choosing the extrapolation parameters and updates them adaptively at each iteration if a certain line search criterion is not satisfied. Moreover, with proper choices of parameters, our PGels reduces to many existing algorithms. We also show that, under some mild conditions, our line search criterion is well defined and any cluster point of the sequence generated by PGels is a stationary point of our problem. In addition, by assuming the Kurdyka-Łojasiewicz exponent of the objective in our problem, we further analyze the local convergence rate of two special cases of PGels, including the widely used non-monotone proximal gradient method as one case. Finally, we conduct some numerical experiments for solving the $\ell_1$ regularized logistic regression problem and the $\ell_{1\text{-}2}$ regularized least squares problem. Our numerical results illustrate the efficiency of PGels and show the potential advantage of combining two accelerating techniques. Comment: This version addresses some typos in previous version and adds more comparison
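
    The extrapolation idea in the abstract above can be sketched on a small example. This is not PGels: the sketch assumes a fixed extrapolation parameter `beta`, a fixed step size, no line search, and a convex test problem, namely minimizing 0.5 * ||Ax - b||^2 + lam * ||x||_1. All function names and the example data are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1 applied componentwise to the list v."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def grad_least_squares(A, b, x):
    """Gradient of 0.5 * ||A x - b||^2, with A given as a list of rows."""
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


def prox_grad_extrapolated(A, b, lam, step, beta, iters):
    """Proximal gradient iteration with a fixed extrapolation parameter.

    Each pass extrapolates from the last two iterates, takes a gradient
    step on the smooth part at the extrapolated point, and then applies
    the proximal mapping of the l1 term.
    """
    x_prev = [0.0] * len(A[0])
    x = list(x_prev)
    for _ in range(iters):
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad_least_squares(A, b, y)
        x_prev = x
        x = soft_threshold([yi - step * gi for yi, gi in zip(y, g)], step * lam)
    return x


# Hypothetical data; step = 1 / L with L = 4, the largest eigenvalue of A^T A.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [4.0, 1.0]
x_hat = prox_grad_extrapolated(A, b, lam=1.0, step=0.25, beta=0.3, iters=50)
```

    Because this example is separable, the minimizer can be checked by hand: the first coordinate solves 4 x - 8 + 1 = 0, and the second is thresholded to zero.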

    Gradient-Free Multi-Agent Nonconvex Nonsmooth Optimization

    In this paper, we consider the problem of minimizing the sum of nonconvex and possibly nonsmooth functions over a connected multi-agent network, where the agents have partial knowledge about the global cost function and can only access the zeroth-order information (i.e., the functional values) of their local cost functions. We propose and analyze a distributed primal-dual gradient-free algorithm for this challenging problem. We show that by appropriately choosing the parameters, the proposed algorithm converges to the set of first-order stationary solutions with a provable global sublinear convergence rate. Numerical experiments demonstrate the effectiveness of our proposed method for optimizing nonconvex and nonsmooth problems over a network. Comment: Long version of CDC pape

    An Optimization Framework with Flexible Inexact Inner Iterations for Nonconvex and Nonsmooth Programming

    In recent years, numerous vision and learning tasks have been (re)formulated as nonconvex and nonsmooth programming problems (NNPs). Although some algorithms have been proposed for particular problems, designing fast and flexible optimization schemes with theoretical guarantees is a challenging task for general NNPs. It has been observed that performing inexact inner iterations often benefits particular applications case by case, but the convergence behavior of such schemes is still unclear. Motivated by these practical experiences, this paper designs a novel algorithmic framework, named inexact proximal alternating direction method (IPAD), for solving general NNPs. We demonstrate that any numerical algorithm can be incorporated into IPAD for solving subproblems and the convergence of the resulting hybrid schemes can be consistently guaranteed by a series of simple error conditions. Beyond the guarantee in theory, numerical experiments on both synthesized and real-world data further demonstrate the superiority and flexibility of our IPAD framework for practical use.

    The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems

    We introduce the Asynchronous PALM algorithm, a new extension of the Proximal Alternating Linearized Minimization (PALM) algorithm for solving nonsmooth, nonconvex optimization problems. Like the PALM algorithm, each step of the Asynchronous PALM algorithm updates a single block of coordinates; but unlike the PALM algorithm, the Asynchronous PALM algorithm eliminates the need for sequential updates that occur one after the other. Instead, our new algorithm allows each of the coordinate blocks to be updated asynchronously and in any order, which means that any number of computing cores can compute updates in parallel without synchronizing their computations. In practice, this asynchronization strategy often leads to speedups that increase linearly with the number of computing cores. We introduce two variants of the Asynchronous PALM algorithm, one stochastic and one deterministic. In the stochastic and deterministic cases, we show that cluster points of the algorithm are stationary points. In the deterministic case, we show that the algorithm converges globally whenever the Kurdyka-Łojasiewicz property holds for a function closely related to the objective function, and we derive its convergence rate in a common special case. Finally, we provide a concrete case in which our assumptions hold.