
    Preconditioned Primal-Dual Gradient Methods for Nonconvex Composite and Finite-Sum Optimization

    In this paper, we first introduce a preconditioned primal-dual gradient algorithm based on conjugate duality theory. The algorithm is designed to solve composite optimization problems whose objective function consists of two summands: a continuously differentiable nonconvex function and the composition of a nonsmooth nonconvex function with a linear operator. In contrast to existing nonconvex primal-dual algorithms, our algorithm exploits conjugate duality and therefore does not require computing the proximal mapping of a nonconvex function. Under mild conditions, we prove that every cluster point of the generated sequence is a critical point of the composite optimization problem. Under the Kurdyka-Łojasiewicz property, we establish global convergence of the iterates together with their convergence rates. Second, for nonconvex finite-sum optimization, we propose a stochastic algorithm that combines the preconditioned primal-dual gradient algorithm with a class of variance-reduced stochastic gradient estimators. Almost sure global convergence and expected convergence rates are derived from the Kurdyka-Łojasiewicz inequality. Finally, preliminary numerical results are presented to demonstrate the effectiveness of the proposed algorithms.
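    The abstract does not give the update rules. As a rough illustration of the general idea (an explicit gradient step on the smooth part, with the nonsmooth part g reached only through the proximal map of its conjugate g*), here is a minimal sketch of a Condat-Vu-type primal-dual gradient iteration in which scalar step sizes tau and sigma stand in for a preconditioner. It is not the paper's algorithm. The instance (f(x) = 0.5*||x - b||^2, g = lam*||.||_1, linear operator D = 1-D forward difference) and every function name are assumptions, and g is taken convex so that its conjugate is simply the indicator of a box.

```python
# Sketch (assumed instance, not the paper's method): Condat-Vu-type
# primal-dual gradient iteration for
#   min_x f(x) + g(Dx),  f(x) = 0.5*||x - b||^2,  g = lam*||.||_1,
# with D the 1-D forward-difference operator. g is convex here, so g* is
# the indicator of the box [-lam, lam] and prox_{sigma*g*} is a clip.


def forward_diff(x):
    """(Dx)_i = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def forward_diff_adjoint(y, n):
    """(D^T y)_i = y[i-1] - y[i], with out-of-range entries of y treated as 0."""
    out = []
    for i in range(n):
        left = y[i - 1] if i >= 1 else 0.0
        right = y[i] if i < n - 1 else 0.0
        out.append(left - right)
    return out


def primal_dual_gradient(b, lam, tau=0.3, sigma=0.5, iters=5000):
    """Return the primal iterate after `iters` steps.

    Step sizes satisfy 1/tau - sigma*||D||^2 >= L/2 with L = 1 (Lipschitz
    constant of grad f) and ||D||^2 <= 4, the usual Condat-Vu condition.
    """
    n = len(b)
    x = list(b)
    y = [0.0] * (n - 1)
    for _ in range(iters):
        # Primal: explicit gradient step on f(x) + <Dx, y>; g itself is never
        # evaluated, so no proximal map of g is needed.
        dty = forward_diff_adjoint(y, n)
        x_new = [x[i] - tau * ((x[i] - b[i]) + dty[i]) for i in range(n)]
        # Dual: apply D at the extrapolated point 2*x_new - x, then
        # prox_{sigma*g*}, i.e. projection onto [-lam, lam].
        d = forward_diff([2.0 * x_new[i] - x[i] for i in range(n)])
        y = [max(-lam, min(lam, y[j] + sigma * d[j])) for j in range(n - 1)]
        x = x_new
    return x
```

    For b = [0, 0, 1, 1] and lam = 2 the iterates approach the constant signal [0.5, 0.5, 0.5, 0.5]: the optimality condition x* = b - D^T y* with |y*_j| <= lam is met by a constant x* as soon as lam >= 1. A nonconvex f would only change the gradient line; the dual step is unaffected.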

    Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

    The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent coordination, estimation in sensor networks, and large-scale optimization in machine learning. We develop and analyze distributed algorithms based on dual averaging of subgradients, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our method of analysis allows for a clear separation between the convergence of the optimization algorithm itself and the effects of communication constraints arising from the network structure. In particular, we show that the number of iterations required by our algorithm scales inversely with the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and by simulations for various networks. Our approach covers both deterministic optimization and communication and problems with stochastic optimization and/or communication. Comment: 40 pages, 4 figures
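    A minimal sketch of distributed dual averaging, assuming the proximal function psi(x) = 0.5*x^2, a scalar decision variable constrained to X = [-radius, radius], quadratic local losses f_i(x) = 0.5*(x - a_i)^2, step sizes alpha(t) = 1/sqrt(t), and a ring network with doubly stochastic weights of 1/3. Each node averages its neighbours' dual variables, adds its own local (sub)gradient, and maps the result back to a primal point. The instance, the weights and the function names are illustrative choices, not taken from the paper.

```python
import math


def ring_mixing(n):
    """Doubly stochastic weights on a ring (n >= 3): 1/3 to self and to each neighbour."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            P[i][j % n] += 1.0 / 3.0
    return P


def distributed_dual_averaging(a, iters=40000, radius=10.0):
    """Node i holds f_i(x) = 0.5*(x - a[i])**2; the network minimises sum_i f_i.

    Assumptions: scalar x, X = [-radius, radius], psi(x) = 0.5*x**2,
    alpha(t) = 1/sqrt(t). Returns each node's final primal iterate.
    """
    n = len(a)
    P = ring_mixing(n)
    x = [0.0] * n  # primal iterates x_i(t)
    z = [0.0] * n  # dual variables z_i(t): mixed, accumulated gradients
    for t in range(1, iters + 1):
        g = [x[i] - a[i] for i in range(n)]  # local gradients g_i(t)
        # Communication: z_i <- sum_j P_ij z_j + g_i (old z is read in full first).
        z = [sum(P[i][j] * z[j] for j in range(n)) + g[i] for i in range(n)]
        alpha = 1.0 / math.sqrt(t)
        # Projection step: argmin_{x in X} z_i*x + psi(x)/alpha = clip(-alpha*z_i).
        x = [max(-radius, min(radius, -alpha * z[i])) for i in range(n)]
    return x
```

    The global minimizer of sum_i f_i is the mean of the a_i, so for a = [1, 2, 3, 4, 5] every node should end up near 3. Convergence is slow (the error shrinks roughly like 1/sqrt(t)), and the disagreement between nodes depends on how quickly P mixes, which is where the spectral gap enters.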

    Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis

    Recently, several methods have been proposed for sparse optimization that make careful use of second-order information [10, 28, 16, 3] to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation using a first-order method such as coordinate descent, and employ a line search to ensure sufficient descent. Here we propose a general framework that includes slightly modified versions of existing algorithms as well as a new algorithm that uses limited-memory BFGS Hessian approximations, and we provide a novel global convergence rate analysis that covers methods solving the subproblems via coordinate descent.
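    A minimal sketch of the template described above for an l1-regularized quadratic F(x) = 0.5*x'Qx - b'x + lam*||x||_1: build the composite quadratic model with a Hessian approximation H, solve it inexactly with a few sweeps of cyclic coordinate descent, then backtrack along the resulting direction until an Armijo-type sufficient-decrease test holds. For simplicity H defaults to the exact Hessian Q; the limited-memory BFGS approximation mentioned in the abstract is not implemented, and the function names, sweep count and line-search constant sigma are assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def objective(x, Q, b, lam):
    n = len(x)
    quad = 0.5 * sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    return quad - sum(b[i] * x[i] for i in range(n)) + lam * sum(abs(v) for v in x)


def gradient(x, Q, b):
    n = len(x)
    return [sum(Q[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]


def cd_subproblem(x, g, H, lam, sweeps):
    """Approximately minimise g'd + 0.5*d'Hd + lam*||x + d||_1 by cyclic coordinate descent."""
    n = len(x)
    d = [0.0] * n
    for _ in range(sweeps):
        for j in range(n):
            # Linear coefficient of d_j with the other coordinates held fixed.
            c = g[j] + sum(H[j][k] * d[k] for k in range(n) if k != j)
            # Exact 1-D minimiser in u = x_j + d_j is a soft-thresholding step.
            u = soft_threshold(x[j] - c / H[j][j], lam / H[j][j])
            d[j] = u - x[j]
    return d


def prox_quasi_newton(Q, b, lam, H=None, outer=50, sweeps=5, sigma=1e-4):
    n = len(b)
    H = Q if H is None else H  # assumption: exact Hessian instead of L-BFGS
    x = [0.0] * n
    for _ in range(outer):
        g = gradient(x, Q, b)
        d = cd_subproblem(x, g, H, lam, sweeps)
        if max(abs(v) for v in d) < 1e-12:
            break
        # Predicted decrease Delta = g'd + lam*(||x + d||_1 - ||x||_1) < 0.
        l1_x = sum(abs(v) for v in x)
        l1_xd = sum(abs(x[i] + d[i]) for i in range(n))
        delta = sum(g[i] * d[i] for i in range(n)) + lam * (l1_xd - l1_x)
        f_x = objective(x, Q, b, lam)
        step = 1.0
        for _ in range(50):  # backtracking line search
            trial = [x[i] + step * d[i] for i in range(n)]
            if objective(trial, Q, b, lam) <= f_x + sigma * step * delta:
                break
            step *= 0.5
        else:
            break  # no step passed the sufficient-decrease test; stop
        x = trial
    return x
```

    Because f is quadratic and H = Q here, each outer step amounts to a few more coordinate-descent sweeps on F itself. Passing another positive definite H (for instance a limited-memory BFGS matrix) changes only the model, while the line search still enforces sufficient decrease on F.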

    Dynamical convergence analysis for nonconvex linearized proximal ADMM algorithms

    The convergence analysis of optimization algorithms using continuous-time dynamical systems has received much attention in recent years. In this paper, we use such systems to analyze the convergence of linearized proximal ADMM algorithms for nonconvex composite optimization, whose objective function is the sum of a continuously differentiable function and the composition of a possibly nonconvex function with a linear operator. We first derive a first-order differential inclusion for the linearized proximal ADMM algorithm, LP-ADMM. Both the global convergence and the convergence rates of the generated trajectory are established using the Kurdyka-Łojasiewicz (KL) property. Then, a stochastic variant, LP-SADMM, is investigated for finite-sum nonconvex composite problems. Under mild conditions, we obtain the stochastic differential equation corresponding to LP-SADMM and, by leveraging the KL property, demonstrate the almost sure global convergence of the generated trajectory. Based on this almost sure convergence, we construct a stochastic process that converges almost surely to an approximate critical point of the objective function, and we derive the expected convergence rates associated with this process. Moreover, we propose an accelerated LP-SADMM that incorporates Nesterov's acceleration technique. The continuous-time dynamical system of this algorithm is modeled as a second-order stochastic differential equation. Under the KL property, we analyze its almost sure convergence and expected convergence rates.
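    The paper studies LP-ADMM through its continuous-time limit, and the abstract does not state the discrete iteration. The sketch below shows one common form of a linearized proximal ADMM for min f(x) + g(Ax) with the splitting z = Ax: an exact proximal step on g, a single gradient step in x on the augmented Lagrangian with proximal weight eta = L + beta*||A||^2, and a dual ascent step. The instance is an assumption chosen so that the nonconvex proximal map is explicit: A = I, f(x) = 0.5*||x - b||^2, and g = mu*||.||_0, whose proximal map is hard thresholding. The update order, beta and all names are illustrative and may differ from the paper's LP-ADMM.

```python
import math


def hard_threshold(v, thresh):
    """Proximal map of (mu/beta)*||.||_0: keep v_i if |v_i| > sqrt(2*mu/beta), else 0."""
    return [vi if abs(vi) > thresh else 0.0 for vi in v]


def lp_admm(b, mu, beta=10.0, iters=500):
    """Sketch for min_x 0.5*||x - b||^2 + mu*||x||_0, split as x = z (A = I).

    Augmented Lagrangian: f(x) + g(z) + <y, x - z> + beta/2*||x - z||^2.
    Returns the final (x, z).
    """
    n = len(b)
    L = 1.0               # Lipschitz constant of grad f
    eta = L + beta        # eta >= L + beta*||A||^2 with A = I
    thresh = math.sqrt(2.0 * mu / beta)
    x = [0.0] * n
    y = [0.0] * n         # multiplier for the constraint x - z = 0
    z = [0.0] * n
    for _ in range(iters):
        # z-step: exact prox of the nonconvex g at x + y/beta.
        z = hard_threshold([x[i] + y[i] / beta for i in range(n)], thresh)
        # x-step: one gradient step on the augmented Lagrangian (linearized f).
        grad = [(x[i] - b[i]) + y[i] + beta * (x[i] - z[i]) for i in range(n)]
        x = [x[i] - grad[i] / eta for i in range(n)]
        # Dual ascent on the constraint residual x - z.
        y = [y[i] + beta * (x[i] - z[i]) for i in range(n)]
    return x, z
```

    On b = [3, 0.1, -2] with mu = 0.5 and beta = 10, both x and z settle on [3, 0, -2], which here also happens to be the global minimizer (hard thresholding b at sqrt(2*mu) = 1). In general, nonconvex ADMM schemes of this kind are only shown to approach critical points, typically when beta is large enough relative to L.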