4 research outputs found

    High-Resolution Modeling of the Fastest First-Order Optimization Method for Strongly Convex Functions

    Motivated by the fact that gradient-based optimization algorithms can be studied from the perspective of their limiting ordinary differential equations (ODEs), we derive an ODE representation of the accelerated triple momentum (TM) algorithm. For unconstrained optimization problems with strongly convex costs, the TM algorithm has a proven faster convergence rate than Nesterov's accelerated gradient (NAG) method at the same computational complexity. We show that, as with the NAG method, a high-resolution model is needed to obtain an ODE representation that accurately captures the characteristics of the TM algorithm. We use a Lyapunov analysis to investigate the stability and convergence behavior of the proposed high-resolution ODE representation and show that the model is robust to deviations from the parameters of the TM algorithm. We compare the convergence rate of the ODE representation of the TM method with that of the NAG method to confirm its faster convergence. Our study also leads to a tighter bound on the worst-case convergence rate of the ODE model of the NAG method. Lastly, we discuss the use of the integral quadratic constraint (IQC) method to establish an estimate of the TM algorithm's convergence rate. A numerical example demonstrates our results.
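
    As a concrete illustration (not taken from the paper), the discrete TM iteration can be run head-to-head against NAG on a strongly convex quadratic. The sketch below assumes the standard TM tuning of Van Scoy, Freeman, and Lynch for an L-smooth, m-strongly convex function; the test quadratic, horizon, and starting point are assumptions of this example.

        import numpy as np

        # Minimal sketch: triple momentum (TM) vs. Nesterov (NAG) on the
        # strongly convex quadratic f(x) = 0.5 * x^T A x. The quadratic and
        # its spectrum are assumptions of this illustration.
        L, m = 10.0, 1.0                      # smoothness / strong convexity
        A = np.diag([m, L])
        grad = lambda x: A @ x

        kappa = L / m
        rho = 1.0 - 1.0 / np.sqrt(kappa)      # TM contraction factor
        alpha = (1 + rho) / L                 # standard TM parameters
        beta = rho**2 / (2 - rho)
        gamma = rho**2 / ((1 + rho) * (2 - rho))

        x0 = np.array([1.0, 1.0])

        # TM: xi_{k+1} = (1+beta) xi_k - beta xi_{k-1} - alpha grad(y_k),
        #     y_k = (1+gamma) xi_k - gamma xi_{k-1}
        xi_prev = xi = x0.copy()
        for _ in range(50):
            y = (1 + gamma) * xi - gamma * xi_prev
            xi_prev, xi = xi, (1 + beta) * xi - beta * xi_prev - alpha * grad(y)

        # NAG with the usual strongly convex momentum (sqrt(k)-1)/(sqrt(k)+1)
        mom = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
        x_prev = x = x0.copy()
        for _ in range(50):
            y = x + mom * (x - x_prev)
            x_prev, x = x, y - grad(y) / L

        print("TM  distance to optimum:", np.linalg.norm(xi))
        print("NAG distance to optimum:", np.linalg.norm(x))

    With this tuning the TM iterates contract roughly like rho^k = (1 - 1/sqrt(kappa))^k per iteration, which is the faster rate the abstract compares against NAG.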

    Distributed Optimization, Averaging via ADMM, and Network Topology

    There has been an increasing need for scalable optimization methods, especially due to the explosion in the size of datasets and model complexity in modern machine learning applications. Scalable solvers often distribute the computation over a network of processing units. For simple algorithms such as gradient descent, the dependence of the convergence time on the topology of this network is well known. However, for more involved algorithms such as the Alternating Direction Method of Multipliers (ADMM), much less is known. At the heart of many distributed optimization algorithms lies a gossip subroutine which averages local information over the network, and whose efficiency is crucial for the overall performance of the method. In this paper we review recent research in this area and, with the goal of isolating this communication-exchange behaviour, we compare different algorithms applied to a canonical distributed averaging consensus problem. We also show interesting connections between ADMM and lifted Markov chains, and provide an explicit characterization of ADMM's convergence and optimal parameter tuning in terms of spectral properties of the network. Finally, we empirically study the connection between network topology and convergence rates for different algorithms on a real-world problem of sensor localization. (To appear in Proceedings of the IEEE.)
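
    To make the gossip subroutine concrete, the following sketch (an illustration of the generic averaging step, not the paper's ADMM scheme) runs linear gossip averaging x_{k+1} = W x_k on a ring graph with Metropolis weights; the ring topology and iteration count are assumptions of this example. The second-largest eigenvalue modulus of W is exactly where the network topology enters the convergence rate.

        import numpy as np

        # Minimal sketch of distributed averaging by linear gossip:
        # x_{k+1} = W x_k on a ring of n nodes with Metropolis weights.
        n = 20
        W = np.zeros((n, n))
        for i in range(n):
            for j in ((i - 1) % n, (i + 1) % n):  # each node has 2 neighbors
                W[i, j] = 1.0 / 3.0               # Metropolis: 1/(max degree + 1)
            W[i, i] = 1.0 - W[i].sum()            # keeps W doubly stochastic

        x = np.random.randn(n)                    # initial local values
        target = x.mean()                         # consensus value to reach

        for _ in range(500):
            x = W @ x                             # one communication round

        # The spectral gap of W governs the linear convergence rate.
        slem = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]
        print("disagreement:", np.linalg.norm(x - target))
        print("second-largest eigenvalue modulus:", slem)

    On a poorly connected graph such as the ring, the spectral gap of W scales like 1/n^2, which is why plain gossip is slow there and why lifted-chain constructions of the kind the paper connects to ADMM can help.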

    Gradient flows and proximal splitting methods: a unified view on accelerated and stochastic optimization

    Optimization is at the heart of machine learning, statistics, and several applied scientific disciplines. Proximal algorithms form a class of methods that are broadly applicable and particularly well suited to nonsmooth, constrained, large-scale, and distributed optimization problems. There are essentially five proximal algorithms currently known, each proposed in seminal work: forward-backward splitting, Tseng splitting, Douglas-Rachford, the alternating direction method of multipliers, and the more recent Davis-Yin. Such methods sit at a higher level of abstraction than gradient-based methods, with deep roots in nonlinear functional analysis. In this paper, we show that all of these algorithms can be derived as different discretizations of a single differential equation, namely the simple gradient flow, which dates back to Cauchy (1847). Many of the success stories in machine learning rely on "accelerating" the convergence of first-order methods; yet accelerated methods are notoriously difficult to analyze, counterintuitive, and lack an underlying guiding principle. We show that applying similar discretization schemes to Newton's classical equation of motion with an additional dissipative force, which we refer to as the accelerated gradient flow, yields accelerated variants of all of these proximal algorithms, the majority of which are new, although some recover known cases in the literature. Moreover, we extend these algorithms to stochastic optimization settings, allowing us to make connections with Langevin and Fokker-Planck equations. Similar ideas apply to gradient descent, heavy ball, and Nesterov's method, which are simpler. These results thus provide a unified framework from which several optimization methods can be derived from basic physical systems.
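
    The simplest instance of this viewpoint can be written down directly: explicit Euler on the gradient flow x' = -grad f(x) is gradient descent, while implicit Euler is the proximal point method. A minimal sketch on a one-dimensional quadratic (the test function and step size are assumptions of this example):

        # Two discretizations of the gradient flow x' = -a * x for
        # f(x) = 0.5 * a * x^2 (quadratic chosen so the prox is closed-form).
        a, h = 4.0, 0.2            # curvature and step size
        x_fe = x_be = 1.0

        for _ in range(30):
            # Explicit (forward) Euler == gradient descent:
            #   x_{k+1} = x_k - h * f'(x_k)
            x_fe = x_fe - h * a * x_fe
            # Implicit (backward) Euler == proximal point:
            #   x_{k+1} = argmin_z f(z) + (z - x_k)^2 / (2h),
            # which for this quadratic solves to x_k / (1 + h * a).
            x_be = x_be / (1.0 + h * a)

        print("forward Euler / gradient descent :", x_fe)
        print("backward Euler / proximal point  :", x_be)

    The implicit step remains stable for any h > 0, while forward Euler requires h < 2/a; this robustness is one reason proximal methods suit nonsmooth and large-scale problems.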

    The long time behavior and the rate of convergence of symplectic convex algorithms obtained via splitting discretizations of inertial damping systems

    In this paper we propose new numerical algorithms for unconstrained optimization problems and study the convergence rate of the objective function along the iterates. Our algorithms are based on splitting and symplectic methods, and they preserve the energy properties of the underlying continuous dynamical system, which contains a Hessian perturbation. We also show that Nesterov's gradient method is equivalent to a Lie-Trotter splitting applied to a Hessian-driven damping system. Finally, numerical experiments are presented to validate the theoretical results.
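
    As an illustration of the splitting idea (a sketch under assumed parameters, not the authors' scheme), one can integrate the damped inertial system x'' + gamma * x' + grad f(x) = 0 by a Lie-Trotter splitting: one substep advances the conservative part with symplectic Euler, the other solves the damping part v' = -gamma * v exactly.

        import numpy as np

        # Lie-Trotter splitting for x'' + gamma x' + grad f(x) = 0 with
        # f(x) = 0.5 * x^T A x. The quadratic, damping coefficient, and
        # step size are assumptions of this illustration.
        A = np.diag([1.0, 10.0])
        grad = lambda x: A @ x
        gamma, h = 1.5, 0.05

        x = np.array([1.0, 1.0])
        v = np.zeros(2)

        for _ in range(2000):
            # Substep 1: symplectic Euler on the conservative Hamiltonian
            # part v' = -grad f(x), x' = v
            v = v - h * grad(x)
            x = x + h * v
            # Substep 2: exact flow of the damping part v' = -gamma * v
            v = v * np.exp(-gamma * h)

        print("distance to minimizer:", np.linalg.norm(x))

    Because the conservative substep is symplectic and the damping substep is solved exactly, the composed map dissipates energy in a controlled way, which is the flavor of structure preservation the abstract describes.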