High-Resolution Modeling of the Fastest First-Order Optimization Method for Strongly Convex Functions
Motivated by the fact that gradient-based optimization algorithms can be
studied from the perspective of limiting ordinary differential equations
(ODEs), here we derive an ODE representation of the accelerated triple momentum
(TM) algorithm. For unconstrained optimization problems with strongly convex
cost, the TM algorithm has a proven faster convergence rate than Nesterov's
accelerated gradient (NAG) method at the same computational complexity.
We show that, as with the NAG method, a high-resolution model is needed to
capture the characteristics of the TM method accurately in its ODE
representation. We use a Lyapunov analysis
to investigate the stability and convergence behavior of the proposed
high-resolution ODE representation of the TM algorithm. We show through this
analysis that the ODE model is robust to deviations from the parameters of
the TM algorithm. We compare the convergence rate of the ODE representation of the TM
method with that of the NAG method to confirm its faster convergence. Our study
also leads to a tighter bound on the worst-case rate of convergence for the ODE
model of the NAG method. Lastly, we discuss the use of the integral quadratic
constraint (IQC) method to establish an estimate on the rate of convergence of
the TM algorithm. A numerical example demonstrates our results.
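As a rough illustration of the comparison made in this abstract, the sketch below runs the TM and NAG iterations on a strongly convex quadratic. The TM parameter formulas follow the published triple momentum method (Van Scoy, Freeman, and Lynch, 2018); the test function, constants, and iteration counts are illustrative assumptions, not taken from this paper.

```python
# Illustrative sketch: triple momentum (TM) vs. Nesterov's accelerated
# gradient (NAG) on a strongly convex quadratic.  Parameter formulas are
# as in the TM literature; all numerical choices here are assumptions.
import math

m, L = 1.0, 100.0          # strong convexity and smoothness constants
kappa = L / m
hess = [m, L]              # diagonal Hessian of f(x) = 0.5 * sum(h * x_i**2)

def grad(x):
    return [h * xi for h, xi in zip(hess, x)]

def tm(x0, iters):
    # TM parameters in terms of rho = 1 - 1/sqrt(kappa)
    rho = 1.0 - 1.0 / math.sqrt(kappa)
    alpha = (1.0 + rho) / L
    beta = rho**2 / (2.0 - rho)
    gamma = rho**2 / ((1.0 + rho) * (2.0 - rho))
    delta = rho**2 / (1.0 - rho**2)
    xi_prev, xi, x = x0[:], x0[:], x0[:]
    for _ in range(iters):
        y = [(1 + gamma) * a - gamma * b for a, b in zip(xi, xi_prev)]
        g = grad(y)
        xi_next = [(1 + beta) * a - beta * b - alpha * gi
                   for a, b, gi in zip(xi, xi_prev, g)]
        x = [(1 + delta) * a - delta * b for a, b in zip(xi_next, xi)]
        xi_prev, xi = xi, xi_next
    return x

def nag(x0, iters):
    # NAG for strongly convex functions with constant momentum
    momentum = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    x_prev, x = x0[:], x0[:]
    for _ in range(iters):
        y = [a + momentum * (a - b) for a, b in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - gi / L for yi, gi in zip(y, g)]
    return x

x0 = [1.0, 1.0]
err = lambda x: math.sqrt(sum(xi**2 for xi in x))   # minimizer is the origin
print(err(tm(x0, 300)), err(nag(x0, 300)))          # both tiny; TM rate is rho
```

On a quadratic both methods are fast; the TM rate advantage is most visible in the worst-case guarantee over all strongly convex functions rather than on any single instance.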
Distributed Optimization, Averaging via ADMM, and Network Topology
There has been an increasing necessity for scalable optimization methods,
especially due to the explosion in the size of datasets and model complexity in
modern machine learning applications. Scalable solvers often distribute the
computation over a network of processing units. For simple algorithms such as
gradient descent, the dependence of the convergence time on the topology of
this network is well understood. However, for more involved algorithms such as the
Alternating Direction Method of Multipliers (ADMM), much less is known. At the
heart of many distributed optimization algorithms there exists a gossip
subroutine which averages local information over the network, and whose
efficiency is crucial for the overall performance of the method. In this paper
we review recent research in this area and, with the goal of isolating such a
communication exchange behaviour, we compare different algorithms when applied
to a canonical distributed averaging consensus problem. We also show
interesting connections between ADMM and lifted Markov chains besides providing
an explicitly characterization of its convergence and optimal parameter tuning
in terms of spectral properties of the network. Finally, we empirically study
the connection between network topology and convergence rates for different
algorithms on a real-world problem of sensor localization.
Comment: to appear in "Proceedings of the IEEE"
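The gossip subroutine this abstract isolates can be sketched in a few lines: nodes repeatedly average their value with their neighbors' values, and the convergence speed is governed by the spectral properties of the weight matrix (hence of the network topology). The path graph, Metropolis weights, and iteration count below are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: linear gossip averaging on a path graph with
# Metropolis weights.  Every node converges to the network-wide mean at
# a rate set by the second-largest eigenvalue modulus of the weight matrix.
n = 6
values = [float(i) for i in range(n)]            # local measurements 0..5
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def metropolis_step(x):
    # edge weight w_ij = 1 / (1 + max(deg_i, deg_j)); the self-weight
    # absorbs the remainder, so the matrix is symmetric doubly stochastic
    new = []
    for i in range(n):
        xi = x[i]
        for j in neighbors[i]:
            w = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
            xi += w * (x[j] - x[i])
        new.append(xi)
    return new

x = values[:]
for _ in range(2000):
    x = metropolis_step(x)
mean = sum(values) / n
deviation = max(abs(xi - mean) for xi in x)      # all nodes near the mean 2.5
print(deviation)
```

Poorly connected topologies (such as this path) have a small spectral gap and hence slow averaging, which is the kind of topology dependence the paper studies for ADMM.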
Gradient flows and proximal splitting methods: a unified view on accelerated and stochastic optimization
Optimization is at the heart of machine learning, statistics, and several
applied scientific disciplines. Proximal algorithms form a class of methods
that are broadly applicable and are particularly well-suited to nonsmooth,
constrained, large-scale, and distributed optimization problems. There are
essentially five proximal algorithms currently known, each proposed in seminal
work: forward-backward splitting, Tseng splitting, Douglas-Rachford,
alternating direction method of multipliers, and the more recent Davis-Yin.
Such methods sit on a higher level of abstraction compared to gradient-based
methods, having deep roots in nonlinear functional analysis. In this paper, we
show that all of these algorithms can be derived as different discretizations
of a single differential equation, namely the simple gradient flow which dates
back to Cauchy (1847). Many of the success stories
in machine learning rely on "accelerating" the convergence of first-order
methods. However, accelerated methods are notoriously difficult to analyze,
counterintuitive, and lack an underlying guiding principle. We show that
applying similar discretization schemes to Newton's classical equation of
motion with an additional dissipative force, which we refer to as the
accelerated gradient flow, yields accelerated variants of all these
proximal algorithms; the majority of these are new, although some recover known
cases in the literature. Moreover, we extend these algorithms to stochastic
optimization settings, allowing us to make connections with Langevin and
Fokker-Planck equations. Similar ideas apply to gradient descent, heavy ball,
and Nesterov's method, which are simpler. These results thus provide a unified
framework in which several optimization methods can be derived from basic
physical systems.
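The simplest instance of the discretization viewpoint in this abstract can be made concrete: forward (explicit) Euler on the gradient flow dx/dt = -f'(x) gives gradient descent, while backward (implicit) Euler gives the proximal point method. The one-dimensional quadratic below is an illustrative choice with a closed-form proximal step, not an example from the paper.

```python
# Illustrative sketch: two discretizations of the gradient flow
# dx/dt = -f'(x) for f(x) = 0.5 * a * x**2.
a = 4.0                      # curvature of the quadratic
h = 0.1                      # discretization step / step size

def forward_euler(x):
    # explicit Euler = gradient descent: x_{k+1} = x_k - h * f'(x_k)
    return x - h * a * x

def backward_euler(x):
    # implicit Euler = proximal point: solve y = x - h * f'(y);
    # for the quadratic this has the closed form below
    return x / (1.0 + h * a)

x_fw = x_bw = 1.0
for _ in range(100):
    x_fw, x_bw = forward_euler(x_fw), backward_euler(x_bw)
print(x_fw, x_bw)            # both iterates approach the minimizer 0
```

The implicit step stays stable for any step size h > 0, while the explicit step requires h < 2/a; this stability gap is one reason proximal methods suit the ill-conditioned and nonsmooth problems the abstract mentions.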
The long time behavior and the rate of convergence of symplectic convex algorithms obtained via splitting discretizations of inertial damping systems
In this paper we propose new numerical algorithms in the setting of
unconstrained optimization problems and study the rate of convergence of the
objective function along the iterates. Furthermore, our algorithms are based upon
splitting and symplectic methods and they preserve the energy properties of the
inherent continuous dynamical system that contains a Hessian perturbation. At
the same time, we show that Nesterov gradient method is equivalent to a
Lie-Trotter splitting applied to a Hessian driven damping system. Finally, some
numerical experiments are presented in order to validate the theoretical
results.Comment: 41 pages, 6 figures, 4 table
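The energy-preservation property of symplectic discretizations that this abstract relies on can be seen on the simplest inertial system, the undamped oscillator x'' = -x. The comparison below between plain explicit Euler and symplectic (semi-implicit) Euler is an illustrative sketch; the oscillator, step size, and horizon are assumptions, not the paper's experiments.

```python
# Illustrative sketch: explicit Euler vs. symplectic Euler on x'' = -x.
# Explicit Euler multiplies the energy by (1 + h**2) every step and blows
# up; symplectic Euler conserves a modified energy and stays bounded.
h = 0.1
steps = 10000

def energy(x, v):
    return 0.5 * (x * x + v * v)

# explicit Euler: update x and v from the same old state
x, v = 1.0, 0.0
for _ in range(steps):
    x, v = x + h * v, v - h * x
e_explicit = energy(x, v)

# symplectic Euler: update v first, then x using the *new* velocity
x, v = 1.0, 0.0
for _ in range(steps):
    v = v - h * x
    x = x + h * v
e_symplectic = energy(x, v)

print(e_explicit, e_symplectic)  # explicit diverges; symplectic stays near 0.5
```

For the symplectic scheme the quantity x**2 + v**2 - h*x*v is conserved exactly, which is why the true energy oscillates within an O(h) band around its initial value over arbitrarily long horizons.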