Preconditioned Primal-Dual Gradient Methods for Nonconvex Composite and Finite-Sum Optimization
In this paper, we first introduce a preconditioned primal-dual gradient
algorithm based on conjugate duality theory. This algorithm is designed to
solve composite optimization problems whose objective function consists of two
summands: a continuously differentiable nonconvex function and the composition
of a nonsmooth nonconvex function with a linear operator. In contrast to
existing nonconvex primal-dual algorithms, our proposed algorithm, through the
utilization of conjugate duality, does not require the calculation of proximal
mappings of nonconvex functions. Under mild conditions, we prove that any
cluster point of the generated sequence is a critical point of the composite
optimization problem. Under the Kurdyka-\L{}ojasiewicz (KL) property, we
establish global convergence and convergence rates for the iterates. Second,
for nonconvex finite-sum optimization, we propose a stochastic algorithm that
combines the preconditioned primal-dual gradient algorithm with a class of
variance reduced stochastic gradient estimators. Almost sure global convergence
and expected convergence rates are derived using the
Kurdyka-\L{}ojasiewicz inequality. Finally, preliminary numerical results
are presented to demonstrate the effectiveness of the proposed algorithms.
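The primal-dual structure described in the abstract can be illustrated on a toy convex instance. The sketch below alternates a primal gradient step on f with a dual gradient step driven by the conjugate g*; the quadratic choices of f and g*, the step sizes, and the iteration count are all illustrative assumptions, and the paper's preconditioners and nonconvex analysis are not reproduced here.

```python
import numpy as np

# Toy convex instance of min_x f(x) + g(A x), rewritten via the conjugate g*
# as the saddle problem min_x max_y f(x) + <A x, y> - g*(y).
# Quadratic f and g* are illustrative assumptions.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
b = rng.standard_normal(5)

f_grad = lambda x: x               # f(x) = 0.5 ||x||^2
gstar_grad = lambda y: y + b       # g*(y) = 0.5 ||y||^2 + <b, y>

x = np.zeros(8)
y = np.zeros(5)
tau, sigma = 0.1, 0.1              # primal / dual step sizes
for _ in range(2000):
    x = x - tau * (f_grad(x) + A.T @ y)        # primal descent step
    y = y + sigma * (A @ x - gstar_grad(y))    # dual ascent step

# saddle-point conditions: f'(x) + A^T y = 0 and A x = g*'(y)
print(np.linalg.norm(f_grad(x) + A.T @ y))
print(np.linalg.norm(A @ x - gstar_grad(y)))
```

Only a gradient of g* is needed here, mirroring the paper's point that no proximal mapping of g is ever computed.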
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
The goal of decentralized optimization over a network is to optimize a global
objective formed by a sum of local (possibly nonsmooth) convex functions using
only local computation and communication. It arises in various application
domains, including distributed tracking and localization, multi-agent
co-ordination, estimation in sensor networks, and large-scale optimization in
machine learning. We develop and analyze distributed algorithms based on dual
averaging of subgradients, and we provide sharp bounds on their convergence
rates as a function of the network size and topology. Our method of analysis
allows for a clear separation between the convergence of the optimization
algorithm itself and the effects of communication constraints arising from the
network structure. In particular, we show that the number of iterations
required by our algorithm scales inversely in the spectral gap of the network.
The sharpness of this prediction is confirmed both by theoretical lower bounds
and simulations for various networks. Our approach includes both the cases of
deterministic optimization and communication, as well as problems with
stochastic optimization and/or communication. Comment: 40 pages, 4 figures
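As a rough illustration of distributed dual averaging, the sketch below runs the scheme on a ring of nodes, each holding a scalar quadratic f_i(x) = 0.5 (x - a_i)^2, so the network optimum is the mean of the a_i. The mixing matrix, the step schedule alpha_t = 1/sqrt(t), and the regularizer psi(x) = 0.5 x^2 are illustrative assumptions; constraint sets and the paper's spectral-gap bounds are omitted.

```python
import numpy as np

# Distributed dual averaging on a ring of n nodes, each with
# f_i(x) = 0.5 (x - a_i)^2; the network optimum is mean(a) = 3.5.
n = 8
a = np.arange(n, dtype=float)            # local targets 0, 1, ..., 7
P = np.zeros((n, n))                     # doubly stochastic ring mixing
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] = 0.25
    P[i, (i + 1) % n] = 0.25

z = np.zeros(n)                          # accumulated dual (subgradient) averages
x = np.zeros(n)                          # local primal iterates
for t in range(1, 20001):
    g = x - a                            # local (sub)gradients
    z = P @ z + g                        # mix neighbours' duals, add gradient
    x = -z / np.sqrt(t)                  # minimise <z, .> + sqrt(t) * psi

print(x)                                 # entries cluster around mean(a) = 3.5
```

The slow mixing of the ring (small spectral gap) is visible as residual disagreement between nodes, consistent with the scaling the abstract describes.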
Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis
Recently, several methods have been proposed for sparse optimization which make
careful use of second-order information [10, 28, 16, 3] to improve local
convergence rates. These methods construct a composite quadratic approximation
using Hessian information, optimize this approximation using a first-order
method, such as coordinate descent, and employ a line search to ensure
sufficient descent. Here we propose a general framework, which includes
slightly modified versions of existing algorithms as well as a new algorithm
that uses limited-memory BFGS Hessian approximations, and we provide a novel
global convergence rate analysis that covers methods solving their subproblems
via coordinate descent.
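The build-a-quadratic-model-then-solve-by-coordinate-descent loop described above can be sketched on an l1-regularized least-squares instance. Here the exact Hessian A^T A stands in for a limited-memory BFGS approximation, the unit step is always accepted (no line search), and the inner sweep count is fixed; all of these are simplifying assumptions.

```python
import numpy as np

# Inexact proximal (quasi-)Newton sketch for min 0.5||Ax - b||^2 + lam*||x||_1.
# The quadratic model grad.d + 0.5 d'Hd plus the l1 term is minimised
# approximately by coordinate-descent sweeps over the step d.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
lam = 0.5
H = A.T @ A                              # stand-in for an L-BFGS approximation
x = np.zeros(10)

soft = lambda v, t: np.sign(v) * max(abs(v) - t, 0.0)

for _ in range(20):                      # outer proximal-Newton iterations
    grad = A.T @ (A @ x - b)             # gradient of the smooth part at x
    d = np.zeros_like(x)                 # step; inner coordinate descent on model
    for _ in range(10):
        for j in range(10):
            r = grad[j] + H[j] @ d - H[j, j] * d[j]   # model slope, others fixed
            u = soft(x[j] - r / H[j, j], lam / H[j, j])
            d[j] = u - x[j]
    x = x + d                            # unit step (line search omitted)

print(np.round(x, 3))
```

Because the inner solver only needs Hessian-row products, an L-BFGS compact representation could be substituted without changing the outer loop.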
Dynamical convergence analysis for nonconvex linearized proximal ADMM algorithms
The convergence analysis of optimization algorithms using continuous-time
dynamical systems has received much attention in recent years. In this paper,
we investigate applications of these systems to analyze the convergence of
linearized proximal ADMM algorithms for nonconvex composite optimization, whose
objective function is the sum of a continuously differentiable function and a
composition of a possibly nonconvex function with a linear operator. We first
derive a first-order differential inclusion for the linearized proximal ADMM
algorithm, LP-ADMM. Both the global convergence and the convergence rates of
the generated trajectory are established using the Kurdyka-\L{}ojasiewicz
(KL) property. A stochastic variant, LP-SADMM, is then investigated for
finite-sum nonconvex composite problems. Under mild
conditions, we obtain the stochastic differential equation corresponding to
LP-SADMM, and demonstrate the almost sure global convergence of the generated
trajectory by leveraging the KL property. Based on the almost sure convergence
of the trajectory, we construct a stochastic process that converges almost surely
to an approximate critical point of the objective function, and derive the expected
convergence rates associated with this stochastic process. Moreover, we propose
an accelerated LP-SADMM that incorporates Nesterov's acceleration technique.
The continuous-time dynamical system of this algorithm is modeled as a
second-order stochastic differential equation. Under the KL property, we
explore the corresponding almost sure convergence and expected convergence
rates.
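A discrete-time flavour of the linearized proximal ADMM iteration can be sketched on a convex toy problem min_x f(x) + g(Ax) with f a strongly convex quadratic and g = ||.||_1. The splitting, penalty parameter, and step-size rule below are illustrative assumptions; the paper's nonconvex setting and continuous-time (ODE/SDE) analysis are not reproduced.

```python
import numpy as np

# Linearized proximal ADMM sketch: split min f(x) + g(z) s.t. A x = z
# with f(x) = 0.5||x - c||^2, g = ||.||_1, scaled dual u, penalty rho.
# The x-step takes a gradient step on the augmented Lagrangian
# instead of minimising it exactly (the "linearized" part).
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
c = rng.standard_normal(4)
rho = 1.0
tau = 0.9 / (1.0 + rho * np.linalg.norm(A, 2) ** 2)   # step bound for x-step
x = np.zeros(4); z = np.zeros(6); u = np.zeros(6)

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for _ in range(20000):
    x = x - tau * ((x - c) + rho * A.T @ (A @ x - z + u))  # linearized x-step
    z = soft(A @ x + u, 1.0 / rho)                         # prox step for g
    u = u + A @ x - z                                      # dual update

print(np.linalg.norm(A @ x - z))   # feasibility gap shrinks toward zero
```

Replacing the discrete steps by their continuous-time limit yields the kind of differential inclusion the abstract analyzes.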