Generalized Framework for Nonlinear Acceleration
Nonlinear acceleration algorithms improve the performance of iterative
methods, such as gradient descent, using the information contained in past
iterates. However, their efficiency is still not entirely understood even in
the quadratic case. In this paper, we clarify the convergence analysis by
giving general properties shared by several classes of nonlinear acceleration:
Anderson acceleration (and variants), quasi-Newton methods (such as Broyden
Type-I or Type-II, SR1, DFP, and BFGS) and Krylov methods (Conjugate Gradient,
MINRES, GMRES). In particular, we propose a generic family of algorithms that
contains all the previous methods and prove its optimal rate of convergence
when minimizing quadratic functions. We also propose multi-secant updates for
the quasi-Newton methods listed above. We provide MATLAB code implementing
the algorithm.
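As a rough illustration of the family of methods discussed above, the following is a minimal Type-II Anderson mixing loop for a generic fixed-point map. The function names, the memory size, and the tiny ridge term are illustrative choices of mine, not the paper's algorithm:

```python
import numpy as np

def anderson_accelerate(g, x0, m=5, iters=30):
    """Type-II Anderson acceleration of the fixed-point iteration x <- g(x).

    Each step replaces the plain update with a combination of the last few
    evaluations of g whose weights sum to one and minimize the combined
    residual ||sum_i a_i * (g(x_i) - x_i)||."""
    xs, gs = [x0], [g(x0)]
    x = gs[0]
    for _ in range(iters):
        xs.append(x)
        gs.append(g(x))
        k = min(m, len(xs) - 1) + 1          # number of iterates mixed
        F = np.column_stack([gi - xi for gi, xi in zip(gs[-k:], xs[-k:])])
        # Equality-constrained least squares via the KKT system:
        # minimize ||F a||^2 subject to sum(a) = 1.
        G = F.T @ F + 1e-12 * np.eye(k)      # tiny ridge for stability
        ones = np.ones(k)
        kkt = np.block([[G, ones[:, None]], [ones[None, :], np.zeros((1, 1))]])
        rhs = np.concatenate([np.zeros(k), [1.0]])
        a = np.linalg.solve(kkt, rhs)[:-1]
        x = np.column_stack(gs[-k:]) @ a     # accelerated (mixed) iterate
    return x
```

On a quadratic, the fixed-point map of gradient descent is linear and this mixing behaves like a Krylov method, matching the connection the abstract draws.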
A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization
Anderson acceleration (or Anderson mixing) is an efficient acceleration
method for fixed-point iterations; e.g., gradient descent can be viewed as
iteratively applying the fixed-point map x ↦ x − η∇f(x). It is known that
Anderson acceleration is quite efficient in practice
and can be viewed as an extension of Krylov subspace methods for nonlinear
problems. In this paper, we show that Anderson acceleration with Chebyshev
polynomials can achieve the optimal convergence rate, which improves the
previous result of Toth and Kelley (2015) for
quadratic functions. Moreover, we provide a convergence analysis for minimizing
general nonlinear problems. In addition, if the hyperparameters (e.g., the
Lipschitz smoothness parameter) are not available, we propose an algorithm
that guesses them dynamically and also prove a similar convergence
rate. Finally, the experimental results demonstrate that the proposed
Anderson-Chebyshev acceleration method converges significantly faster than
other algorithms, e.g., vanilla gradient descent (GD) and Nesterov's
accelerated GD. Also, these algorithms combined with the proposed guessing
algorithm (guessing the hyperparameters dynamically) achieve much better
performance.
Comment: To appear in AISTATS 202
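The Chebyshev ingredient can be illustrated on its own: on a quadratic whose spectrum lies in [mu, L], gradient descent with step sizes taken from the Chebyshev nodes achieves the polynomial-optimal error damping. A sketch under those assumptions (names and parameters are mine, not the paper's Anderson-Chebyshev method):

```python
import numpy as np

def chebyshev_gradient_descent(A, b, x0, mu, L, n_steps):
    """Gradient descent on f(x) = 0.5 x^T A x - b^T x whose step sizes
    are reciprocals of the Chebyshev nodes mapped onto [mu, L]. After
    n_steps the error is damped by a rescaled Chebyshev polynomial,
    which is optimal among polynomials of that degree on [mu, L]."""
    x = np.asarray(x0, dtype=float).copy()
    for j in range(n_steps):
        node = 0.5 * (L + mu) + 0.5 * (L - mu) * np.cos(
            (2 * j + 1) * np.pi / (2 * n_steps))
        x = x - (A @ x - b) / node
    return x
```

In exact arithmetic the node ordering is irrelevant (the product of the damping factors is the same polynomial); in floating point, mild condition numbers keep the naive ordering stable.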
Composite Optimization by Nonconvex Majorization-Minimization
The minimization of a nonconvex composite function can model a variety of
imaging tasks. A popular class of algorithms for solving such problems are
majorization-minimization techniques which iteratively approximate the
composite nonconvex function by a majorizing function that is easy to minimize.
Most techniques, e.g. gradient descent, utilize convex majorizers in order to
guarantee that the majorizer is easy to minimize. In our work we consider a
natural class of nonconvex majorizers for these functions, and show that these
majorizers are still sufficient for a globally convergent optimization scheme.
Numerical results illustrate that by applying this scheme, one can often obtain
superior local optima compared to previous majorization-minimization methods,
when the nonconvex majorizers are solved to global optimality. Finally, we
illustrate the behavior of our algorithm for depth super-resolution from raw
time-of-flight data.
Comment: 38 pages, 12 figures, accepted for publication in SIIM
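For the classical convex-majorizer case that the abstract contrasts against, a standard instance is ISTA: majorizing the smooth half of a lasso objective by a separable quadratic makes each inner minimization a soft-thresholding step. A minimal sketch (illustrative names; this is the convex baseline, not the paper's nonconvex scheme):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mm_lasso(A, b, lam, iters=100):
    """Majorization-minimization for 0.5*||Ax - b||^2 + lam*||x||_1.
    The smooth term is majorized at x_k by a separable quadratic with
    curvature L = ||A||_2^2, so each inner minimization is a single
    soft-thresholding step (this instance is exactly ISTA)."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)          # gradient of the smooth term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```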
Supervised Descent Method for Solving Nonlinear Least Squares Problems in Computer Vision
Many computer vision problems (e.g., camera calibration, image alignment,
structure from motion) are solved with nonlinear optimization methods. It is
generally accepted that second order descent methods are the most robust, fast,
and reliable approaches for nonlinear optimization of a general smooth
function. However, in the context of computer vision, second order descent
methods have two main drawbacks: (1) the function might not be analytically
differentiable and numerical approximations are impractical, and (2) the
Hessian may be large and not positive definite. To address these issues, this
paper proposes generic descent maps, which are average "descent directions" and
rescaling factors learned in a supervised fashion. Using generic descent maps,
we derive a practical algorithm - Supervised Descent Method (SDM) - for
minimizing Nonlinear Least Squares (NLS) problems. During training, SDM learns
a sequence of descent maps that minimize the NLS. In testing, SDM minimizes the
NLS objective using the learned descent maps without computing the Jacobian or
the Hessian. We prove the conditions under which the SDM is guaranteed to
converge. We illustrate the effectiveness and accuracy of SDM in three computer
vision problems: rigid image alignment, non-rigid image alignment, and 3D pose
estimation. In particular, we show how SDM achieves state-of-the-art
performance in the problem of facial feature detection. The code has been made
available at www.humansensing.cs.cmu.edu/intraface.
Comment: 15 pages. In submission to TPAM
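A toy version of the idea can be sketched as follows: each descent map is a linear regression from feature residuals to parameter updates, trained on perturbed samples around the optimum. Everything here (function names, the feature map, sample counts) is an illustrative assumption of mine, not the authors' implementation:

```python
import numpy as np

def train_sdm(h, x_star, n_samples=200, n_maps=4, noise=0.3, seed=0):
    """Toy supervised-descent training: learn a sequence of linear maps
    R_k (by least squares) that send feature residuals h(x) - h(x_star)
    toward the parameter error x_star - x, then advance the training
    samples with each learned map before fitting the next one."""
    rng = np.random.default_rng(seed)
    X = x_star + noise * rng.standard_normal((n_samples, x_star.size))
    maps = []
    for _ in range(n_maps):
        F = np.array([h(x) - h(x_star) for x in X])  # feature residuals
        D = x_star - X                               # desired updates
        R, *_ = np.linalg.lstsq(F, D, rcond=None)    # fit F @ R ~ D
        maps.append(R)
        X = X + F @ R                                # apply the learned map
    return maps

def apply_sdm(maps, h, h_star, x0):
    """At test time, apply the learned maps; no Jacobian is computed."""
    x = np.asarray(x0, dtype=float).copy()
    for R in maps:
        x = x + (h(x) - h_star) @ R
    return x
```

Passing the target features h_star at test time mirrors the alignment setting, where the template appearance at the solution is known.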
Gauss-Newton Optimization for Phase Recovery from the Bispectrum
Phase recovery from the bispectrum is a central problem in speckle
interferometry which can be posed as an optimization problem minimizing a
weighted nonlinear least-squares objective function. We look at two different
formulations of the phase recovery problem from the literature, both of which
can be minimized with respect to either the recovered phase or the recovered
image. Previously, strategies for solving these formulations have been limited
to first-order optimization methods such as gradient descent or quasi-Newton
methods. This paper explores Gauss-Newton optimization schemes for the problem
of phase recovery from the bispectrum. We implement efficient Gauss-Newton
optimization schemes for all the formulations. For the two formulations
that optimize with respect to the recovered image, we also extend
to projected Gauss-Newton to enforce element-wise lower and upper bounds on the
pixel intensities of the recovered image. We show that our efficient
Gauss-Newton schemes result in better image reconstructions with no or limited
additional computational cost compared to previously implemented first-order
optimization schemes for phase recovery from the bispectrum. MATLAB
implementations of all methods and simulations are made publicly available in
the BiBox repository on GitHub.
Comment: 13 pages, 4 figures, 2 table
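The basic Gauss-Newton iteration the paper builds on can be sketched generically (names are illustrative; this is not the BiBox code):

```python
import numpy as np

def gauss_newton(r, J, x0, iters=20):
    """Gauss-Newton for min_x 0.5*||r(x)||^2: linearize the residual,
    solve the least-squares subproblem J(x) p = -r(x), and step
    x <- x + p. No Hessian is formed; only the Jacobian of the
    residual is needed."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        p, *_ = np.linalg.lstsq(J(x), -r(x), rcond=None)
        x = x + p
    return x
```

For zero- or small-residual problems this converges much faster near the solution than first-order steps, which is the advantage the abstract claims over gradient descent and quasi-Newton baselines.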
The proximal point method revisited
In this short survey, I revisit the role of the proximal point method in
large scale optimization. I focus on three recent examples: a proximally guided
subgradient method for weakly convex stochastic approximation, the prox-linear
algorithm for minimizing compositions of convex functions and smooth maps, and
Catalyst generic acceleration for regularized Empirical Risk Minimization.
Comment: 11 pages, submitted to SIAG/OPT Views and New
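For a quadratic objective the proximal subproblem has a closed form, which makes the method easy to sketch (an illustration under that assumption, not code from the survey):

```python
import numpy as np

def proximal_point_quadratic(A, b, x0, lam=1.0, iters=50):
    """Proximal point method on f(x) = 0.5 x^T A x - b^T x: each step
    solves the strongly convex subproblem
        x_{k+1} = argmin_x f(x) + 1/(2*lam) * ||x - x_k||^2,
    whose optimality condition is (A + I/lam) x = b + x_k/lam. The
    subproblem is well-posed even when A is merely positive
    semidefinite, which is the point of the regularization."""
    M = A + np.eye(len(b)) / lam
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = np.linalg.solve(M, b + x / lam)
    return x
```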
Optimizing Schroedinger functionals using Sobolev gradients: Applications to Quantum Mechanics and Nonlinear Optics
In this paper we study the application of the Sobolev gradients technique to
the problem of minimizing several Schrödinger functionals related to timely
and difficult nonlinear problems in Quantum Mechanics and Nonlinear Optics. We
show that these gradients act as preconditioners over traditional choices of
descent directions in minimization methods and show a computationally
inexpensive way to obtain them using a discrete Fourier basis and a Fast
Fourier Transform. We show that the Sobolev preconditioning provides a great
convergence improvement over traditional techniques for finding solutions with
minimal energy as well as stationary states and suggest a generalization of the
method using arbitrary linear operators.
Comment: 19 pages with 5 postscript figure
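The FFT-based preconditioning can be sketched in one dimension: the H1 (Sobolev) gradient solves (I − d²/dx²)s = g, which is diagonal in the discrete Fourier basis. A periodic-grid sketch (function name and grid are illustrative assumptions):

```python
import numpy as np

def sobolev_gradient(g, dx):
    """Turn an L2 gradient g (sampled on a uniform periodic grid) into an
    H1 (Sobolev) gradient by solving (I - d^2/dx^2) s = g, which is
    diagonal in the discrete Fourier basis: s_hat = g_hat / (1 + k^2).
    High-frequency components are damped, preconditioning the descent."""
    k = 2.0 * np.pi * np.fft.fftfreq(len(g), d=dx)   # angular wavenumbers
    return np.real(np.fft.ifft(np.fft.fft(g) / (1.0 + k ** 2)))
```

The cost is two FFTs per gradient, which is the "computationally inexpensive" route the abstract refers to.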
TMAC: A Toolbox of Modern Async-Parallel, Coordinate, Splitting, and Stochastic Methods
TMAC is a toolbox written in C++11 that implements algorithms based on a set
of modern methods for large-scale optimization. It covers a variety of
optimization problems, which can be both smooth and nonsmooth, convex and
nonconvex, as well as constrained and unconstrained. The algorithms implemented
in TMAC, such as the coordinate update method and operator splitting method,
are scalable as they decompose a problem into simple subproblems. These
algorithms can run in a multi-threaded fashion, either synchronously or
asynchronously, to take advantage of all available cores. TMAC's
architecture mimics how a scientist writes down an optimization algorithm.
Therefore, it is easy for one to obtain a new algorithm by making simple
modifications such as adding a new operator and adding a new splitting, while
maintaining the multicore parallelism and other features. The package is
available at https://github.com/uclaopt/TMAC.
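A minimal example of the kind of coordinate-update building block such a toolbox composes (a serial, single-threaded sketch with illustrative names, not TMAC's C++ implementation):

```python
import numpy as np

def coordinate_descent(A, b, iters=2000, seed=0):
    """Random coordinate update method for min 0.5*x^T A x - b^T x with
    A symmetric positive definite: each step exactly minimizes over one
    randomly chosen coordinate. Subproblems this simple are what
    coordinate-update solvers distribute across threads."""
    rng = np.random.default_rng(seed)
    x = np.zeros(len(b))
    for _ in range(iters):
        i = rng.integers(len(b))
        # set the i-th partial derivative to zero, holding the rest fixed
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x
```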
Smoothing gradients in iterative regularization
Connected with the rise of interest in inverse problems is the development
and analysis of regularization methods, which are a necessity due to the
ill-posedness of inverse problems. Tikhonov-type regularization methods are
very popular in this regard. However, their direct implementation for large-scale
linear or non-linear problems is a non-trivial task. In such scenarios,
iterative regularization methods usually serve as a better alternative. In this
paper we propose a new iterative regularization method which uses descent
directions, different from the usual gradient direction, that enable a
smoother and more effective recovery than the latter. This is achieved by
transforming the original noisy gradient, via a smoothing operator, to a
smoother gradient, which is more robust to the noise present in the data. It is
also shown that this technique is very beneficial when dealing with data
with large noise levels. To illustrate the computational efficiency of this
method we
apply it to numerically solve some classical integral inverse problems,
including image deblurring and tomography problems, and compare the results
with certain standard regularization methods, such as Tikhonov, TV, CGLS, etc.
Comment: Comments are welcome. arXiv admin note: text overlap with
arXiv:1906.0547
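The idea can be sketched as a Landweber-type iteration whose gradient is filtered through a smoothing operator before each update. The operator and names here are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def smoothed_landweber(A, b, smooth, step=0.3, iters=100):
    """Landweber-type iterative regularization x <- x - step * S(grad):
    the raw gradient A^T(Ax - b) is passed through a smoothing operator
    S before the update, damping the noise it inherits from the data b.
    With S = I this reduces to the classical Landweber iteration."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - step * smooth @ (A.T @ (A @ x - b))
    return x
```

For an invertible S the fixed points are unchanged (S·A^T(Ax − b) = 0 iff A^T(Ax − b) = 0); only the path to them is smoothed.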
Detecting and correcting the loss of independence in nonlinear conjugate gradient
It is well known that search directions in nonlinear conjugate gradient (CG)
can sometimes become nearly dependent, causing a dramatic slow-down in the
convergence rate. We provide a theoretical analysis of this loss of
independence. The analysis applies to the case of a strictly convex objective
function and is motivated by older work of Nemirovsky and Yudin. Loss of
independence can affect several of the well-known variants of nonlinear CG
including Fletcher-Reeves, Polak-Ribière (nonnegative variant), and
Hager-Zhang.
Based on our analysis, we propose a relatively inexpensive computational test
for detecting loss of independence. We also propose a method for correcting it
when it is detected, which we call "subspace optimization." Although the
correction method is somewhat expensive, our experiments show that in some
cases, usually the most ill-conditioned ones, it yields a method much faster
than any of these three variants. Even though our theory covers only strongly
convex objective functions, we provide computational results to indicate that
the detection and correction mechanisms may also hold promise for nonconvex
optimization.
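A simplified sketch of the setting: Fletcher-Reeves CG with exact line search on a quadratic, plus a crude restart whenever the direction loses its descent property. The paper's actual dependence test and its subspace-optimization correction are more sophisticated than this safeguard:

```python
import numpy as np

def ncg_fletcher_reeves(A, b, x0, iters=100):
    """Nonlinear CG with the Fletcher-Reeves update on the quadratic
    0.5 x^T A x - b^T x, using exact line search. If the new direction
    stops being a descent direction (a symptom of directions becoming
    nearly dependent), restart from steepest descent."""
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x - b
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-12:
            break
        alpha = -(g @ d) / (d @ (A @ d))   # exact line search for quadratics
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
        # crude safeguard: restart if d is no longer a descent direction
        if g @ d >= 0:
            d = -g
    return x
```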