4,130 research outputs found
A Spectral Dai-Yuan-Type Conjugate Gradient Method for Unconstrained Optimization
This paper presents a new spectral conjugate gradient method (SDYCG) for solving unconstrained optimization problems. The method introduces a new expression for the spectral parameter, which ensures that the sufficient descent condition holds. The search direction in SDYCG can be viewed as a combination of the spectral gradient and the Dai-Yuan conjugate gradient directions. Global convergence of SDYCG is also established. Numerical results show that SDYCG may be capable of solving large-scale nonlinear unconstrained optimization problems.
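For orientation, the classical Dai-Yuan update that SDYCG builds on can be sketched as follows. This is not the SDYCG method: the abstract does not state the spectral parameter formula, so the sketch fixes it to 1 and uses only the standard Dai-Yuan coefficient beta = ||g_new||^2 / (d^T y). The function name `dai_yuan_cg_quadratic` and the restriction to a convex quadratic with exact line search are assumptions made to keep the example short.

```python
import numpy as np

def dai_yuan_cg_quadratic(A, b, x0, tol=1e-10, max_iter=100):
    """Minimize 0.5*x^T A x - b^T x for symmetric positive definite A.

    Plain Dai-Yuan conjugate gradient with exact line search.
    The spectral parameter of SDYCG is NOT modeled here (assumed equal to 1).
    Returns the final iterate and the number of iterations performed.
    """
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x - b                      # gradient of the quadratic
    d = -g                             # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, k
        Ad = A @ d
        alpha = -(g @ d) / (d @ Ad)    # exact minimizer of f along d
        x = x + alpha * d
        g_new = g + alpha * Ad         # gradient at the new point
        y = g_new - g
        beta = (g_new @ g_new) / (d @ y)   # Dai-Yuan coefficient
        d = -g_new + beta * d
        g = g_new
    return x, max_iter
```

On a general nonlinear objective the exact step would be replaced by a line search satisfying Wolfe-type conditions, which is the setting in which the Dai-Yuan coefficient is usually analyzed.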
A Three-Term Conjugate Gradient Method with Sufficient Descent Property for Unconstrained Optimization
Conjugate gradient methods are widely used for solving large-scale unconstrained optimization problems because they do not require storing matrices. In this paper, we propose a general form of three-term conjugate gradient methods that always generates a sufficient descent direction. We give a sufficient condition for the global convergence of the proposed general method. Moreover, we present a specific three-term conjugate gradient method based on the multi-step quasi-Newton method. Finally, some numerical results for the proposed method are given.
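To illustrate what a sufficient descent property means for a three-term direction, the following uses a well-known three-term modification of the Polak-Ribière-Polyak direction. It is an illustrative example only, not the general form or the multi-step quasi-Newton variant proposed in the paper, neither of which is given in the abstract. Here g_k denotes the gradient at iterate k and d_k the search direction.

```latex
\begin{align*}
d_k &= -g_k + \beta_k^{\mathrm{PRP}} d_{k-1} - \theta_k y_{k-1},
      \qquad y_{k-1} = g_k - g_{k-1},\\
\beta_k^{\mathrm{PRP}} &= \frac{g_k^{\top} y_{k-1}}{\|g_{k-1}\|^2},
      \qquad
\theta_k = \frac{g_k^{\top} d_{k-1}}{\|g_{k-1}\|^2},\\
g_k^{\top} d_k &= -\|g_k\|^2
      + \frac{(g_k^{\top} y_{k-1})(g_k^{\top} d_{k-1})
            - (g_k^{\top} d_{k-1})(g_k^{\top} y_{k-1})}{\|g_{k-1}\|^2}
      = -\|g_k\|^2 .
\end{align*}
```

The second and third terms cancel in the inner product with g_k, so the descent condition holds with constant 1 regardless of the line search used.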
Second order adjoints for solving PDE-constrained optimization problems
Inverse problems are of utmost importance in many fields of science and engineering. In the
variational approach inverse problems are formulated as PDE-constrained optimization problems,
where the optimal estimate of the uncertain parameters is the minimizer of a certain cost
functional subject to the constraints posed by the model equations. The numerical solution
of such optimization problems requires the computation of derivatives of the model output
with respect to model parameters. The first order derivatives of a cost functional (defined
on the model output) with respect to a large number of model parameters can be calculated
efficiently through first order adjoint sensitivity analysis. Second order adjoint models
give second derivative information in the form of matrix-vector products between the Hessian
of the cost functional and user defined vectors. Traditionally, the construction of second
order derivatives for large scale models has been considered too costly. Consequently, data
assimilation applications employ optimization algorithms that use only first order derivative
information, like nonlinear conjugate gradients and quasi-Newton methods.
In this paper we discuss the mathematical foundations of second order adjoint sensitivity
analysis and show that it provides an efficient approach to obtain Hessian-vector products. We
study the benefits of using second order information in the numerical optimization process
for data assimilation applications. The numerical studies are performed in a twin experiment
setting with a two-dimensional shallow water model. Different scenarios are considered with
different discretization approaches, observation sets, and noise levels. Optimization algorithms
that employ second order derivatives are tested against widely used methods that require
only first order derivatives. Conclusions are drawn regarding the potential benefits and the
limitations of using high-order information in large scale data assimilation problems.
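The role of a Hessian-vector product can be sketched without forming the Hessian. The example below is not a second order adjoint model: as a simple stand-in it approximates the product by central differences of the gradient, and checks the result against the analytic Hessian of the two-dimensional Rosenbrock function. The names `hvp_fd`, `rosenbrock_grad` and `rosenbrock_hess` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

def rosenbrock_grad(x):
    """Gradient of f(x0, x1) = (1 - x0)^2 + 100 (x1 - x0^2)^2."""
    x0, x1 = x
    return np.array([
        -2.0 * (1.0 - x0) - 400.0 * x0 * (x1 - x0**2),
        200.0 * (x1 - x0**2),
    ])

def rosenbrock_hess(x):
    """Analytic Hessian of the same function, used only for checking."""
    x0, x1 = x
    return np.array([
        [2.0 - 400.0 * (x1 - 3.0 * x0**2), -400.0 * x0],
        [-400.0 * x0, 200.0],
    ])

def hvp_fd(grad, x, v, eps=1e-5):
    """Approximate H(x) @ v by central differences of the gradient.

    Costs two gradient evaluations and never builds the Hessian matrix.
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)
```

Matrix-free products of this kind are what Newton-type solvers with an inner conjugate gradient loop consume; a second order adjoint would supply the same product exactly rather than by differencing.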
Optimization Methods for Inverse Problems
Optimization plays an important role in solving many inverse problems.
Indeed, the task of inversion often either involves or is fully cast as a
solution of an optimization problem. In this light, the mere non-linear,
non-convex, and large-scale nature of many of these inversions gives rise to
some very challenging optimization problems. The inverse problem community has
long been developing various techniques for solving such optimization tasks.
However, other, seemingly disjoint communities, such as that of machine
learning, have developed, almost in parallel, interesting alternative methods
which might have stayed under the radar of the inverse problem community. In
this survey, we aim to change that. In doing so, we first discuss current
state-of-the-art optimization methods widely used in inverse problems. We then
survey recent related advances in addressing similar challenges in problems
faced by the machine learning community, and discuss their potential advantages
for solving inverse problems. By highlighting the similarities among the
optimization challenges faced by the inverse problem and the machine learning
communities, we hope that this survey can serve as a bridge in bringing
together these two communities and encourage cross-fertilization of ideas.
Comment: 13 pages
Computation of Ground States of the Gross-Pitaevskii Functional via Riemannian Optimization
In this paper we combine concepts from Riemannian Optimization and the theory
of Sobolev gradients to derive a new conjugate gradient method for direct
minimization of the Gross-Pitaevskii energy functional with rotation. The
conservation of the number of particles constrains the minimizers to lie on a
manifold corresponding to the unit norm. The idea developed here is to
transform the original constrained optimization problem to an unconstrained
problem on this (spherical) Riemannian manifold, so that fast minimization
algorithms can be applied as alternatives to more standard constrained
formulations. First, we obtain Sobolev gradients using an equivalent definition
of an inner product which takes into account rotation. Then, the
Riemannian gradient (RG) steepest descent method is derived based on projected
gradients and retraction of an intermediate solution back to the constraint
manifold. Finally, we use the concept of the Riemannian vector transport to
propose a Riemannian conjugate gradient (RCG) method for this problem. It is
derived at the continuous level based on the "optimize-then-discretize"
paradigm instead of the usual "discretize-then-optimize" approach, as this
ensures robustness of the method when adaptive mesh refinement is performed in
computations. We evaluate various design choices inherent in the formulation of
the method and conclude with recommendations concerning selection of the best
options. Numerical tests demonstrate that the proposed RCG method outperforms
the simple gradient descent (RG) method in terms of rate of convergence. While
on simple problems a Newton-type method implemented in the Ipopt library
exhibits faster convergence than the RCG approach, the two methods perform
similarly on more complex problems requiring the use of mesh adaptation. At the
same time, the RCG approach has far fewer tunable parameters.
Comment: 28 pages, 13 figures
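The projection-and-retraction idea behind the RG method can be sketched in finite dimensions. The example below is an analogue only, not the Gross-Pitaevskii solver: it minimizes the quadratic energy x^T A x over the unit sphere, using the Euclidean inner product rather than a Sobolev one, with a fixed step size chosen for this small problem. The function name `riemannian_gradient_descent` is hypothetical.

```python
import numpy as np

def riemannian_gradient_descent(A, x0, step=0.1, iters=300):
    """Minimize x^T A x subject to ||x|| = 1 by Riemannian steepest descent.

    Each iteration projects the Euclidean gradient onto the tangent space
    of the sphere, takes a step, and retracts back by normalization.
    Returns the final point and its energy x^T A x.
    """
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)             # start on the constraint manifold
    for _ in range(iters):
        egrad = 2.0 * (A @ x)             # Euclidean gradient
        rgrad = egrad - (x @ egrad) * x   # tangent-space projection
        x = x - step * rgrad              # step in the tangent direction
        x = x / np.linalg.norm(x)         # retraction onto the sphere
    return x, x @ A @ x
```

For a symmetric matrix the minimizer is an eigenvector of the smallest eigenvalue, which gives a direct check of the result. A conjugate gradient variant would additionally transport the previous search direction into the new tangent space before combining it with the new gradient.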
An optimal subgradient algorithm for large-scale convex optimization in simple domains
This paper shows that the optimal subgradient algorithm, OSGA, proposed in
\cite{NeuO} can be used for solving structured large-scale convex constrained
optimization problems. Only first-order information is required, and the
optimal complexity bounds for both smooth and nonsmooth problems are attained.
More specifically, we consider two classes of problems: (i) a convex objective
with a simple closed convex domain, where the orthogonal projection on this
feasible domain is efficiently available; (ii) a convex objective with a simple
convex functional constraint. If we equip OSGA with an appropriate
prox-function, the OSGA subproblem can be solved either in a closed form or by
a simple iterative scheme, which is especially important for large-scale
problems. We report numerical results for some applications to show the
efficiency of the proposed scheme. A software package implementing OSGA for
the above domains is available.
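Two standard examples of "simple" domains, where the orthogonal projection has a closed form, are a box and a Euclidean ball. The sketch below shows only these projections; it does not implement OSGA or its prox-function subproblem, and the function names `project_box` and `project_ball` are hypothetical.

```python
import numpy as np

def project_box(x, lower, upper):
    """Orthogonal projection onto the box {z : lower <= z <= upper}."""
    return np.clip(np.asarray(x, dtype=float), lower, upper)

def project_ball(x, center, radius):
    """Orthogonal projection onto the ball {z : ||z - center|| <= radius}."""
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    diff = x - center
    dist = np.linalg.norm(diff)
    if dist <= radius:
        return x.copy()                   # already feasible
    return center + (radius / dist) * diff  # scale back onto the boundary
```

Both projections cost a single pass over the vector, which is what makes such domains attractive for first-order methods on large-scale problems.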
Do optimization methods in deep learning applications matter?
With advances in deep learning, exponential data growth and increasing model
complexity, developing efficient optimization methods is attracting much
research attention. Several implementations favor the use of Conjugate Gradient
(CG) and Stochastic Gradient Descent (SGD) as practical and elegant
solutions to achieve quick convergence; however, these optimization processes
also present many limitations in learning across deep learning applications.
Recent research is exploring higher-order optimization functions as better
approaches, but these present very complex computational challenges for
practical use. Comparing first and higher-order optimization functions, our
experiments reveal that Levenberg-Marquardt (LM) achieves significantly
better convergence but suffers from very large processing time,
increasing the training complexity of both classification and reinforcement
learning problems. Our experiments compare off-the-shelf optimization
functions (CG, SGD, LM and L-BFGS) in standard CIFAR, MNIST, CartPole and
FlappyBird experiments. The paper presents arguments on which optimization
functions to use and, further, which functions would benefit from
parallelization efforts to improve pretraining time and learning rate
convergence.