Second order adjoints for solving PDE-constrained optimization problems
Inverse problems are of utmost importance in many fields of science and engineering. In the
variational approach inverse problems are formulated as PDE-constrained optimization problems,
where the optimal estimate of the uncertain parameters is the minimizer of a certain cost
functional subject to the constraints posed by the model equations. The numerical solution
of such optimization problems requires the computation of derivatives of the model output
with respect to model parameters. The first order derivatives of a cost functional (defined
on the model output) with respect to a large number of model parameters can be calculated
efficiently through first order adjoint sensitivity analysis. Second order adjoint models
give second derivative information in the form of matrix-vector products between the Hessian
of the cost functional and user defined vectors. Traditionally, the construction of second
order derivatives for large scale models has been considered too costly. Consequently, data
assimilation applications employ optimization algorithms that use only first order derivative
information, like nonlinear conjugate gradients and quasi-Newton methods.
In this paper we discuss the mathematical foundations of second order adjoint sensitivity
analysis and show that it provides an efficient approach to obtain Hessian-vector products. We
study the benefits of using second order information in the numerical optimization process
for data assimilation applications. The numerical studies are performed in a twin experiment
setting with a two-dimensional shallow water model. Different scenarios are considered with
different discretization approaches, observation sets, and noise levels. Optimization algorithms
that employ second order derivatives are tested against widely used methods that require
only first order derivatives. Conclusions are drawn regarding the potential benefits and the
limitations of using high-order information in large scale data assimilation problems.
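The Hessian-vector products described above can be sketched on a toy quadratic cost (an assumption for illustration, not the paper's shallow water model): a second order adjoint model returns the product exactly, while the finite difference of first order gradients below approximates it.

```python
import numpy as np

# Toy quadratic cost standing in for the data assimilation cost functional
# (an illustrative assumption -- not the paper's shallow water model).
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)      # symmetric positive definite Hessian
b = rng.standard_normal(n)

def grad_J(u):
    """First order adjoint analogue: gradient of J(u) = 0.5 u^T A u - b^T u."""
    return A @ u - b

def hessian_vector_product(u, v, eps=1e-6):
    """Approximate H(u) @ v by differencing first order gradients; a second
    order adjoint model delivers this product exactly at comparable cost."""
    return (grad_J(u + eps * v) - grad_J(u - eps * v)) / (2 * eps)

u = rng.standard_normal(n)
v = rng.standard_normal(n)
print(np.allclose(hessian_vector_product(u, v), A @ v))  # Hessian is A here
```

Because only gradient evaluations are needed, the cost of one Hessian-vector product is a small multiple of one gradient, which is what makes truncated-Newton-type methods feasible at scale.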
Probabilistic Interpretation of Linear Solvers
This manuscript proposes a probabilistic framework for algorithms that
iteratively solve unconstrained linear problems Bx = b with positive definite
B for x. The goal is to replace the point estimates returned by existing
methods with a Gaussian posterior belief over the elements of the inverse of
B, which can be used to estimate errors. Recent probabilistic interpretations
of the secant family of quasi-Newton optimization algorithms are extended.
Combined with properties of the conjugate gradient algorithm, this leads to
uncertainty-calibrated methods with very limited cost overhead over conjugate
gradients, a self-contained novel interpretation of the quasi-Newton and
conjugate gradient algorithms, and a foundation for new nonlinear optimization
methods. Comment: final version, in press at SIAM J Optimization.
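For reference, a minimal sketch of the plain conjugate gradient iteration whose point estimates the proposed framework augments (standard textbook CG, not the paper's probabilistic variant):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Plain conjugate gradient for A x = b with symmetric positive definite
    A -- the point-estimate solver the probabilistic framework wraps with a
    Gaussian posterior over the inverse matrix."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # new A-conjugate direction
        rs = rs_new
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M.T @ M + 4 * np.eye(4)   # SPD test matrix
b = rng.standard_normal(4)
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))
```

In exact arithmetic CG terminates in at most n iterations; the probabilistic reading reinterprets the quantities it already computes as updates to a posterior belief, which is why the cost overhead can stay small.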
The divergence of the BFGS and Gauss Newton Methods
We present examples of divergence for the BFGS and Gauss-Newton methods.
These examples have objective functions with bounded level sets and share
other properties with the examples published recently in this journal, like
unit steps and convexity along the search lines. As in those examples, the
iterates, function values, and gradients in the new examples fit into the
general formulation in our previous work "On the divergence of line search
methods", Comput. Appl. Math. vol. 26, no. 1 (2007), which also presents an
example of divergence for Newton's method. Comment: This article was accepted by Mathematical Programming.
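For context, a minimal sketch of the standard BFGS inverse-Hessian update and iteration, run here on a benign convex quadratic with exact line search, where it converges; the paper's contribution is constructing objectives on which such good behavior fails.

```python
import numpy as np

def bfgs_update(H, s, y):
    """Standard BFGS update of the inverse Hessian approximation H, given
    the step s = x_{k+1} - x_k and gradient change y = g_{k+1} - g_k
    (well defined when the curvature condition s^T y > 0 holds)."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Benign setting (an assumption for illustration): strictly convex quadratic
# J(x) = 0.5 x^T A x with minimizer at the origin, exact line searches.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
x = np.array([1.0, -1.0])
H = np.eye(2)
for _ in range(3):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    d = -H @ g                            # quasi-Newton direction
    alpha = -(g @ d) / (d @ (A @ d))      # exact line search on the quadratic
    x_new = x + alpha * d
    s, y = x_new - x, grad(x_new) - g
    H = bfgs_update(H, s, y)
    x = x_new
print(np.linalg.norm(x) < 1e-8)
```

With exact line searches on a quadratic, BFGS started from the identity terminates in at most n iterations; the divergence examples show that no comparable global guarantee holds for general objectives.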
Implicit particle methods and their connection with variational data assimilation
The implicit particle filter is a sequential Monte Carlo method for data
assimilation that guides the particles to the high-probability regions via a
sequence of steps that includes minimizations. We present a new and more
general derivation of this approach and extend the method to particle smoothing
as well as to data assimilation for perfect models. We show that the
minimizations required by implicit particle methods are similar to the ones one
encounters in variational data assimilation and explore the connection of
implicit particle methods with variational data assimilation. In particular, we
argue that existing variational codes can be converted into implicit particle
methods at a low cost, often yielding better estimates that are also equipped
with quantitative measures of the uncertainty. A detailed example is presented.
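The minimizations referred to above can be illustrated in the linear-Gaussian case, where the variational cost is quadratic and its minimizer has a closed form. This is a toy sketch with assumed covariances B and R and a random observation operator, not the paper's code:

```python
import numpy as np

# Linear-Gaussian variational data assimilation sketch: implicit particle
# methods perform minimizations of essentially this form per particle.
rng = np.random.default_rng(2)
n, m = 3, 2
B = np.eye(n)                      # background error covariance (assumed)
R = 0.1 * np.eye(m)                # observation error covariance (assumed)
H = rng.standard_normal((m, n))    # linear observation operator (assumed)
x_b = rng.standard_normal(n)       # background state
y = H @ x_b + 0.1 * rng.standard_normal(m)   # synthetic observations

def cost(x):
    """Variational cost: background term plus observation misfit."""
    dx, dy = x - x_b, y - H @ x
    return 0.5 * dx @ np.linalg.solve(B, dx) + 0.5 * dy @ np.linalg.solve(R, dy)

# In this linear-Gaussian case the minimizer solves the normal equations,
# so no iterative optimizer is needed.
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
x_a = np.linalg.solve(Binv + H.T @ Rinv @ H, Binv @ x_b + H.T @ Rinv @ y)
g = Binv @ (x_a - x_b) - H.T @ Rinv @ (y - H @ x_a)   # gradient at x_a
print(np.linalg.norm(g) < 1e-10)   # first order optimality holds
print(cost(x_a) <= cost(x_b))      # analysis improves on the background
```

For nonlinear models the cost is no longer quadratic and the minimization must be done iteratively, which is exactly the machinery an existing variational code already provides.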
Do optimization methods in deep learning applications matter?
With advances in deep learning, exponential data growth, and increasing model
complexity, developing efficient optimization methods is attracting much
research attention. Several implementations favor the use of Conjugate Gradient
(CG) and Stochastic Gradient Descent (SGD) as practical and elegant
solutions to achieve quick convergence; however, these optimization processes
also present many limitations in learning across deep learning applications.
Recent research is exploring higher-order optimization functions as better
approaches, but these present very complex computational challenges for
practical use. Comparing first- and higher-order optimization functions, in this
paper our experiments reveal that Levenberg-Marquardt (LM) converges
significantly faster but suffers from very large processing times,
increasing the training complexity of both classification and reinforcement
learning problems. Our experiments compare off-the-shelf optimization
functions (CG, SGD, LM, and L-BFGS) on standard CIFAR, MNIST, CartPole, and
FlappyBird experiments. The paper presents arguments on which optimization
functions to use and, further, which functions would benefit from
parallelization efforts to improve pretraining time and learning-rate
convergence.
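For reference, a minimal sketch of the Levenberg-Marquardt iteration on a toy curve-fitting problem (not the paper's deep learning experiments): the damping factor interpolates between a Gauss-Newton step and a small gradient-descent step.

```python
import numpy as np

# Levenberg-Marquardt sketch on an assumed toy model y = a * exp(b * t),
# fit to noise-free synthetic data -- illustration only.
t = np.linspace(0.0, 1.0, 20)
a_true, b_true = 2.0, -1.5
y = a_true * np.exp(b_true * t)

def residuals(p):
    a, b = p
    return a * np.exp(b * t) - y

def jacobian(p):
    a, b = p
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])   # d r/d a, d r/d b

p, lam = np.array([1.0, -1.0]), 1e-3
for _ in range(100):
    r, J = residuals(p), jacobian(p)
    # Damped normal equations: lam -> 0 gives Gauss-Newton,
    # lam -> infinity gives a short gradient-descent step.
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
    p_new = p + step
    if np.sum(residuals(p_new) ** 2) < np.sum(r ** 2):
        p, lam = p_new, lam * 0.5      # accept step, trust the model more
    else:
        lam *= 2.0                     # reject step, damp harder
print(np.allclose(p, [a_true, b_true], atol=1e-4))
```

The per-step linear solve is what makes LM expensive as the parameter count grows, consistent with the processing-time penalty the experiments report.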