168,945 research outputs found
Approximate Newton Methods for Policy Search in Markov Decision Processes
Approximate Newton methods are standard optimization tools that aim to retain the benefits of Newton's method, such as a fast rate of convergence, while alleviating its drawbacks, such as the computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov decision processes (MDPs). We first analyse the structure of the Hessian of the total expected reward, which is a standard objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs, and we use this analysis to motivate two Gauss-Newton methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods drop certain terms in the Hessian. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees, including guaranteed ascent directions, invariance to affine transformations of the parameter space, and convergence guarantees. We finally provide a unifying perspective on key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.
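To make the idea concrete, here is a minimal sketch, not the paper's method, of a Gauss-Newton-style policy update on a toy softmax bandit. For a softmax policy, the Hessian of the log-policy is -(diag(p) - pp^T) for every action, so the term of the Hessian that keeps only reward-weighted log-policy Hessians is negative semidefinite whenever rewards are positive, giving a guaranteed ascent direction. The reward vector and step scheme below are hypothetical choices for illustration.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

r = np.array([1.0, 2.0, 3.0])      # hypothetical per-action rewards
theta = np.zeros(3)                # softmax policy parameters

for _ in range(100):
    p = softmax(theta)
    J = p @ r                          # expected reward (objective)
    grad = p * (r - J)                 # policy gradient for softmax params
    F = np.diag(p) - np.outer(p, p)    # -Hessian of log pi (action-independent)
    H2 = -J * F                        # Gauss-Newton-style Hessian approximation
    step = -np.linalg.pinv(H2) @ grad  # ascent direction: H2 is neg. semidefinite
    theta = theta + step

print(softmax(theta) @ r)  # expected reward approaches the best arm's reward, 3.0
```

The pseudo-inverse handles the rank deficiency of the softmax parameterization (adding a constant to all logits changes nothing), which is also why the method is invariant to that redundancy.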
An efficient Kullback-Leibler optimization algorithm for probabilistic control design
This paper addresses the problem of iterative optimization of the Kullback-Leibler (KL) divergence on discrete (finite) probability spaces. Traditionally, the problem is formulated in the constrained optimization framework and is tackled by gradient-like methods. Here, it is shown that performing the KL optimization in a Riemannian space equipped with the Fisher metric provides three major advantages over the standard methods: 1. The Fisher metric turns the original constrained optimization into an unconstrained optimization problem; 2. The optimization using the Fisher metric behaves asymptotically like a Newton method and shows very fast convergence near the optimum; 3. The Fisher metric is an intrinsic property of the space of probability distributions and allows a formally correct interpretation of the (natural) gradient as the steepest-descent direction. Simulation results are presented.
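The Newton-like behaviour can be seen in a small sketch (our illustration, assuming a softmax parameterization of a categorical distribution, not code from the paper): with the Fisher matrix F = diag(p) - pp^T, a unit-step natural-gradient update on KL(p||q) lands on the target q in a single iteration.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def kl(p, q):
    return np.sum(p * np.log(p / q))

q = np.array([0.5, 0.3, 0.2])          # target distribution (hypothetical)
theta = np.array([1.0, -2.0, 0.5])     # arbitrary initial parameters
p = softmax(theta)

grad = p * (np.log(p / q) - kl(p, q))  # d KL / d theta under softmax params
F = np.diag(p) - np.outer(p, p)        # Fisher information matrix
# F is rank-deficient (constant shifts of theta are unobservable), so take
# the minimum-norm solution of F v = grad.
v = np.linalg.lstsq(F, grad, rcond=None)[0]
theta = theta - 1.0 * v                # unit-step natural-gradient update

print(kl(softmax(theta), q))           # essentially zero after one step
```

This one-step convergence is special to this example, but it illustrates the asymptotic Newton-like behaviour claimed in the abstract.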
An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for Linear Programming
Powerful commercial solvers based on interior-point methods (IPMs), such as Gurobi and Mosek, have been hugely successful in solving large-scale linear programming (LP) problems. The high efficiency of these solvers depends critically on the sparsity of the problem data and on advanced matrix factorization techniques. For a large-scale LP problem whose data matrix is dense (possibly structured), or whose corresponding normal matrix has a dense Cholesky factor (even after re-ordering), these solvers may incur excessive computational cost and/or extremely heavy memory usage in each interior-point iteration. Unfortunately, the natural remedy, i.e., IPM solvers based on iterative linear-system methods, although it avoids the explicit computation of the coefficient matrix and its factorization, is not practically viable due to the inherent extreme ill-conditioning of the large-scale normal equation arising in each interior-point iteration. To provide a better alternative for solving large-scale LPs with dense data, or whose normal equations require expensive factorizations, we propose a semismooth Newton based inexact proximal augmented Lagrangian (Snipal) method. Different from classical IPMs, in each iteration of Snipal, iterative methods can efficiently be used to solve simpler yet better-conditioned semismooth Newton linear systems. Moreover, Snipal not only enjoys fast asymptotic superlinear convergence but is also proven to have a finite termination property. Numerical comparisons with Gurobi demonstrate the encouraging potential of Snipal for handling large-scale LP problems whose constraint matrix has a dense representation or a dense factorization even with an appropriate re-ordering.
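As a minimal illustration of the semismooth Newton idea (not Snipal itself), consider the piecewise-linear scalar equation that defines the Euclidean projection onto the probability simplex, a subproblem that shows up inside many augmented Lagrangian codes. Semismooth Newton replaces the classical derivative with an element of the generalized Jacobian, and, because the equation is piecewise linear, the iteration even terminates finitely, echoing the finite termination property mentioned above.

```python
import numpy as np

def project_simplex(v, tol=1e-12, max_iter=100):
    """Project v onto {x >= 0, sum(x) = 1} by solving
    phi(tau) = sum(max(v - tau, 0)) - 1 = 0 with semismooth Newton."""
    tau = v.min() - 1.0                      # phi(tau) > 0 at this starting point
    for _ in range(max_iter):
        active = v - tau > 0
        phi = np.sum((v - tau)[active]) - 1.0
        if abs(phi) <= tol:
            break
        dphi = -np.count_nonzero(active)     # element of the generalized Jacobian
        tau = tau - phi / dphi               # semismooth Newton step
    return np.maximum(v - tau, 0.0)

x = project_simplex(np.array([0.9, 0.4, -0.3, 2.0]))
print(x, x.sum())   # a point on the simplex
```

Here phi is convex, decreasing, and piecewise linear, so each Newton step moves monotonically toward the root and lands on it exactly after visiting at most a handful of breakpoints.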
Orthogonal Extended Infomax Algorithm
The extended infomax algorithm for independent component analysis (ICA) can
separate sub- and super-Gaussian signals but converges slowly as it uses
stochastic gradient optimization. In this paper, an improved extended infomax
algorithm is presented that converges much faster. Accelerated convergence is
achieved by replacing the natural gradient learning rule of extended infomax by
a fully-multiplicative orthogonal-group based update scheme of the unmixing
matrix leading to an orthogonal extended infomax algorithm (OgExtInf).
Computational performance of OgExtInf is compared with two fast ICA algorithms:
the popular FastICA and Picard, an L-BFGS algorithm belonging to the family of
quasi-Newton methods. Our results demonstrate superior performance of the
proposed method on small-size EEG data sets as used for example in online EEG
processing systems, such as brain-computer interfaces or clinical systems for
spike and seizure detection.
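The key structural idea, a fully multiplicative update that keeps the unmixing matrix on the orthogonal group, can be sketched as follows. This is our illustration, not OgExtInf: it uses a fixed tanh nonlinearity in place of extended infomax's sub-/super-Gaussian switching, and a Cayley transform as one standard way to build an orthogonal multiplicative factor from a skew-symmetric direction. The point is that W stays orthogonal without any explicit re-orthonormalization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 2000
S = rng.laplace(size=(n, T))             # super-Gaussian sources (demo data)
X = rng.normal(size=(n, n)) @ S          # mixed signals

# Whiten the mixtures (orthogonal-group updates assume whitened data).
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = E @ np.diag(d ** -0.5) @ E.T @ X

W = np.eye(n)
eta = 0.1
for _ in range(200):
    Y = W @ Z
    G = np.tanh(Y) @ Y.T / T - np.eye(n)   # relative gradient (tanh nonlinearity)
    K = G - G.T                            # skew-symmetric part: tangent of O(n)
    A = 0.5 * eta * K
    # Cayley transform of a skew-symmetric matrix is exactly orthogonal.
    Q = np.linalg.solve(np.eye(n) - A, np.eye(n) + A)
    W = Q @ W                              # multiplicative update

print(np.linalg.norm(W @ W.T - np.eye(n)))  # remains ~0: W stays orthogonal
```

Restricting the search to the orthogonal group is what removes the slow stochastic-gradient dynamics of the additive natural-gradient rule while preserving the whitening constraint by construction.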
Small steps and giant leaps: Minimal Newton solvers for Deep Learning
We propose a fast second-order method that can be used as a drop-in
replacement for current deep learning solvers. Compared to stochastic gradient
descent (SGD), it only requires two additional forward-mode automatic
differentiation operations per iteration, which has a computational cost
comparable to two standard forward passes and is easy to implement. Our method
addresses long-standing issues with current second-order solvers, which invert
an approximate Hessian matrix at every iteration, either exactly or via
conjugate-gradient methods, a procedure that is both costly and sensitive to
noise. Instead, we
propose to keep a single estimate of the gradient projected by the inverse
Hessian matrix, and update it once per iteration. This estimate has the same
size and is similar to the momentum variable that is commonly used in SGD. No
estimate of the Hessian is maintained. We first validate our method, called
CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock
function and degenerate 2-layer linear networks), where current deep learning
solvers seem to struggle. We then train several large models on CIFAR and
ImageNet, including ResNet and VGG-f networks, where we demonstrate faster
convergence with no hyperparameter tuning. Code is available.
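One plausible minimal form of this update, as described in the abstract, keeps a single vector z (the running estimate of the inverse-Hessian-projected gradient) and refines it with one momentum-like step per iteration. The sketch below runs it on a toy quadratic where the Hessian-vector product is exact; rho and beta are hand-picked here, whereas the method itself adapts such quantities, and in a deep network Hz would come from forward-mode automatic differentiation.

```python
import numpy as np

# Toy quadratic f(w) = 0.5 w^T A w - b^T w, minimizer w* = A^{-1} b.
A = np.diag([1.0, 10.0])
b = np.array([1.0, -2.0])
w_star = np.linalg.solve(A, b)

w = np.zeros(2)   # parameters
z = np.zeros(2)   # single maintained estimate of -H^{-1} g (momentum-sized)
rho, beta = 0.9, 0.05   # hand-picked for this demo

for _ in range(500):
    g = A @ w - b          # gradient
    Hz = A @ z             # Hessian-vector product (exact for a quadratic)
    z = rho * z - beta * (Hz + g)   # one refinement step; no Hessian inverted
    w = w + z

print(np.linalg.norm(w - w_star))   # converges to the minimizer
```

Note how no estimate of the Hessian itself is ever formed or inverted: each iteration costs one gradient and one Hessian-vector product, matching the "two extra forward passes" claim in spirit.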
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Natural gradient descent, which preconditions a gradient descent update with
the Fisher information matrix of the underlying statistical model, is a way to
capture partial second-order information. Several highly visible works have
advocated an approximation known as the empirical Fisher, drawing connections
between approximate second-order methods and heuristics like Adam. We dispute
this argument by showing that the empirical Fisher, unlike the Fisher, does
not generally capture second-order information. We further argue that the
conditions under which the empirical Fisher approaches the Fisher (and the
Hessian) are unlikely to be met in practice, and that, even on simple
optimization problems, the pathologies of the empirical Fisher can have
undesirable effects.
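The gap is easy to exhibit numerically. In the sketch below (our illustration, assuming a one-parameter linear-Gaussian regression model with unit noise), the Fisher information equals the Hessian and is E[x^2] = 1, independent of the parameter; the empirical Fisher, which averages squared per-example gradients, is inflated by the residuals whenever the parameter is away from the optimum, so it misrepresents the curvature badly.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.normal(size=N)
theta_true = 1.0
y = theta_true * x + rng.normal(size=N)   # model: y ~ N(theta * x, 1)

theta = theta_true + 2.0                  # evaluate away from the optimum
r = y - theta * x                         # residuals at this theta

fisher = np.mean(x ** 2)                  # true Fisher (= Hessian here): ~1
emp_fisher = np.mean(r ** 2 * x ** 2)     # empirical Fisher: ~13 at this theta

print(fisher, emp_fisher)                 # the two disagree by an order of magnitude
```

Preconditioning with the empirical Fisher here would shrink steps by roughly the squared residual scale, which is exactly the kind of pathology the paper warns about.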
Fast finite difference solvers for singular solutions of the elliptic Monge-Ampère equation
The elliptic Monge-Ampere equation is a fully nonlinear Partial Differential
Equation which originated in geometric surface theory, and has been applied in
dynamic meteorology, elasticity, geometric optics, image processing and image
registration. Solutions can be singular, in which case standard numerical
approaches fail. In this article we build a finite difference solver for the
Monge-Ampere equation, which converges even for singular solutions. Regularity
results are used to select a priori between a stable, provably convergent
monotone discretization and an accurate finite difference discretization in
different regions of the computational domain. This allows singular solutions
to be computed using a stable method, and regular solutions to be computed more
accurately. The resulting nonlinear equations are then solved by Newton's
method. Computational results in two and three dimensions validate the claims
of accuracy and solution speed. A computational example is presented which
demonstrates the necessity of the use of the monotone scheme near
singularities.
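The Newton outer iteration on a finite-difference discretization can be sketched in a few dozen lines. The demo below is our own minimal version, not the paper's solver: it uses only the standard (accurate) centered-difference scheme on a smooth convex test solution, so the monotone scheme and the a priori regularity-based switching for singular solutions are not reproduced, and it starts from a perturbation of the exact solution so the example isolates Newton's behaviour. A simple damped step is included as a safeguard.

```python
import numpy as np

# Smooth test problem on the unit square with exact solution
# u(x, y) = exp((x^2 + y^2) / 2), so det(D^2 u) = (1 + x^2 + y^2) exp(x^2 + y^2).
n = 19                                   # interior grid points per dimension
h = 1.0 / (n + 1)
t = np.linspace(0.0, 1.0, n + 2)
X, Y = np.meshgrid(t, t, indexing="ij")
u_exact = np.exp((X ** 2 + Y ** 2) / 2)
f = (1 + X ** 2 + Y ** 2) * np.exp(X ** 2 + Y ** 2)

def derivs(U):
    # Second-order centered differences, evaluated on the interior.
    uxx = (U[2:, 1:-1] - 2 * U[1:-1, 1:-1] + U[:-2, 1:-1]) / h ** 2
    uyy = (U[1:-1, 2:] - 2 * U[1:-1, 1:-1] + U[1:-1, :-2]) / h ** 2
    uxy = (U[2:, 2:] - U[2:, :-2] - U[:-2, 2:] + U[:-2, :-2]) / (4 * h ** 2)
    return uxx, uyy, uxy

def residual(U):
    uxx, uyy, uxy = derivs(U)
    return uxx * uyy - uxy ** 2 - f[1:-1, 1:-1]   # discrete Monge-Ampere operator

U = u_exact.copy()                       # exact Dirichlet data on the boundary
U[1:-1, 1:-1] += 0.1 * np.sin(np.pi * X[1:-1, 1:-1]) * np.sin(np.pi * Y[1:-1, 1:-1])

m = n * n
for _ in range(20):
    F0 = residual(U)
    if np.abs(F0).max() < 1e-10:
        break
    uxx, uyy, uxy = derivs(U)
    # Linearization: J v = uyy * vxx + uxx * vyy - 2 * uxy * vxy (elliptic for
    # convex iterates). Build it densely by applying the stencil to unit vectors.
    J = np.zeros((m, m))
    for k in range(m):
        V = np.zeros_like(U)
        V[1:-1, 1:-1].flat[k] = 1.0
        vxx, vyy, vxy = derivs(V)
        J[:, k] = (uyy * vxx + uxx * vyy - 2 * uxy * vxy).ravel()
    delta = np.linalg.solve(J, -F0.ravel()).reshape(n, n)
    s, base = 1.0, np.linalg.norm(F0)    # damped Newton: backtrack on the residual
    for _ in range(30):
        U_try = U.copy()
        U_try[1:-1, 1:-1] += s * delta
        if np.linalg.norm(residual(U_try)) < base:
            break
        s *= 0.5
    U = U_try

print(np.abs(U - u_exact).max())   # error at the level of the O(h^2) truncation
```

For smooth convex solutions like this one, the linearized operator stays elliptic and Newton converges in a handful of iterations; it is precisely near singularities that this accurate scheme breaks down and the paper's monotone discretization is needed.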