    Approximate Newton Methods for Policy Search in Markov Decision Processes

    Approximate Newton methods are standard optimization tools which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, while alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov decision processes (MDPs). We first analyse the structure of the Hessian of the total expected reward, which is a standard objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods drop certain terms in the Hessian. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.
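
    The abstract compares its methods to the Gauss-Newton method for non-linear least squares. As a reminder of that classical method only (this is not the paper's MDP algorithm), here is a minimal sketch that fits the hypothetical one-parameter model y = exp(a * x). The function name, data and starting point are assumptions chosen for illustration. The second-derivative term of the Hessian is dropped, which in this minimization setting leaves a non-negative curvature estimate (the mirror image of the negative definiteness mentioned above for reward maximization).

```python
import math

def gauss_newton_fit(xs, ys, a0=0.0, iters=20):
    """Fit y = exp(a * x) by Gauss-Newton on 0.5 * sum of squared residuals."""
    a = a0
    for _ in range(iters):
        preds = [math.exp(a * x) for x in xs]
        residuals = [p - y for p, y in zip(preds, ys)]
        # Jacobian of each residual with respect to a.
        jac = [x * p for x, p in zip(xs, preds)]
        grad = sum(j * r for j, r in zip(jac, residuals))
        # Gauss-Newton curvature: J^T J only; the term
        # sum(residual * second derivative of residual) is dropped.
        approx_hess = sum(j * j for j in jac)
        a -= grad / approx_hess
    return a

# Illustrative noiseless data generated with a = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 * x) for x in xs]
a_hat = gauss_newton_fit(xs, ys)
print(a_hat)
```

    Because the residuals vanish at the solution here, the dropped term also vanishes there, which is why the approximation loses little near the optimum in this toy case.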

    An efficient Kullback-Leibler optimization algorithm for probabilistic control design

    This paper addresses the problem of iterative optimization of the Kullback-Leibler (KL) divergence on discrete (finite) probability spaces. Traditionally, the problem is formulated in the constrained optimization framework and is tackled by gradient-like methods. Here, it is shown that performing the KL optimization in a Riemannian space equipped with the Fisher metric provides three major advantages over the standard methods: 1. The Fisher metric turns the original constrained optimization into an unconstrained optimization problem; 2. The optimization using a Fisher metric behaves asymptotically as a Newton method and shows very fast convergence near the optimum; 3. The Fisher metric is an intrinsic property of the space of probability distributions and allows a formally correct interpretation of a (natural) gradient as the steepest-descent method. Simulation results are presented.

    An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for Linear Programming

    Powerful interior-point method (IPM) based commercial solvers, such as Gurobi and Mosek, have been hugely successful in solving large-scale linear programming (LP) problems. The high efficiency of these solvers depends critically on the sparsity of the problem data and advanced matrix factorization techniques. For a large-scale LP problem with data matrix $A$ that is dense (possibly structured) or whose corresponding normal matrix $AA^T$ has a dense Cholesky factor (even with re-ordering), these solvers may require excessive computational cost and/or extremely heavy memory usage in each interior-point iteration. Unfortunately, the natural remedy, i.e., the use of IPM solvers based on iterative methods, although it can avoid the explicit computation of the coefficient matrix and its factorization, is not practically viable due to the inherent extreme ill-conditioning of the large-scale normal equation arising in each interior-point iteration. To provide a better alternative for solving large-scale LPs with dense data or requiring expensive factorization of the normal equation, we propose a semismooth Newton based inexact proximal augmented Lagrangian (Snipal) method. Unlike classical IPMs, in each iteration of Snipal, iterative methods can efficiently be used to solve simpler yet better conditioned semismooth Newton linear systems. Moreover, Snipal not only enjoys fast asymptotic superlinear convergence but is also proven to enjoy a finite termination property. Numerical comparisons with Gurobi have demonstrated the encouraging potential of Snipal for handling large-scale LP problems where the constraint matrix $A$ has a dense representation or $AA^T$ has a dense factorization even with an appropriate re-ordering. Comment: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file.

    Orthogonal Extended Infomax Algorithm

    The extended infomax algorithm for independent component analysis (ICA) can separate sub- and super-Gaussian signals but converges slowly as it uses stochastic gradient optimization. In this paper, an improved extended infomax algorithm is presented that converges much faster. Accelerated convergence is achieved by replacing the natural gradient learning rule of extended infomax with a fully multiplicative, orthogonal-group based update scheme for the unmixing matrix, leading to an orthogonal extended infomax algorithm (OgExtInf). Computational performance of OgExtInf is compared with two fast ICA algorithms: the popular FastICA and Picard, an L-BFGS algorithm belonging to the family of quasi-Newton methods. Our results demonstrate superior performance of the proposed method on small-size EEG data sets as used, for example, in online EEG processing systems, such as brain-computer interfaces or clinical systems for spike and seizure detection. Comment: 17 pages, 6 figures

    Small steps and giant leaps: Minimal Newton solvers for Deep Learning

    We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which has a computational cost comparable to two standard forward passes and is easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix every iteration exactly or by conjugate-gradient methods, a procedure that is both costly and sensitive to noise. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration. This estimate has the same size and is similar to the momentum variable that is commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers seem to struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. Code is available.
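
    One way to read the description above is as follows: keep a vector z that tracks the Newton step, refresh it with a single gradient step on the local quadratic model, then move the parameters by z. The sketch below applies that reading to a small quadratic. It is not the released CurveBall code. The fixed hyperparameters rho and beta, the toy matrix H and vector b, and the function names are all assumptions, and the Hessian-vector product is computed from an explicit matrix here rather than by forward-mode automatic differentiation.

```python
def matvec(H, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def curveball_like(H, b, rho=0.9, beta=0.1, iters=500):
    """Minimize 0.5 * w^T H w - b^T w with a momentum-like Newton-step estimate."""
    w = [0.0] * len(b)
    z = [0.0] * len(b)   # running estimate of the Newton step
    for _ in range(iters):
        grad = [hw - bi for hw, bi in zip(matvec(H, w), b)]
        hz = matvec(H, z)   # Hessian-vector product (explicit here)
        # One gradient step on the model 0.5 * z^T H z + grad^T z,
        # combined with a forgetting factor rho on the previous estimate.
        z = [rho * zi - beta * (hzi + gi) for zi, hzi, gi in zip(z, hz, grad)]
        w = [wi + zi for wi, zi in zip(w, z)]
    return w

H = [[10.0, 2.0], [2.0, 1.0]]   # assumed ill-conditioned toy Hessian
b = [1.0, 1.0]
w_hat = curveball_like(H, b)
print(w_hat)   # the exact minimizer is H^{-1} b = [-1/6, 4/3]
```

    The hand-picked rho and beta happen to be stable for this H; they are not tuned values from the paper, which reports needing no hyperparameter tuning.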

    Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

    Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argument by showing that the empirical Fisher, unlike the Fisher, does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that, even on simple optimization problems, the pathologies of the empirical Fisher can have undesirable effects. Comment: V3: Minor corrections (typographic errors)
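
    The distinction can be seen in a one-parameter toy case. The sketch below assumes a scalar linear-Gaussian model y ~ N(theta * x, 1); the model, data and function name are hypothetical and not taken from the paper. For this model the Fisher information equals the Hessian of the average negative log-likelihood, namely the mean of x^2. The empirical Fisher instead averages squared per-sample gradients evaluated at the observed labels, so it shrinks with the residuals.

```python
def fisher_and_empirical_fisher(xs, ys, theta):
    """Return (Fisher, empirical Fisher) for the model y ~ N(theta * x, 1)."""
    n = len(xs)
    # Fisher: expectation of the squared score under the model's own
    # predictive distribution; with unit noise variance this is mean(x^2),
    # which is also the Hessian of the average negative log-likelihood.
    fisher = sum(x * x for x in xs) / n
    # Empirical Fisher: squared per-sample gradients at the observed labels.
    empirical = sum(((y - theta * x) * x) ** 2 for x, y in zip(xs, ys)) / n
    return fisher, empirical

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.0]    # assumed data, close to y = 2 * x
fisher, empirical = fisher_and_empirical_fisher(xs, ys, theta=2.0)
print(fisher, empirical)
```

    With small residuals the empirical Fisher is far smaller than the true curvature, so preconditioning with its inverse would produce a much larger step than a Newton or natural-gradient step would in this toy case.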

    Fast finite difference solvers for singular solutions of the elliptic Monge-Ampère equation

    The elliptic Monge-Ampère equation is a fully nonlinear partial differential equation which originated in geometric surface theory, and has been applied in dynamic meteorology, elasticity, geometric optics, image processing and image registration. Solutions can be singular, in which case standard numerical approaches fail. In this article we build a finite difference solver for the Monge-Ampère equation, which converges even for singular solutions. Regularity results are used to select a priori between a stable, provably convergent monotone discretization and an accurate finite difference discretization in different regions of the computational domain. This allows singular solutions to be computed using a stable method, and regular solutions to be computed more accurately. The resulting nonlinear equations are then solved by Newton's method. Computational results in two and three dimensions validate the claims of accuracy and solution speed. A computational example is presented which demonstrates the necessity of the use of the monotone scheme near singularities. Comment: 23 pages, 4 figures, 4 tables; added arXiv links to references, added comments