Nonlinear Preconditioning Methods for Optimization and Parallel-In-Time Methods for 1D Scalar Hyperbolic Partial Differential Equations

Abstract

This thesis consists of two main parts: part one addresses problems from nonlinear optimization, and part two concerns solving systems of time-dependent differential equations. Both parts describe strategies for accelerating the convergence of iterative methods. In part one we present a nonlinear preconditioning framework for use with nonlinear solvers applied to nonlinear optimization problems, motivated by a generalization of linear left preconditioning and linear preconditioning via a change of variables for minimizing quadratic objective functions. In the optimization context nonlinear preconditioning is used to generate a preconditioner direction that either replaces or supplements the gradient vector throughout the optimization algorithm. This framework is used to discuss previously developed nonlinearly preconditioned nonlinear GMRES and nonlinear conjugate gradients (NCG) algorithms, as well as to develop two new nonlinearly preconditioned quasi-Newton methods based on the limited-memory Broyden and limited-memory BFGS (L-BFGS) updates. We show how all of the above methods can be implemented in a manifold optimization context, with a particular emphasis on Grassmann matrix manifolds. These methods are compared by solving the optimization problems defining the canonical polyadic (CP) decomposition and Tucker higher-order singular value decomposition (HOSVD) for tensors, which are formulated as minimizing approximation error in the Frobenius norm. Both of these decompositions have alternating least squares (ALS) type fixed-point iterations derived from their optimization problem definitions. While these ALS-type iterations may be slow to converge in practice, they can serve as efficient nonlinear preconditioners for the other optimization methods.
As the Tucker HOSVD problem involves orthonormality constraints and lacks unique minimizers, the optimization algorithms are extended from Euclidean space to the manifold setting, where optimization on Grassmann manifolds can resolve both of the issues present in the HOSVD problem. The nonlinearly preconditioned methods are compared to the ALS-type preconditioners and to non-preconditioned NCG, L-BFGS, and a trust region algorithm, using both synthetic and real-life tensor data with varying noise levels, the real data arising from applications in computer vision and handwritten digit recognition. Numerical results show that the nonlinearly preconditioned methods offer substantial improvements in terms of time-to-solution and robustness over state-of-the-art methods for large tensors, in cases where there are significant amounts of noise in the data, and when high accuracy results are required.
In part two we apply a multigrid reduction-in-time (MGRIT) algorithm to scalar one-dimensional hyperbolic partial differential equations. This study is motivated by the observation that sequential time-stepping is an obvious computational bottleneck when attempting to implement highly concurrent algorithms, so parallel-in-time methods are particularly desirable. Existing parallel-in-time methods have produced significant speedups for parabolic or sufficiently diffusive problems, but can have stability and convergence issues for hyperbolic or advection-dominated problems. Being a multigrid method, MGRIT primarily uses temporal coarsening, but spatial coarsening can also be incorporated to produce cheaper multigrid cycles and to ensure stability conditions are satisfied on all levels for explicit time-stepping methods. We compare convergence results for the linear advection and diffusion equations, which illustrate the increased difficulty associated with solving hyperbolic problems via parallel-in-time methods.
A particular issue that we address is the fact that uniform factor-two spatial coarsening may negatively affect the convergence rate for MGRIT, resulting in extremely slow convergence when the wave speed is near zero, even if only locally. This is due to a sort of anisotropy in the nodal connections, with small wave speeds resulting in spatial connections being weaker than temporal connections. Through the use of semi-algebraic mode analysis applied to the combined advection-diffusion equation we illustrate how the norm of the iteration matrix, and hence an upper bound on the rate of convergence, varies for different choices of wave speed, diffusivity coefficient, space-time grid spacing, and the inclusion or exclusion of spatial coarsening. The use of waveform relaxation multigrid on intermediate, temporally semi-coarsened grids is identified as a potential remedy for the issues introduced by spatial coarsening, with the downside of creating a more intrusive algorithm that cannot be easily combined with existing time-stepping routines for different problems. As a second, less intrusive, alternative we present an adaptive spatial coarsening strategy that prevents the slowdown observed for small local wave speeds, which is applicable for solving the variable-coefficient linear advection equation and the inviscid Burgers' equation using first-order explicit or implicit time-stepping methods. Serial numerical results show this method offers significant improvements over uniform coarsening and is convergent for the inviscid Burgers' equation with and without shocks. Parallel scaling tests indicate that improvements over serial time-stepping strategies are possible when spatial parallelism alone saturates, and that scalability is robust for oscillatory solutions that change on the scale of the grid spacing.
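The remark about stability conditions for explicit time-stepping can be illustrated with a minimal sketch that is not taken from the thesis. It assumes a first-order upwind discretization of linear advection with forward Euler, for which the stability condition is |a| Δt/Δx ≤ 1, and a temporal coarsening factor of two. The function name `cfl_numbers` and all parameter values are hypothetical and chosen only for illustration.

```python
def cfl_numbers(a, dx, dt, levels, m=2, coarsen_space=False):
    """Return the CFL number |a|*dt/dx on each level of a grid hierarchy.

    Assumptions (illustrative only): the time step grows by a factor m per
    level, and the mesh width grows by a factor 2 per level when
    coarsen_space is True and stays fixed otherwise.
    """
    numbers = []
    for level in range(levels):
        dt_level = dt * m**level
        dx_level = dx * 2**level if coarsen_space else dx
        numbers.append(abs(a) * dt_level / dx_level)
    return numbers


# Temporal coarsening only: the CFL number doubles on every coarser level,
# so the explicit upwind scheme violates its stability limit of 1.
time_only = cfl_numbers(a=1.0, dx=0.01, dt=0.008, levels=4)

# Factor-two coarsening in both space and time keeps the CFL number fixed.
space_time = cfl_numbers(a=1.0, dx=0.01, dt=0.008, levels=4, coarsen_space=True)

# A small wave speed gives a tiny CFL number on every level, which is the
# regime where spatial connections are weak relative to temporal ones.
slow_wave = cfl_numbers(a=0.01, dx=0.01, dt=0.008, levels=4, coarsen_space=True)
```

Under these assumptions, coarsening only in time pushes the coarse levels past the stability limit, while matching spatial coarsening holds the ratio constant. The `slow_wave` case shows why uniform spatial coarsening is less natural when the wave speed is near zero: the stability condition is already satisfied by a wide margin, so spatial coarsening is not needed there for stability.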
