On the use of hybrid coarse-level models in multilevel minimization methods
Solving large-scale nonlinear minimization problems is computationally
demanding. Nonlinear multilevel minimization (NMM) methods exploit the
structure of the underlying minimization problem to solve such problems in a
computationally efficient and scalable manner. The efficiency of NMM
methods relies on the quality of the coarse-level models. Traditionally,
coarse-level models are constructed using the additive approach, where the
so-called τ-correction enforces local coherence between the fine-level
and coarse-level objective functions. In this work, we extend this methodology
and discuss how to enforce local coherence between the objective functions
using a multiplicative approach. Moreover, we also present a hybrid approach,
which takes advantage of both the additive and multiplicative approaches. Using
numerical experiments from the field of deep learning, we show that the
hybrid approach can greatly improve the convergence speed of NMM methods, and
therefore provides an attractive alternative to the almost universally used
additive approach.
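To make the coherence idea concrete, here is a minimal sketch in generic notation (fine objective $f_h$, coarse objective $f_H$, restriction $R$, current fine iterate $x_h$, initial coarse iterate $x_{H,0} = R\,x_h$); the notation is ours, not the paper's. The additive (τ-corrected) model adds a linear term so that gradients match at $x_{H,0}$:

\[
h_{\mathrm{add}}(x_H) = f_H(x_H) + \bigl\langle R\,\nabla f_h(x_h) - \nabla f_H(x_{H,0}),\; x_H - x_{H,0} \bigr\rangle,
\qquad
\nabla h_{\mathrm{add}}(x_{H,0}) = R\,\nabla f_h(x_h).
\]

A multiplicative model enforces the same first-order coherence by rescaling $f_H$ instead of shifting it, e.g. $h_{\mathrm{mul}}(x_H) = \varphi(x_H)\, f_H(x_H)$ with $\varphi$ chosen so that $\nabla h_{\mathrm{mul}}(x_{H,0}) = R\,\nabla f_h(x_h)$; the hybrid model combines the two.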
A Multigrid Preconditioner for Jacobian-free Newton-Krylov Methods
In this work, we propose a multigrid preconditioner for Jacobian-free
Newton-Krylov (JFNK) methods. Our multigrid method does not require knowledge
of the Jacobian at any level of the multigrid hierarchy. As is common in
standard multigrid methods, the proposed method also relies on three building
blocks: transfer operators, smoothers, and a coarse-level solver. In addition
to the restriction and prolongation operators, we also use a projection operator
to transfer the current Newton iterate to a coarser level. The three-level
Chebyshev semi-iterative method is employed as a smoother, as it has good
smoothing properties and does not require the representation of the Jacobian
matrix. We replace the direct solver on the coarsest level with a matrix-free
Krylov subspace method, thus giving rise to a truly Jacobian-free multigrid
preconditioner. We will discuss all building blocks of our multigrid
preconditioner in detail and demonstrate the robustness and the efficiency of
the proposed method using several numerical examples.
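A minimal sketch (ours, not the paper's implementation) of the Jacobian-free ingredient: the action of the Jacobian on a vector is approximated by a finite difference of the nonlinear residual, so the Krylov solver never needs the Jacobian matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk_step(F, u, eps=1e-7):
    """One Jacobian-free Newton step for F(u) = 0.

    The action of the Jacobian J(u) on a vector v is approximated by the
    finite difference (F(u + h*v) - F(u)) / h, so the Jacobian matrix is
    never formed or stored.
    """
    Fu = F(u)

    def jv(v):
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return np.zeros_like(v)
        h = eps * max(1.0, np.linalg.norm(u)) / norm_v  # scaled FD step
        return (F(u + h * v) - Fu) / h

    J = LinearOperator((u.size, u.size), matvec=jv)
    # In the paper, this Krylov solve is preconditioned by the Jacobian-free
    # multigrid method; here we call plain GMRES for brevity.
    s, info = gmres(J, -Fu)
    return u + s
```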
Nonlinear Schwarz preconditioning for nonlinear optimization problems with bound constraints
We propose a nonlinear additive Schwarz method for solving nonlinear
optimization problems with bound constraints. Our method is used as a
"right-preconditioner" for solving the first-order optimality system arising
within the sequential quadratic programming (SQP) framework using Newton's
method. The algorithmic scalability of this preconditioner is enhanced by
incorporating a solution-dependent coarse space, which takes into account the
restricted constraints from the fine level. By means of numerical examples, we
demonstrate that the proposed preconditioned Newton methods outperform standard
active-set methods considered in the literature.
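As a rough sketch of what right preconditioning means here (generic notation, not the paper's): rather than applying Newton's method directly to the first-order optimality system $F(x) = 0$, one applies it to the preconditioned system

\[
F\bigl(G(x)\bigr) = 0,
\]

where the nonlinear additive Schwarz operator $G$ maps an iterate to a point assembled from bound-constrained subdomain solves; the solution-dependent coarse space enriches $G$ so that iteration counts remain stable as the number of subdomains grows.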
Multilevel Minimization for Deep Residual Networks
We present a new multilevel minimization framework for the training of deep
residual networks (ResNets), which has the potential to significantly reduce
training time and effort. Our framework is based on the dynamical systems
viewpoint, which formulates a ResNet as the discretization of an initial value
problem. The training process is then formulated as a time-dependent optimal
control problem, which we discretize using different time-discretization
parameters, eventually generating a multilevel hierarchy of auxiliary networks
with different resolutions. The training of the original ResNet is then
enhanced by training the auxiliary networks with reduced resolutions. By
design, our framework is independent of the training strategy chosen on each
level of the multilevel hierarchy. By means of numerical examples, we analyze
the convergence behavior of the proposed method and demonstrate its
robustness. For our examples, we employ multilevel gradient-based methods.
Comparisons with standard single-level methods show a speedup of more than a
factor of three while achieving the same validation accuracy.
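To illustrate the dynamical systems viewpoint, here is a minimal sketch (ours, not the paper's code) of how coarsening in time generates the auxiliary networks; `f` stands for a generic residual block and `thetas` for its per-block parameters.

```python
import numpy as np

def resnet_forward(x, thetas, h, f):
    """Forward-Euler view of a ResNet: x_{k+1} = x_k + h * f(x_k, theta_k),
    one step of the ODE x'(t) = f(x(t), theta(t)) per residual block."""
    for theta in thetas:
        x = x + h * f(x, theta)
    return x

def coarsen(thetas, h):
    """Next-coarser auxiliary network: keep every other block and double the
    step size, so the same time horizon is covered with half the layers.
    (Illustrative restriction in time; the paper's transfer operators may
    differ, e.g. by averaging neighbouring parameters.)"""
    return thetas[::2], 2.0 * h
```

Training then cycles between inexpensive parameter updates on the coarse networks and corrections on the original fine network.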
Multilevel minimization in trust-region framework: algorithmic and software developments
The field of scientific computing is associated with the modeling of complex physical phenomena. The resulting numerical models are often described by differential equations, which, in many cases, can be related to non-convex minimization problems. Thus, after discretization, the solution of large-scale non-convex optimization problems is required. Various iterative solution strategies can be used to solve such optimization problems. However, the convergence speed of most of them deteriorates rapidly with increasing problem size. Multilevel methods are known to overcome this difficulty, and therefore we focus on a class of globally convergent multilevel solution strategies called the recursive multilevel trust-region (RMTR) method. The RMTR method combines the globalization properties of the trust-region method with the efficiency of multilevel methods. Despite its robustness and efficiency, the practical implementation of the RMTR method is a technically demanding task, which relies upon a suitable multilevel framework. This framework requires the careful design of two main components: i) the multilevel hierarchy and transfer operators, and ii) the coarse-level models. To maximize the efficiency of the RMTR method, these components must be created with knowledge of the particular optimization problem in mind. In this thesis, we propose three novel variants of the RMTR method. Our first variant of the RMTR method is tailored to solving phase-field fracture problems. It employs novel coarse-level models that allow the representation of fine-level fractures on the coarser levels. Our second RMTR variant is developed for thin-shell cloth simulations. Here, we employ a subdivision-based multilevel hierarchy and transfer operators. Our third variant of the RMTR method is designed for the training of deep residual networks (ResNets). We construct the multilevel hierarchy and transfer operators by leveraging the dynamical systems viewpoint, which casts a ResNet as the discretization of an initial value problem. We analyze the convergence properties of all three novel variants of the RMTR method. To this end, we consider numerical examples from the respective scientific fields. A comparison with a single-level trust-region method demonstrates the efficiency of the proposed RMTR variants. Furthermore, we introduce our open-source library UTOPIA, which incorporates the parallel implementation of the multilevel methods presented in this work. Weak and strong scaling properties of our implementation are investigated on up to 12,000 processors and with up to a billion degrees of freedom.
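For orientation, a minimal sketch of the basic RMTR recursion in generic notation (ours, not the thesis's): at the current fine iterate $x_h$, one approximately minimizes a coherent coarse-level model within a trust region,

\[
\min_{s_H}\; h\bigl(R\,x_h + s_H\bigr) \quad \text{subject to} \quad \|s_H\| \le \Delta_H,
\]

prolongates the resulting correction, $s_h = P\,s_H$, and accepts or rejects $x_h + s_h$ via the usual trust-region ratio of actual to predicted reduction; smoothing steps on the fine level and recursion over all levels complete the cycle.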