On the use of hybrid coarse-level models in multilevel minimization methods
Solving large-scale nonlinear minimization problems is computationally
demanding. Nonlinear multilevel minimization (NMM) methods exploit the
structure of the underlying minimization problem to solve such problems in a
computationally efficient and scalable manner. The efficiency of NMM
methods relies on the quality of the coarse-level models. Traditionally,
coarse-level models are constructed using the additive approach, where the
so-called τ-correction enforces local coherence between the fine-level
and coarse-level objective functions. In this work, we extend this methodology
and discuss how to enforce local coherence between the objective functions
using a multiplicative approach. Moreover, we also present a hybrid approach,
which takes advantage of both the additive and multiplicative approaches. Using
numerical experiments from the field of deep learning, we show that the
hybrid approach can greatly improve the convergence speed of NMM methods, and
therefore provides an attractive alternative to the almost universally used
additive approach.
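To make the coherence idea concrete, here is a minimal sketch in generic notation (fine objective $f_h$, coarse objective $f_H$, restriction $R$, current fine iterate $x_h$, initial coarse iterate $x_{H,0} = R\,x_h$); the notation is ours, not the paper's. The additive (τ-corrected) model adds a linear term so that gradients match at $x_{H,0}$:

\[
h_{\mathrm{add}}(x_H) = f_H(x_H) + \bigl\langle R\,\nabla f_h(x_h) - \nabla f_H(x_{H,0}),\; x_H - x_{H,0} \bigr\rangle,
\qquad
\nabla h_{\mathrm{add}}(x_{H,0}) = R\,\nabla f_h(x_h).
\]

A multiplicative model enforces the same first-order coherence by rescaling $f_H$ instead of shifting it, e.g. $h_{\mathrm{mul}}(x_H) = \varphi(x_H)\, f_H(x_H)$ with $\varphi$ chosen so that $\nabla h_{\mathrm{mul}}(x_{H,0}) = R\,\nabla f_h(x_h)$; the hybrid model combines the two.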
A Multigrid Preconditioner for Jacobian-free Newton-Krylov Methods
In this work, we propose a multigrid preconditioner for Jacobian-free
Newton-Krylov (JFNK) methods. Our multigrid method does not require knowledge
of the Jacobian at any level of the multigrid hierarchy. As is common in
standard multigrid methods, the proposed method also relies on three building
blocks: transfer operators, smoothers, and a coarse-level solver. In addition
to the restriction and prolongation operators, we also use a projection operator
to transfer the current Newton iterate to a coarser level. The three-level
Chebyshev semi-iterative method is employed as a smoother, as it has good
smoothing properties and does not require the representation of the Jacobian
matrix. We replace the direct solver on the coarsest level with a matrix-free
Krylov subspace method, thus giving rise to a truly Jacobian-free multigrid
preconditioner. We will discuss all building blocks of our multigrid
preconditioner in detail and demonstrate the robustness and the efficiency of
the proposed method using several numerical examples.
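A minimal sketch (ours, not the paper's implementation) of the Jacobian-free ingredient: the action of the Jacobian on a vector is approximated by a finite difference of the nonlinear residual, so the Krylov solver never needs the Jacobian matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk_step(F, u, eps=1e-7):
    """One Jacobian-free Newton step for F(u) = 0.

    The action of the Jacobian J(u) on a vector v is approximated by the
    finite difference (F(u + h*v) - F(u)) / h, so the Jacobian matrix is
    never formed or stored.
    """
    Fu = F(u)

    def jv(v):
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return np.zeros_like(v)
        h = eps * max(1.0, np.linalg.norm(u)) / norm_v  # scaled FD step
        return (F(u + h * v) - Fu) / h

    J = LinearOperator((u.size, u.size), matvec=jv)
    # In the paper, this Krylov solve is preconditioned by the Jacobian-free
    # multigrid method; here we call plain GMRES for brevity.
    s, info = gmres(J, -Fu)
    return u + s
```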
Nonlinear Schwarz preconditioning for nonlinear optimization problems with bound constraints
We propose a nonlinear additive Schwarz method for solving nonlinear
optimization problems with bound constraints. Our method is used as a
"right-preconditioner" for solving the first-order optimality system arising
within the sequential quadratic programming (SQP) framework using Newton's
method. The algorithmic scalability of this preconditioner is enhanced by
incorporating a solution-dependent coarse space, which takes into account the
restricted constraints from the fine level. By means of numerical examples, we
demonstrate that the proposed preconditioned Newton methods outperform standard
active-set methods considered in the literature.
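As a rough sketch of what right preconditioning means here (generic notation, not the paper's): rather than applying Newton's method directly to the first-order optimality system $F(x) = 0$, one applies it to the preconditioned system

\[
F\bigl(G(x)\bigr) = 0,
\]

where the nonlinear additive Schwarz operator $G$ maps an iterate to a point assembled from bound-constrained subdomain solves; the solution-dependent coarse space enriches $G$ so that iteration counts remain stable as the number of subdomains grows.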
Multilevel Minimization for Deep Residual Networks
We present a new multilevel minimization framework for the training of deep
residual networks (ResNets), which has the potential to significantly reduce
training time and effort. Our framework is based on the dynamical systems
viewpoint, which formulates a ResNet as the discretization of an initial value
problem. The training process is then formulated as a time-dependent optimal
control problem, which we discretize using different time-discretization
parameters, eventually generating a multilevel hierarchy of auxiliary networks
with different resolutions. The training of the original ResNet is then
enhanced by training the auxiliary networks with reduced resolutions. By
design, our framework is independent of the training strategy chosen on each
level of the multilevel hierarchy. By means of numerical examples, we analyze
the convergence behavior of the proposed method and demonstrate its
robustness. For our examples, we employ multilevel gradient-based methods.
Comparisons with standard single-level methods show a speedup of more than a
factor of three while achieving the same validation accuracy.
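To illustrate the dynamical systems viewpoint, here is a minimal sketch (ours, not the paper's code) of how coarsening in time generates the auxiliary networks; `f` stands for a generic residual block and `thetas` for its per-block parameters.

```python
import numpy as np

def resnet_forward(x, thetas, h, f):
    """Forward-Euler view of a ResNet: x_{k+1} = x_k + h * f(x_k, theta_k),
    one step of the ODE x'(t) = f(x(t), theta(t)) per residual block."""
    for theta in thetas:
        x = x + h * f(x, theta)
    return x

def coarsen(thetas, h):
    """Next-coarser auxiliary network: keep every other block and double the
    step size, so the same time horizon is covered with half the layers.
    (Illustrative restriction in time; the paper's transfer operators may
    differ, e.g. by averaging neighbouring parameters.)"""
    return thetas[::2], 2.0 * h
```

Training then cycles between inexpensive parameter updates on the coarse networks and corrections on the original fine network.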
Multilevel minimization in trust-region framework: algorithmic and software developments
The field of scientific computing is associated with the modeling of complex physical phenomena. The resulting numerical models are often described by differential equations, which, in many cases, can be related to non-convex minimization problems. Thus, after discretization, the solution of large-scale non-convex optimization problems is required. Various iterative solution strategies can be used to solve such optimization problems. However, the convergence speed of most of them deteriorates rapidly with increasing problem size. Multilevel methods are known to overcome this difficulty, and therefore we focus on a class of globally convergent multilevel solution strategies called the recursive multilevel trust-region (RMTR) method. The RMTR method combines the globalization properties of the trust-region method with the efficiency of multilevel methods. Despite its robustness and efficiency, the practical implementation of the RMTR method is a technically demanding task, which relies upon a suitable multilevel framework. This framework requires the careful design of two main components: i) the multilevel hierarchy and transfer operators, and ii) the coarse-level models. To maximize the efficiency of the RMTR method, these components must be created with knowledge of the particular optimization problem in mind. In this thesis, we propose three novel variants of the RMTR method. Our first variant of the RMTR method is tailored to solving phase-field fracture problems. It employs novel coarse-level models that allow the representation of fine-level fractures on the coarser levels. Our second RMTR variant is developed for thin-shell cloth simulations. Here, we employ a subdivision-based multilevel hierarchy and transfer operators. Our third variant of the RMTR method is designed for the training of deep residual networks (ResNets). We construct the multilevel hierarchy and transfer operators by leveraging the dynamical systems viewpoint, which casts a ResNet as the discretization of an initial value problem. We analyze the convergence properties of all three novel variants of the RMTR method. To this end, we consider numerical examples from the respective scientific fields. A comparison with a single-level trust-region method demonstrates the efficiency of the proposed RMTR variants. Furthermore, we introduce our open-source library UTOPIA, which incorporates the parallel implementation of the multilevel methods presented in this work. Weak and strong scaling properties of our implementation are investigated on up to 12,000 processors and with up to a billion degrees of freedom.
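For orientation, a minimal sketch of the basic RMTR recursion in generic notation (ours, not the thesis's): at the current fine iterate $x_h$, one approximately minimizes a coherent coarse-level model within a trust region,

\[
\min_{s_H}\; h\bigl(R\,x_h + s_H\bigr) \quad \text{subject to} \quad \|s_H\| \le \Delta_H,
\]

prolongates the resulting correction, $s_h = P\,s_H$, and accepts or rejects $x_h + s_h$ via the usual trust-region ratio of actual to predicted reduction; smoothing steps on the fine level and recursion over all levels complete the cycle.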