12 research outputs found

    On the use of hybrid coarse-level models in multilevel minimization methods

    Full text link
    Solving large-scale nonlinear minimization problems is computationally demanding. Nonlinear multilevel minimization (NMM) methods exploit the structure of the underlying minimization problem to solve such problems in a computationally efficient and scalable manner. The efficiency of NMM methods relies on the quality of the coarse-level models. Traditionally, coarse-level models are constructed using the additive approach, where the so-called τ-correction enforces local coherence between the fine-level and coarse-level objective functions. In this work, we extend this methodology and discuss how to enforce local coherence between the objective functions using a multiplicative approach. Moreover, we present a hybrid approach, which takes advantage of both the additive and the multiplicative approaches. Using numerical experiments from the field of deep learning, we show that employing the hybrid approach can greatly improve the convergence speed of NMM methods, making it an attractive alternative to the almost universally used additive approach.
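
    For concreteness, the NumPy sketch below shows the standard additive (τ-corrected) coarse-level model referred to in the abstract: a linear correction term is added to the coarse objective so that its gradient at the restricted iterate matches the restricted fine-level gradient. The toy quadratics, the injection-style restriction operator, and all names are illustrative assumptions; the multiplicative and hybrid constructions proposed in the paper are not reproduced here.

```python
import numpy as np

def additive_coarse_model(f_H, grad_f_H, grad_f_h, R, x_h):
    """First-order coherent ("tau-corrected") coarse-level model:

        psi_H(x_H) = f_H(x_H) + tau^T (x_H - R x_h),
        tau        = R grad_f_h(x_h) - grad_f_H(R x_h),

    so that grad psi_H(R x_h) = R grad_f_h(x_h)."""
    x_H0 = R @ x_h
    tau = R @ grad_f_h(x_h) - grad_f_H(x_H0)
    psi_H = lambda x_H: f_H(x_H) + tau @ (x_H - x_H0)
    grad_psi_H = lambda x_H: grad_f_H(x_H) + tau
    return psi_H, grad_psi_H

if __name__ == "__main__":
    # Toy quadratics on a fine (n=8) and a coarse (n=4) level.
    n_h, n_H = 8, 4
    A_h, A_H = np.diag(np.arange(1.0, n_h + 1)), np.diag(np.arange(1.0, n_H + 1))
    g_h = lambda x: A_h @ x                      # fine-level gradient
    f_H = lambda x: 0.5 * x @ A_H @ x            # coarse-level objective
    g_H = lambda x: A_H @ x                      # coarse-level gradient
    # Injection-style restriction: pick every other fine-level entry.
    R = np.zeros((n_H, n_h))
    R[np.arange(n_H), 2 * np.arange(n_H)] = 1.0

    x_h = np.random.default_rng(0).standard_normal(n_h)
    psi_H, grad_psi_H = additive_coarse_model(f_H, g_H, g_h, R, x_h)
    # First-order coherence: coarse gradient equals the restricted fine gradient.
    print(np.allclose(grad_psi_H(R @ x_h), R @ g_h(x_h)))  # True
```

    A multiplicative variant would instead rescale f_H so that objective values, rather than a linear term, are matched; the paper's precise formulation is not assumed here.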

    A Multigrid Preconditioner for Jacobian-free Newton-Krylov Methods

    Full text link
    In this work, we propose a multigrid preconditioner for Jacobian-free Newton-Krylov (JFNK) methods. Our multigrid method does not require knowledge of the Jacobian at any level of the multigrid hierarchy. As is common in standard multigrid methods, the proposed method relies on three building blocks: transfer operators, smoothers, and a coarse-level solver. In addition to the restriction and prolongation operators, we also use a projection operator to transfer the current Newton iterate to a coarser level. The three-level Chebyshev semi-iterative method is employed as a smoother, as it has good smoothing properties and does not require the representation of the Jacobian matrix. We replace the direct solver on the coarsest level with a matrix-free Krylov subspace method, thus giving rise to a truly Jacobian-free multigrid preconditioner. We discuss all building blocks of our multigrid preconditioner in detail and demonstrate the robustness and efficiency of the proposed method using several numerical examples.
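
    The SciPy sketch below illustrates, under simplifying assumptions, how the building blocks named in the abstract can be combined in a matrix-free way: finite-difference Jacobian-vector products, a Chebyshev semi-iteration as smoother, and a GMRES solve on the coarse level in place of a direct solver. The Galerkin-style coarse operator R J(u) P, the externally supplied eigenvalue bounds, and the two-level setup are illustrative simplifications, not the paper's exact construction.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jvp(F, u, v, eps=1e-7):
    """Finite-difference Jacobian-vector product J(u) @ v: the 'Jacobian-free'
    ingredient of JFNK, requiring only evaluations of the residual F."""
    return (F(u + eps * v) - F(u)) / eps

def chebyshev_smooth(matvec, b, x, lam_min, lam_max, iters=3):
    """Chebyshev semi-iteration for A x = b using only matrix-vector products.
    The eigenvalue bounds (lam_min, lam_max) are assumed to be estimated
    elsewhere, e.g. with a few power iterations."""
    theta = 0.5 * (lam_max + lam_min)
    delta = 0.5 * (lam_max - lam_min)
    sigma = theta / delta
    rho = 1.0 / sigma
    d = (b - matvec(x)) / theta
    for _ in range(iters):
        x = x + d
        r = b - matvec(x)
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

def jacobian_free_mg_preconditioner(F, u, R, P, lam_bounds):
    """Two-level, matrix-free V-cycle: Chebyshev pre-/post-smoothing on the
    fine level and a matrix-free GMRES solve on the coarse level.  For brevity
    the coarse operator is the composition R J(u) P rather than a
    re-discretized one."""
    n_h, n_H = P.shape[0], P.shape[1]
    A_h = lambda v: jvp(F, u, v)
    A_H = LinearOperator((n_H, n_H), matvec=lambda v: R @ A_h(P @ v))

    def apply(r):
        x = chebyshev_smooth(A_h, r, np.zeros(n_h), *lam_bounds)  # pre-smoothing
        r_H = R @ (r - A_h(x))                                    # restrict residual
        e_H, _ = gmres(A_H, r_H, maxiter=20)                      # coarse Krylov solve
        x = x + P @ e_H                                           # prolongate correction
        return chebyshev_smooth(A_h, r, x, *lam_bounds)           # post-smoothing

    return LinearOperator((n_h, n_h), matvec=apply)
```

    The returned LinearOperator would typically be passed as the preconditioner M to an outer scipy.sparse.linalg.gmres call inside a Newton loop, which is how a JFNK method would consume it.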

    Nonlinear Schwarz preconditioning for nonlinear optimization problems with bound constraints

    Full text link
    We propose a nonlinear additive Schwarz method for solving nonlinear optimization problems with bound constraints. Our method is used as a "right-preconditioner" for solving the first-order optimality system arising within the sequential quadratic programming (SQP) framework using Newton's method. The algorithmic scalability of this preconditioner is enhanced by incorporating a solution-dependent coarse space, which takes into account the restricted constraints from the fine level. By means of numerical examples, we demonstrate that the proposed preconditioned Newton methods outperform standard active-set methods considered in the literature.
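
    As a rough illustration of the nonlinear additive Schwarz idea for bound-constrained problems, the sketch below performs one damped Schwarz step in which each subdomain solves a local bound-constrained subproblem with the remaining variables frozen. This is a generic, textbook-style construction with illustrative names and a toy problem; it is not the preconditioned SQP/Newton variant or the solution-dependent coarse space proposed in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def nonlinear_additive_schwarz_step(f, x, lower, upper, subdomains, damping=0.5):
    """One damped nonlinear additive Schwarz step for
           min f(x)   subject to   lower <= x <= upper.
    Each subdomain (an index set) solves a local bound-constrained subproblem
    with the remaining variables frozen; the local corrections are summed and
    damped.  Overlapping subdomains are allowed."""
    correction = np.zeros_like(x)
    for idx in subdomains:
        def f_local(z, idx=idx):
            y = x.copy()
            y[idx] = z
            return f(y)
        bounds = list(zip(lower[idx], upper[idx]))
        res = minimize(f_local, x[idx], bounds=bounds, method="L-BFGS-B")
        correction[idx] += res.x - x[idx]
    return np.clip(x + damping * correction, lower, upper)

if __name__ == "__main__":
    # Toy bound-constrained convex quadratic.
    n = 8
    A = np.diag(np.arange(1.0, n + 1)) + 0.1 * np.ones((n, n))
    b = np.ones(n)
    f = lambda x: 0.5 * x @ A @ x - b @ x
    lower, upper = np.zeros(n), 0.2 * np.ones(n)
    subdomains = [np.arange(0, n // 2), np.arange(n // 2, n)]
    x = np.zeros(n)
    for _ in range(20):
        x = nonlinear_additive_schwarz_step(f, x, lower, upper, subdomains)
    print(np.round(x, 3))
```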

    Multilevel Minimization for Deep Residual Networks

    Full text link
    We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical systems viewpoint, which formulates a ResNet as the discretization of an initial value problem. The training process is then formulated as a time-dependent optimal control problem, which we discretize using different time-discretization parameters, eventually generating a multilevel hierarchy of auxiliary networks with different resolutions. The training of the original ResNet is then enhanced by training the auxiliary networks with reduced resolutions. By design, our framework is conveniently independent of the training strategy chosen on each level of the multilevel hierarchy. By means of numerical examples, we analyze the convergence behavior of the proposed method and demonstrate its robustness. For our examples, we employ multilevel gradient-based methods. Comparisons with standard single-level methods show a speedup of more than a factor of three while achieving the same validation accuracy.
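
    The following NumPy sketch illustrates the dynamical-systems construction described in the abstract: a forward-Euler "ResNet" is viewed as the discretization of an initial value problem, coarser auxiliary networks use fewer time steps, and weights are transferred to finer levels by interpolation along the layer (time) direction. The forward-Euler/tanh block, the chosen depths, and the linear-interpolation prolongation are illustrative assumptions; the actual training loops are omitted.

```python
import numpy as np

def resnet_forward(x, weights, h):
    """Forward-Euler 'ResNet': x_{k+1} = x_k + h * tanh(W_k @ x_k),
    i.e. the explicit discretization of the ODE  dx/dt = tanh(W(t) x)."""
    for W in weights:
        x = x + h * np.tanh(W @ x)
    return x

def prolongate_weights(coarse_weights, n_fine):
    """Transfer coarse-level weights to a finer time grid by piecewise-linear
    interpolation of W(t) along the layer ("time") direction; one simple
    choice of prolongation operator."""
    n_coarse = len(coarse_weights)
    t_coarse = np.linspace(0.0, 1.0, n_coarse)
    t_fine = np.linspace(0.0, 1.0, n_fine)
    stacked = np.stack(coarse_weights)                    # (n_coarse, d, d)
    fine = np.empty((n_fine,) + stacked.shape[1:])
    for i, t in enumerate(t_fine):
        j = int(np.clip(np.searchsorted(t_coarse, t) - 1, 0, n_coarse - 2))
        w = (t - t_coarse[j]) / (t_coarse[j + 1] - t_coarse[j])
        fine[i] = (1.0 - w) * stacked[j] + w * stacked[j + 1]
    return list(fine)

if __name__ == "__main__":
    d, T = 4, 1.0
    depths = [4, 8, 16]                                   # hierarchy of resolutions
    rng = np.random.default_rng(0)
    weights = [0.1 * rng.standard_normal((d, d)) for _ in range(depths[0])]
    x0 = rng.standard_normal(d)
    for n_fine in depths[1:]:
        # ... train the current (coarser) network here ...
        weights = prolongate_weights(weights, n_fine)     # move to the finer level
    print(resnet_forward(x0, weights, h=T / depths[-1]))
```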

    Multilevel minimization in trust-region framework: algorithmic and software developments

    No full text
    The field of scientific computing is associated with the modeling of complex physical phenomena. The resulting numerical models are often described by differential equations, which, in many cases, can be related to non-convex minimization problems. Thus, after discretization, the solution of large-scale non-convex optimization problems is required. Various iterative solution strategies can be used to solve such optimization problems. However, the convergence speed of most of them deteriorates rapidly with increasing problem size. Multilevel methods are known to overcome this difficulty, and therefore we focus on a class of globally convergent multilevel solution strategies called the recursive multilevel trust-region (RMTR) method. The RMTR method combines the globalization properties of the trust-region method with the efficiency of multilevel methods. Despite its robustness and efficiency, the practical implementation of the RMTR method is a technically demanding task that relies upon a suitable multilevel framework. This framework requires the careful design of two main components: i) the multilevel hierarchy and transfer operators, and ii) the coarse-level models. To maximize the efficiency of the RMTR method, these components must be designed with the particular optimization problem in mind. In this thesis, we propose three novel variants of the RMTR method. Our first variant is tailored to phase-field fracture problems. It employs novel coarse-level models that allow the representation of fine-level fractures on the coarser levels. Our second variant is developed for thin-shell cloth simulations. Here, we employ a subdivision-based multilevel hierarchy and transfer operators. Our third variant is designed for the training of deep residual networks (ResNets). We construct the multilevel hierarchy and transfer operators by leveraging a dynamical-systems viewpoint, which casts a ResNet as the discretization of an initial value problem. We analyze the convergence properties of all three novel variants of the RMTR method using numerical examples from the respective scientific fields. A comparison with a single-level trust-region method demonstrates the efficiency of the proposed RMTR variants. Furthermore, we introduce our open-source library UTOPIA, which incorporates the parallel implementation of the multilevel methods presented in this work. Weak and strong scaling properties of our implementation are investigated with up to 12,000 processors and a billion degrees of freedom.
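
    To make the trust-region component concrete, the sketch below shows one iteration of a basic single-level trust-region method using the Cauchy point. It illustrates only the globalization mechanism (model reduction ratio, step acceptance, radius update) that RMTR inherits; the recursive multilevel part, where a step may instead be obtained by minimizing a coarse-level model, and the UTOPIA implementation are not reproduced here.

```python
import numpy as np

def trust_region_step(f, g, B, x, delta, eta=0.1):
    """One iteration of a basic trust-region method using the Cauchy point
    (steepest-descent minimizer of the quadratic model within the region)."""
    grad = g(x)
    gnorm = np.linalg.norm(grad)
    if gnorm < 1e-12:                                      # numerically stationary
        return x, delta
    gBg = grad @ (B @ grad)
    # Cauchy-point step length along -grad, capped by the trust-region radius.
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (delta * gBg))
    step = -(tau * delta / gnorm) * grad
    predicted = -(grad @ step + 0.5 * step @ (B @ step))   # model decrease
    rho = (f(x) - f(x + step)) / predicted                 # agreement ratio
    if rho >= eta:                                         # accept the step
        x = x + step
    # Expand or shrink the trust-region radius based on the agreement.
    delta = 2.0 * delta if rho > 0.75 else (0.5 * delta if rho < 0.25 else delta)
    return x, delta

if __name__ == "__main__":
    A = np.diag([1.0, 10.0])
    f = lambda x: 0.5 * x @ A @ x
    g = lambda x: A @ x
    x, delta = np.array([2.0, 1.0]), 1.0
    for _ in range(30):
        x, delta = trust_region_step(f, g, A, x, delta)
    print(x)                                               # approaches [0, 0]
```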
