593 research outputs found

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds that come at the price of enormous resource consumption and cost. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? 
While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
Peer Reviewed. "Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortiz, Francesco Rizzi, Ulrich Rude, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth". Postprint (author's final draft)
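The energy and operation-count figures quoted in the abstract above can be checked with a short back-of-the-envelope calculation. The sketch below is only illustrative: the 48 h runtime and 20 MW power draw come from the abstract, while the sustained rate of one exaflop/s and the electricity price of 0.10 Euro per kWh are assumptions chosen to be consistent with the quoted totals. The function names are hypothetical.

```python
# Rough consistency check of the figures quoted in the abstract.
# Values marked "assumption" are NOT stated in the abstract.

RUNTIME_HOURS = 48                 # from the abstract
POWER_MW = 20                      # from the abstract
ASSUMED_FLOPS_PER_SECOND = 1e18    # assumption: one exaflop/s sustained
ASSUMED_EURO_PER_KWH = 0.10        # assumption: implied by ~100k Euro for ~1e6 kWh


def energy_kwh(runtime_hours, power_mw):
    """Energy in kWh for a run of the given length at constant power."""
    return runtime_hours * power_mw * 1000.0  # 1 MW = 1000 kW


def total_flops(runtime_hours, flops_per_second):
    """Total floating-point operations executed at a constant rate."""
    return runtime_hours * 3600.0 * flops_per_second


energy = energy_kwh(RUNTIME_HOURS, POWER_MW)
cost_euro = energy * ASSUMED_EURO_PER_KWH
flops = total_flops(RUNTIME_HOURS, ASSUMED_FLOPS_PER_SECOND)

print(f"energy: {energy:.3g} kWh")
print(f"cost:   {cost_euro:.3g} Euro (at the assumed price)")
print(f"work:   {flops:.3g} floating-point operations")
```

Under these assumptions the run uses 960,000 kWh, which matches the abstract's "a million kWh", and the operation count comes out on the order of 10^23.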

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.

    07071 Abstracts Collection -- Web Information Retrieval and Linear Algebra Algorithms

    From 12th to 16th February 2007, the Dagstuhl Seminar 07071 "Web Information Retrieval and Linear Algebra Algorithms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    A Unifying Theory for Nonlinear Additively and Multiplicatively Preconditioned Globalization Strategies : Convergence Results and Examples From the Field of Nonlinear Elastostatics and Elastodynamics

    Nonlinear right preconditioned globalization strategies for the solution of nonlinear programming problems of the following kind, $u \in \mathcal B \subset \mathbb R^n: J(u) = \min!$, where $\mathcal B$ is a convex set of admissible solutions, $n \in \mathbb N$, and $J: \mathbb R^n \to \mathbb R$, sufficiently smooth, are presented. Preconditioned globalization strategies are traditional Linesearch or Trust-Region strategies in combination with a nonlinear update operator which results from a nonlinear solution process for smaller, but related, nonlinear programming problems. We will formulate conditions on this abstract operator, in order to ensure global convergence, i.e., convergence to first-order critical points, of the resulting method. In addition, we introduce particular implementations of this abstract operator, i.e., nonlinear multiplicatively preconditioned Trust-Region (MPTS) and Linesearch strategies (MPLS), as well as nonlinear additively preconditioned Trust-Region (APTS) and Linesearch (APLS) strategies. As it turns out, these additive strategies are novel parallel, locally adaptive and robust solution methods for nonlinear programming problems. Moreover, the MPTS strategy generalizes the RMTR concepts in [GK08] in order to allow also for the application of alternating nonlinear domain decomposition methods. On the other hand, the MPLS method simplifies and generalizes the concepts in [WG08] giving rise to a novel solution strategy for pointwise constrained nonlinear programming problems. The respective nonlinear solution strategies are analyzed and global convergence is shown. In addition, global convergence is also shown for combined nonlinear additively and multiplicatively preconditioned Trust-Region and Linesearch strategies. Moreover, we show the efficiency and reliability of these methods in the context of problems arising from the field of nonlinear elasticity in 3D. 
Particular emphasis has been placed on the formulation and analysis of the resulting minimization problems. Here, we show that these problems satisfy the assumptions stated to show convergence of the respective preconditioned globalization strategies. Moreover, various elasto-static and elasto-dynamic examples are presented in order to compare the convergence rates and runtimes of the different strategies.
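For readers unfamiliar with the "traditional Linesearch" globalization that the abstract above builds on, the following is a minimal sketch of a classical steepest-descent method with Armijo backtracking, applied to an unconstrained smooth objective. It is not the MPLS or APLS strategy of the thesis: the nonlinear update operator (the preconditioner) and the constraint set are omitted entirely, and all function names, parameter values and the test objective are assumptions made for illustration.

```python
# Classical (unpreconditioned) line-search globalization: steepest descent
# with Armijo backtracking. Illustrative only; NOT the MPLS/APLS methods.

def backtracking_linesearch(J, grad_J, u, direction,
                            alpha0=1.0, rho=0.5, c=1e-4, max_steps=50):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    J_u = J(u)
    # Directional derivative of J at u along the search direction.
    slope = sum(g * d for g, d in zip(grad_J(u), direction))
    alpha = alpha0
    for _ in range(max_steps):
        trial = [ui + alpha * di for ui, di in zip(u, direction)]
        if J(trial) <= J_u + c * alpha * slope:
            return alpha
        alpha *= rho
    return alpha


def minimize(J, grad_J, u0, tol=1e-8, max_iter=1000):
    """Iterate until the gradient norm is below tol (first-order criticality)."""
    u = list(u0)
    for _ in range(max_iter):
        g = grad_J(u)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        d = [-gi for gi in g]  # steepest-descent direction
        alpha = backtracking_linesearch(J, grad_J, u, d)
        u = [ui + alpha * di for ui, di in zip(u, d)]
    return u


# Hypothetical smooth test objective with minimizer (1, -0.5).
def J(u):
    return (u[0] - 1.0) ** 2 + 2.0 * (u[1] + 0.5) ** 2


def grad_J(u):
    return [2.0 * (u[0] - 1.0), 4.0 * (u[1] + 0.5)]


u_star = minimize(J, grad_J, [0.0, 0.0])
print(u_star)
```

The stopping test on the gradient norm corresponds to the notion of convergence to first-order critical points mentioned in the abstract. A preconditioned variant would, roughly speaking, combine such a step with an update obtained from smaller related subproblems.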