Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph partitioning, together with applications and future research directions.
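For reference, the optimisation problem that balanced partitioners target is commonly stated as follows (a standard formulation, not quoted from the survey itself):

    Given $G=(V,E)$, $k \in \mathbb{N}$ and an imbalance parameter $\varepsilon \ge 0$,
    find disjoint blocks $V_1, \dots, V_k$ with $\bigcup_i V_i = V$ and
    $|V_i| \le (1+\varepsilon) \lceil |V|/k \rceil$ for all $i$, minimising the
    edge cut $|\{ \{u,v\} \in E : u \in V_i,\ v \in V_j,\ i \ne j \}|$.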
Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for Extreme Scale Simulations' held March 1-6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth.
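(The cost figure above is straightforward arithmetic: 20 MW x 48 h = 960 MWh, roughly 10^6 kWh, i.e. about 100k Euro at roughly 0.10 Euro/kWh.) To make the checkpoint/rollback discussion concrete, the sketch below combines a plain application-level checkpoint loop with the classical Young/Daly first-order estimate of the optimal checkpoint interval, tau = sqrt(2 * delta * M) for checkpoint cost delta and mean time between failures M. The function names and the serialized-state format are illustrative assumptions, not the seminar's code.

    import math, os, pickle, time

    def young_daly_interval(checkpoint_cost_s, mtbf_s):
        # Classical first-order estimate of the optimal time between
        # checkpoints (Young/Daly): sqrt(2 * delta * M).
        return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

    def run_with_checkpoints(step, state, n_steps, interval_s, path="ckpt.pkl"):
        # Rollback recovery: resume from the last checkpoint if one exists.
        first = 0
        if os.path.exists(path):
            with open(path, "rb") as f:
                first, state = pickle.load(f)
        last_ckpt = time.monotonic()
        for i in range(first, n_steps):
            state = step(state)  # one unit of application work
            if time.monotonic() - last_ckpt >= interval_s:
                # Synchronous write to storage; at exascale this is
                # exactly the overhead the abstract warns about.
                with open(path, "wb") as f:
                    pickle.dump((i + 1, state), f)
                last_ckpt = time.monotonic()
        return state

For example, a 60 s checkpoint cost against a 4 h mean time between failures gives an interval of about 22 minutes; once recovery time approaches the mean time between failures, such a loop stops making forward progress, which is precisely the regime motivating the more advanced techniques discussed in the article.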
Optimising subdomain aspect ratios for parallel load balancing
In parallel adaptive Finite Element simulations the work load on the individual processors can change frequently. To (re)distribute the load evenly over the processors, a load balancing heuristic is needed. Common strategies try to minimise subdomain dependencies by minimising the number of cut edges in the partition. For many solvers this is the most influential factor. However, for certain preconditioned Conjugate Gradient solvers, for example, the cut size can play only a minor role, while their convergence can depend strongly on the subdomain shapes: degenerate subdomain shapes can cause them to need significantly more iterations to converge. Common heuristics often fail to address these requirements. In this thesis a new strategy is introduced which directly addresses the problem of generating and conserving reasonably good subdomain shapes while balancing the load in a dynamically changing Finite Element simulation. A new definition of Aspect Ratio is presented which assesses subdomain shapes. The common methodology of using adjacency information to select the best elements to be migrated is not adopted, since it is not necessarily related to the subdomain shapes. Instead, geometric data is used to formulate several cost functions that rate elements in terms of their suitability to be migrated. The well-known diffusive and Generalised Dimension Exchange methods, which calculate the necessary load flow, are enhanced by weighting the subdomain edges in order to positively influence their impact on the resulting partition. The results of comprehensive tests are presented and demonstrate that the proposed methods are competitive with state-of-the-art load balancing tools.
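The thesis's precise Aspect Ratio definition is not reproduced in the abstract; the sketch below uses one common geometric variant (perimeter squared over area, normalised so a circle scores 1) together with a purely geometric migration score, to illustrate how shape-aware element selection can work. The names and the particular formulas are illustrative assumptions, not the thesis's definitions.

    import math

    def aspect_ratio(area, perimeter):
        # Normalised isoperimetric quotient: 1.0 for a circle, growing
        # as the subdomain shape degenerates (long, thin, ragged).
        return perimeter ** 2 / (4.0 * math.pi * area)

    def migration_score(elem_centroid, own_centre, neighbour_centre):
        # Rate an element for migration on geometry alone: elements far
        # from their own subdomain's centre and close to the receiving
        # neighbour's centre are the cheapest to give away shape-wise.
        d = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
        return d(elem_centroid, own_centre) - d(elem_centroid, neighbour_centre)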
Parallelization of the multi-level hp-adaptive finite cell method
The multi-level hp-refinement scheme is a powerful extension of the finite element method that allows local mesh adaptation without the trouble of constraining hanging nodes. This is achieved through hierarchical high-order overlay meshes, an hp-scheme based on spatial refinement by superposition. An efficient parallelization of this method using standard domain decomposition approaches in combination with ghost elements faces the challenge of the large basis function support resulting from the overlay structure, and is in many cases not feasible. In this contribution, a parallelization strategy for the multi-level hp-scheme is presented that is adapted to the scheme's simple hierarchical structure. By distributing the computational domain among processes at the granularity of the active leaf elements and utilizing shared mesh data structures, good parallel performance is achieved, as redundant computations on ghost elements are avoided. We show the scheme's parallel scalability for problems with a few hundred elements per process. Furthermore, the scheme is used in conjunction with the finite cell method to perform numerical simulations on domains of complex shape.
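A minimal sketch of the distribution idea, under the assumption that the overlay mesh is a refinement tree whose active leaves carry work estimates: the leaves are enumerated in a fixed traversal order and cut into contiguous, roughly equal-weight ranges, one per process. The tree layout and names are assumptions, not the paper's data structures.

    def active_leaves(node):
        # Depth-first traversal yielding the active leaf elements of the
        # hierarchical overlay mesh.
        if not node.children:
            yield node
        else:
            for child in node.children:
                yield from active_leaves(child)

    def partition_leaves(leaves, n_procs, weight=lambda leaf: 1.0):
        # Cut the ordered leaf sequence into n_procs contiguous chunks of
        # roughly equal total weight (greedy prefix-sum splitting).
        total = sum(weight(l) for l in leaves)
        parts, current, acc = [], [], 0.0
        for leaf in leaves:
            current.append(leaf)
            acc += weight(leaf)
            if len(parts) < n_procs - 1 and acc >= total * (len(parts) + 1) / n_procs:
                parts.append(current)
                current = []
        parts.append(current)
        return parts

In the spirit of the shared mesh data the abstract mentions, each process can keep the coarse hierarchy while owning only its leaf range, so neighbour information stays available without redundant computations on ghost elements.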
Architecture independent environment for developing engineering software on MIMD computers
Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream, Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than those of serial computers. Developing large-scale software for a variety of MIMD computers is difficult and expensive, so there is a need for tools that facilitate programming these machines. First, the issues that must be considered in developing such tools are examined. The two main areas of concern are architecture independence and data management. Architecture-independent software facilitates portability and improves the longevity and utility of the software product; it provides some insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems and must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture-independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture-independent software; identifying and exploiting concurrency within the application program; data coherence; and engineering database and memory management.
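As a hedged illustration of the architecture-independence idea (not the environment described in this work), application code can be written against a small abstract data-exchange interface whose concrete backend is selected per machine; everything below is a hypothetical sketch.

    from abc import ABC, abstractmethod

    class DataExchange(ABC):
        # The application depends only on this interface; each MIMD
        # architecture supplies its own backend (shared memory, message
        # passing, ...), keeping the application code portable.
        @abstractmethod
        def send(self, dest, data): ...

        @abstractmethod
        def receive(self, me): ...

    class InProcessExchange(DataExchange):
        # Trivial single-process backend, useful for development and
        # testing before targeting a real parallel machine.
        def __init__(self):
            self._mailboxes = {}

        def send(self, dest, data):
            self._mailboxes.setdefault(dest, []).append(data)

        def receive(self, me):
            box = self._mailboxes.setdefault(me, [])
            return box.pop(0) if box else None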
Schnelle Löser für Partielle Differentialgleichungen (Fast Solvers for Partial Differential Equations)
This workshop was well attended, with 52 participants providing broad geographic representation from 11 countries and 3 continents. It was a nice blend of researchers with various backgrounds.