32 research outputs found

    A reference model for integrated energy and power management of HPC systems

    Get PDF
    Optimizing a computer for highest performance dictates the efficient use of its limited resources. Computers as a whole are rather complex. Therefore, it is not sufficient to consider optimizing hardware and software components independently. Instead, a holistic view to manage the interactions of all components is essential to achieve system-wide efficiency. For High Performance Computing (HPC) systems, today, the major limiting resources are energy and power. The hardware mechanisms to measure and control energy and power are exposed to software. The software systems using these mechanisms range from firmware, operating system, system software to tools and applications. Efforts to improve energy and power efficiency of HPC systems and the infrastructure of HPC centers achieve perpetual advances. In isolation, these efforts are unable to cope with the rising energy and power demands of large scale systems. A systematic way to integrate multiple optimization strategies, which build on complementary, interacting hardware and software systems is missing. This work provides a reference model for integrated energy and power management of HPC systems: the Open Integrated Energy and Power (OIEP) reference model. The goal is to enable the implementation, setup, and maintenance of modular system-wide energy and power management solutions. The proposed model goes beyond current practices, which focus on individual HPC centers or implementations, in that it allows to universally describe any hierarchical energy and power management systems with a multitude of requirements. The model builds solid foundations to be understandable and verifiable, to guarantee stable interaction of hardware and software components, for a known and trusted chain of command. This work identifies the main building blocks of the OIEP reference model, describes their abstract setup, and shows concrete instances thereof. A principal aspect is how the individual components are connected, interface in a hierarchical manner and thus can optimize for the global policy, pursued as a computing center's operating strategy. In addition to the reference model itself, a method for applying the reference model is presented. This method is used to show the practicality of the reference model and its application. For future research in energy and power management of HPC systems, the OIEP reference model forms a cornerstone to realize --- plan, develop and integrate --- innovative energy and power management solutions. For HPC systems themselves, it supports to transparently manage current systems with their inherent complexity, it allows to integrate novel solutions into existing setups, and it enables to design new systems from scratch. In fact, the OIEP reference model represents a basis for holistic efficient optimization.Computer auf höchstmögliche Rechenleistung zu optimieren bedingt Effizienzmaximierung aller limitierenden Ressourcen. Computer sind komplexe Systeme. Deshalb ist es nicht ausreichend, Hardware und Software isoliert zu betrachten. Stattdessen ist eine Gesamtsicht des Systems notwendig, um die Interaktionen aller Einzelkomponenten zu organisieren und systemweite Optimierungen zu ermöglichen. Für Höchstleistungsrechner (HLR) ist die limitierende Ressource heute ihre Leistungsaufnahme und der resultierende Gesamtenergieverbrauch. In aktuellen HLR-Systemen sind Energie- und Leistungsaufnahme programmatisch auslesbar als auch direkt und indirekt steuerbar. Diese Mechanismen werden in diversen Softwarekomponenten von Firmware, Betriebssystem, Systemsoftware bis hin zu Werkzeugen und Anwendungen genutzt und stetig weiterentwickelt. Durch die Komplexität der interagierenden Systeme ist eine systematische Optimierung des Gesamtsystems nur schwer durchführbar, als auch nachvollziehbar. Ein methodisches Vorgehen zur Integration verschiedener Optimierungsansätze, die auf komplementäre, interagierende Hardware- und Softwaresysteme aufbauen, fehlt. Diese Arbeit beschreibt ein Referenzmodell für integriertes Energie- und Leistungsmanagement von HLR-Systemen, das „Open Integrated Energy and Power (OIEP)“ Referenzmodell. Das Ziel ist ein Referenzmodell, dass die Entwicklung von modularen, systemweiten energie- und leistungsoptimierenden Sofware-Verbunden ermöglicht und diese als allgemeines hierarchisches Managementsystem beschreibt. Dies hebt das Modell von bisherigen Ansätzen ab, welche sich auf Einzellösungen, spezifischen Software oder die Bedürfnisse einzelner Rechenzentren beschränken. Dazu beschreibt es Grundlagen für ein planbares und verifizierbares Gesamtsystem und erlaubt nachvollziehbares und sicheres Delegieren von Energie- und Leistungsmanagement an Untersysteme unter Aufrechterhaltung der Befehlskette. Die Arbeit liefert die Grundlagen des Referenzmodells. Hierbei werden die Einzelkomponenten der Software-Verbunde identifiziert, deren abstrakter Aufbau sowie konkrete Instanziierungen gezeigt. Spezielles Augenmerk liegt auf dem hierarchischen Aufbau und der resultierenden Interaktionen der Komponenten. Die allgemeine Beschreibung des Referenzmodells erlaubt den Entwurf von Systemarchitekturen, welche letztendlich die Effizienzmaximierung der Ressource Energie mit den gegebenen Mechanismen ganzheitlich umsetzen können. Hierfür wird ein Verfahren zur methodischen Anwendung des Referenzmodells beschrieben, welches die Modellierung beliebiger Energie- und Leistungsverwaltungssystemen ermöglicht. Für Forschung im Bereich des Energie- und Leistungsmanagement für HLR bildet das OIEP Referenzmodell Eckstein, um Planung, Entwicklung und Integration von innovativen Lösungen umzusetzen. Für die HLR-Systeme selbst unterstützt es nachvollziehbare Verwaltung der komplexen Systeme und bietet die Möglichkeit, neue Beschaffungen und Entwicklungen erfolgreich zu integrieren. Das OIEP Referenzmodell bietet somit ein Fundament für gesamtheitliche effiziente Systemoptimierung

    Algorithmische und Code-Optimierungen Molekulardynamiksimulationen fĂĽr Verfahrenstechnik

    Get PDF
    The focus of this work lies on implementational improvements and, in particular, node-level performance optimization of the simulation software ls1-mardyn. Through data structure improvements, SIMD vectorization and, especially, OpenMP parallelization, the world’s first simulation of 2*1013 molecules at over 1 PFLOP/sec was enabled. To allow for long-range interactions, the Fast Multipole Method was introduced to ls1-mardyn. The algorithm was optimized for sequential, shared-memory, and distributed-memory execution on up to 32,768 MPI processes.Der Fokus dieser Arbeit liegt auf Code-Optimierungen und insbesondere Leistungsoptimierung auf Knoten-Ebene für die Simulationssoftware ls1-mardyn. Durch verbesserte Datenstrukturen, SIMD-Vektorisierung und vor allem OpenMP-Parallelisierung wurde die weltweit erste Petaflop-Simulation von 2*1013 Molekülen ermöglicht. Zur Simulation von langreichweitigen Wechselwirkungen wurde die Fast-Multipole-Methode in ls1-mardyn eingeführt. Sequenzielle, Shared- und Distributed-Memory-Optimierungen wurden angewandt und erlaubten eine Ausführung auf bis zu 32768 MPI-Prozessen

    Software for Exascale Computing - SPPEXA 2016-2019

    Get PDF
    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's sumpercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest

    Flexible Modellerweiterung und Optimierung von Erdbebensimulationen

    Get PDF
    Simulations of realistic earthquake scenarios require scalable software and extensive supercomputing resources. With increasing fidelity in simulations, advanced rheological and source models need to be incorporated. I introduce a domain-specific language in order to handle the model flexibility in combination with the high efficiency requirements. The contributions in this thesis enabled the to date largest and longest dynamic rupture simulation of the 2004 Sumatra earthquake.Realistische Erdbebensimulationen benötigen skalierbare Software und beträchtliche Rechenressourcen. Mit zunehmender Genauigkeit der Simulationen müssen fortschrittliche rheologische und Quellmodelle integriert werden. Ich führe eine domänenspezifische Sprache ein, um die Modelflexibilität in Kombination mit den hohen Effizienzanforderungen zu beherrschen. Die Beiträge in dieser Arbeit haben die bisher größte und längste dynamische Bruchsimulation des Sumatra-Erdbebens von 2004 ermöglicht

    The Peano software---parallel, automaton-based, dynamically adaptive grid traversals

    Get PDF
    We discuss the design decisions, design alternatives, and rationale behind the third generation of Peano, a framework for dynamically adaptive Cartesian meshes derived from spacetrees. Peano ties the mesh traversal to the mesh storage and supports only one element-wise traversal order resulting from space-filling curves. The user is not free to choose a traversal order herself. The traversal can exploit regular grid subregions and shared memory as well as distributed memory systems with almost no modifications to a serial application code. We formalize the software design by means of two interacting automata—one automaton for the multiscale grid traversal and one for the application-specific algorithmic steps. This yields a callback-based programming paradigm. We further sketch the supported application types and the two data storage schemes realized before we detail high-performance computing aspects and lessons learned. Special emphasis is put on observations regarding the used programming idioms and algorithmic concepts. This transforms our report from a “one way to implement things” code description into a generic discussion and summary of some alternatives, rationale, and design decisions to be made for any tree-based adaptive mesh refinement software

    Matrix-free finite-element computations at extreme scale and for challenging applications

    Get PDF
    For numerical computations based on finite element methods (FEM), it is common practice to assemble the system matrix related to the discretized system and to pass this matrix to an iterative solver. However, the assembly step can be costly and the matrix might become locally dense, e.g., in the context of high-order, high-dimensional, or strongly coupled multicomponent FEM, leading to high costs when applying the matrix due to limited bandwidth on modern CPU- and GPU-based hardware. Matrix-free algorithms are a means of accelerating FEM computations on HPC systems, by applying the effect of the system matrix without assembling it. Despite convincing arguments for matrix-free computations as a means of improving performance, their usage still tends to be an exception at the time of writing of this thesis, not least because they have not yet proven their applicability in all areas of computational science, e.g., solid mechanics. In this thesis, we further develop a state-of-the-art matrix-free framework for high-order FEM computations with focus on the preconditioning and adopt it in novel application fields. In the context of high-order FEM, we develop means of improving cache efficiency by interleaving cell loops with vector updates, which we use to increase the throughput of preconditioned conjugate gradient methods and of block smoothers based on additive Schwarz methods; we also propose an algorithm for the fast application of hanging-node constraints in 3D for up to 137 refinement configurations. We develop efficient geometric and polynomial multigrid solvers with optimized transfer operators, whose performance is experimentally investigated in detail in the context of locally refined meshes, indicating the superiority of global-coarsening algorithms. We apply the developed solvers in the context of novel stage-parallel implicit Runge–Kutta methods and demonstrate the benefit of stage–parallel solvers in decreasing the time to solution at the scaling limit. Novel challenging application fields of matrix-free computations include high-dimensional computational plasma physics, solid-state-sintering simulations with a high and dynamically changing number of strongly coupled components, and coupled multiphysics problems with evaluation and integration at arbitrary points. In the context of these fields, we detail computational challenges, propose modified versions of the standard matrix-free algorithms for high-performance computing, and discuss preconditioning-related topics. The efficiency of the derived algorithms on the node level and at extreme scales is demonstrated experimentally on SuperMUC-NG, one of Germany’s leading supercomputers, with up to 150k processes and by solving systems of up to 5 × 1012 unknowns. Such problem sizes would not be conceivable for equivalent matrix-based algorithms. The major achievements of this thesis allow to run larger simulations faster and more efficiently, enabling progress and new possibilities for a range of application fields in computational science
    corecore