56 research outputs found

    Advanced I/O Techniques for Efficient and Highly Available Process Crash Recovery Protocols

    As the number of CPU cores in high-performance computing platforms continues to grow, the availability and reliability of these systems become a primary concern. Solutions range from physical measures (e.g., backup power) to software-driven approaches. Lawrence Berkeley National Laboratory has created a system-level fault-tolerant checkpoint/restart implementation for Linux clusters, which allows processes to resume computation from the last known checkpoint in the event of a system crash. Checkpoint creation depends heavily on system input and output operations. This paper proposes: (i) a technique to improve the efficiency of these I/O operations and (ii) an alternative checkpoint creation method to increase the availability and reliability of checkpoint data.
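The abstract does not spell out the proposed techniques, but the availability half of the problem is commonly addressed by never overwriting the last good checkpoint in place. A minimal sketch of that pattern (generic serialization, hypothetical file names; this is not the paper's implementation):

```python
import os
import pickle

def write_checkpoint(state, path):
    """Write a checkpoint so a crash mid-write never corrupts the last
    good checkpoint: write to a temp file, flush to stable storage,
    then atomically rename over the previous file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # force the data out of OS caches
    os.replace(tmp, path)      # atomic on POSIX: reader sees old or new, never a mix

def read_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

state = {"iteration": 42, "values": [1.0, 2.5]}
write_checkpoint(state, "ckpt.bin")
print(read_checkpoint("ckpt.bin")["iteration"])  # 42
```

The atomic-rename step is what guarantees a restart always finds a consistent checkpoint, even if the node dies while the next one is being written.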

    In-memory application-level checkpoint-based migration for MPI programs

    This is a post-peer-review, pre-copyedit version of an article published in the Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-014-1120-2
    [Abstract] Process migration provides many benefits for parallel environments, including dynamic load balancing, data access locality, and fault tolerance. This paper describes an in-memory application-level checkpoint-based migration solution for MPI codes that uses the Hierarchical Data Format 5 (HDF5) to write the checkpoint files. The main features of the proposed solution are: transparency for the user, achieved through the use of CPPC (ComPiler for Portable Checkpointing); portability, as the application-level approach makes the solution suitable for any MPI implementation and operating system, while the HDF5 file format enables restart on different architectures; and high performance, by saving the checkpoint files to memory instead of disk through the use of HDF5 in-memory files. Experimental results show that the in-memory approach significantly reduces the I/O cost of the migration process.
    Funding: Ministerio de Ciencia e Innovación, TIN2010-16735; Galicia, Consellería de Economía e Industria, 10PXIB105180P
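Keeping an HDF5 file entirely in memory is what the high-performance claim above rests on. As a rough illustration of the mechanism (this is not the CPPC implementation; h5py, the in-memory buffer, and the sample data are assumptions):

```python
import io

try:
    import h5py  # assumed available: pip install h5py
    HAVE_H5PY = True
except ImportError:
    HAVE_H5PY = False

def save_in_memory_checkpoint(arrays):
    """Build an HDF5 checkpoint in a memory buffer and return its raw
    bytes; no disk I/O occurs until the caller decides to ship or
    persist the blob (h5py >= 2.9 accepts file-like objects)."""
    bio = io.BytesIO()
    with h5py.File(bio, "w") as f:
        for name, arr in arrays.items():
            f.create_dataset(name, data=arr)
    return bio.getvalue()

if HAVE_H5PY:
    blob = save_in_memory_checkpoint({"grid": [[1.0, 2.0], [3.0, 4.0]]})
    print(f"in-memory checkpoint size: {len(blob)} bytes")
```

Because the result is an ordinary byte string, a migration can send it straight to the destination process instead of round-tripping through a parallel file system.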

    Process migration in a parallel environment

    To satisfy the ever-increasing demand for computational resources, high-performance computing systems are becoming larger and larger. Unfortunately, the tools supporting system management tasks are only slowly adapting to the growing number of components in computational clusters. Virtualization provides concepts which make system management tasks easier to implement by giving system administrators more flexibility. With the help of virtual machine migration, the point in time for certain system management tasks, such as hardware or software upgrades, no longer depends on the usage of the physical hardware. The flexibility to migrate a running virtual machine without significant interruption to the provided service makes it possible to perform system management tasks at the optimal point in time. In most high-performance computing systems, however, virtualization is still not deployed, because accessing the CPU and I/O devices from a virtual machine still carries an overhead. This overhead continues to decrease, and techniques such as para-virtualization and container-based virtualization reduce it further. With the CPU being one of the primary resources in high-performance computing, this work proposes to migrate processes instead of virtual machines, thus avoiding any virtualization overhead. Process migration can be seen either as an extension of pre-emptive multitasking across system boundaries or as a special form of checkpointing and restarting. In the scope of this work, process migration is based on checkpointing and restarting, as this is already an established technique in the field of fault tolerance. From the existing checkpointing and restarting implementations, the one best suited for process migration was selected. One of the important requirements for the checkpointing and restarting implementation is transparency: providing transparent process migration is important to enable the migration of any process without prerequisites such as re-compilation or running in a specially prepared environment. With process migration based on checkpointing and restarting, the next step towards process migration in a high-performance computing environment is to support the migration of parallel processes. MPI is a common method of parallelizing applications, and process migration therefore has to be integrated with an MPI implementation. The previously selected checkpointing and restarting implementation was integrated into an MPI implementation, enabling the migration of parallel processes. With the help of different test cases, the implemented process migration was analyzed, especially with regard to the time required to migrate a process and the benefit of optimizations that reduce the process's downtime during migration.
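The "migration as checkpoint + restart" view described above can be reduced to three operations: serialize the process state on the source node, transfer it, and resume from the restored state on the destination. A toy sketch of that control flow (the thesis uses a system-level checkpointer integrated with MPI; the state dictionary and functions here are purely illustrative):

```python
import pickle

def compute_step(state):
    """One unit of the (hypothetical) application's work."""
    state["step"] += 1
    state["acc"] += state["step"]
    return state

def checkpoint(state):
    return pickle.dumps(state)   # serialize the full process state

def restart(blob):
    return pickle.loads(blob)    # restore the state on the new node

# Source node runs a few steps...
state = {"step": 0, "acc": 0}
for _ in range(3):
    state = compute_step(state)

blob = checkpoint(state)         # migration: ship `blob` over the network

# ...and the destination node resumes exactly where the source left off.
state2 = restart(blob)
for _ in range(2):
    state2 = compute_step(state2)
print(state2)  # {'step': 5, 'acc': 15}
```

The downtime the thesis measures corresponds to the window between `checkpoint` on the source and `restart` on the destination, which is why the optimizations focus on shrinking the transfer step.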

    The Architecture of the XtreemOS Grid Checkpointing Service

    The EU-funded XtreemOS project implements a grid operating system (OS) that transparently exploits distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) to implement migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in a distributed, heterogeneous environment, which may span millions of grid nodes, each using different system-specific checkpointers to save and restore application and kernel data structures. In this paper we present the architecture of the XtreemGCP service, which integrates existing checkpointing solutions. Our architecture is open to different checkpointing strategies that can be adapted to evolving failure situations or changing application requirements. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows different checkpointers to be used in a uniform way. Furthermore, we discuss other grid-related checkpointing issues, including resource conflicts during restart, security, and checkpoint file management. Although this paper presents a solution within the XtreemOS context, it can be applied to any other grid middleware or distributed OS.
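The core idea of a common checkpointer API is that grid services program against one interface while adapters map it onto the various system-specific checkpointers. A hedged sketch of that shape (class and method names are illustrative, not the actual XtreemGCP API):

```python
from abc import ABC, abstractmethod

class Checkpointer(ABC):
    """Uniform interface the grid service calls; one adapter per
    system-specific checkpointer hides the tool's own semantics."""

    @abstractmethod
    def checkpoint(self, job_id: str) -> str:
        """Save job state; return an image identifier."""

    @abstractmethod
    def restart(self, image_id: str) -> str:
        """Restore a job from an image; return the new job id."""

class InMemoryCheckpointer(Checkpointer):
    """Stand-in for a real backend (e.g. a kernel checkpointer)."""

    def __init__(self):
        self.images = {}

    def checkpoint(self, job_id):
        image_id = f"img-{job_id}"
        self.images[image_id] = {"job": job_id}
        return image_id

    def restart(self, image_id):
        return self.images[image_id]["job"] + "-restarted"

cp: Checkpointer = InMemoryCheckpointer()
img = cp.checkpoint("job42")
print(cp.restart(img))  # job42-restarted
```

Swapping in a different backend then changes nothing on the grid-service side, which is exactly the uniformity the paper argues for.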

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? 
    While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault-notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
    Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
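The tension described above between checkpoint cost and mean time between failures is often quantified with Young's classic first-order approximation for the optimal checkpoint interval, tau = sqrt(2 * C * M). A quick calculation under assumed, illustrative numbers (not figures from the article):

```python
import math

def young_interval(ckpt_cost_s, mtbf_s):
    """Young's approximation for the optimal checkpoint interval:
    tau = sqrt(2 * C * M), where C is the time to write one
    checkpoint and M is the mean time between failures."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s)

# Assumed exascale-like numbers, for illustration only:
C = 600.0    # 10 minutes to write a multi-petabyte checkpoint
M = 7200.0   # 2-hour system mean time between failures
tau = young_interval(C, M)
print(round(tau))  # 2939 s, i.e. checkpoint roughly every 49 minutes
```

As the abstract warns, once C grows toward M the interval plus recovery time can exceed the time between failures, at which point naive checkpointing makes no forward progress at all.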

    Transparent live migration of container deployments in userspace

    In this Master's thesis, we present a tool for migrating runC containers using CRIU. Our solution is efficient in terms of resource, memory, and disk utilization, and it minimizes migration time compared with a capture-transfer-restart migration and with the native virtual machine migration offered by VM providers. In addition, our tool can migrate applications that make intensive use of both memory and network, with established TCP connections and external namespaces. The implementation is accompanied by an in-depth literature review, as well as a series of experiments that motivate our design decisions. The code is open source and can be found on the project's website.
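The thesis text above does not show the commands, but runC exposes CRIU-based checkpointing directly. A minimal sketch of the capture-transfer-restore flow it improves on (the container name and paths are hypothetical, and the commands are skipped when runc is not installed):

```shell
#!/bin/sh
# Sketch of a CRIU-based runC migration; 'mycontainer' and the image
# directory are illustrative. A real live migration also ships the
# image directory to the destination host before restoring.
CKPT_DIR=/tmp/runc-ckpt
if command -v runc >/dev/null 2>&1; then
    mkdir -p "$CKPT_DIR"
    # Freeze the container and dump its state (CRIU under the hood).
    runc checkpoint --image-path "$CKPT_DIR" mycontainer
    # ... transfer "$CKPT_DIR" to the destination node ...
    # Restore the container from the dumped images.
    runc restore --image-path "$CKPT_DIR" mycontainer
else
    echo "runc not installed; commands shown for illustration only"
fi
```

The thesis's contribution lies in overlapping and streaming these steps so the stop-the-world window is far shorter than this naive sequence implies.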