Implementation-Oblivious Transparent Checkpoint-Restart for MPI

Abstract

This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads under the major available MPI implementations: "develop once, run everywhere". The new platform lets application developers compile their application against any of the available standards-compliant MPI implementations and then evaluate each implementation for performance or other features.

Comment: 17 pages, 4 figures
