2 research outputs found

    EXECUTION-DRIVEN SIMULATION OF ERROR RECOVERY TECHNIQUES FOR MULTICOMPUTERS

    No full text
    DERT (Distributed Error Recovery Testbed) is a testbed for simulation and performance evaluation of several classes of application-transparent distributed error recovery schemes. DERT is built on top of an event-driven, message-passing, object-oriented, multithreaded simulation kernel. Actual compiled distributed applications are instrumented for data collection and executed on the simulated multicomputer. Checkpointing is implemented in full detail, including associated overhead per message, additional messages, and changes to the memory system. DERT allows easy modification of a wide variety of system parameters, thus offering a level of flexibility not easily achieved by experimentation on a particular real machine. This paper describes the design, functionality, and performance of DERT. The main problems encountered in DERT’s development are discussed, as well as examples of its use in evaluating recovery schemes.

    Execution-Driven Simulation Of Error Recovery Techniques For Multicomputers

    No full text
    DERT (Distributed Error Recovery Testbed) is a testbed for simulation and performance evaluation of several classes of application-transparent distributed error recovery schemes. DERT is built on top of an event-driven, message-passing, object-oriented, multithreaded simulation kernel. Actual compiled distributed applications are instrumented for data collection and executed on the simulated multicomputer. Checkpointing is implemented in full detail, including associated overhead per message, additional messages, and changes to the memory system. DERT allows easy modification of a wide variety of system parameters, thus offering a level of flexibility not easily achieved by experimentation on a particular real machine. This paper describes the design, functionality, and performance of DERT. The main problems encountered in DERT's development are discussed, as well as examples of its use in evaluating recovery schemes.
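
    The abstract's central idea, an event-driven message-passing kernel in which checkpointing adds overhead to every message, can be illustrated with a minimal sketch. This is not DERT's code or API: the names `Kernel`, `Event`, `simulate_ping_pong`, `latency`, and `ckpt_overhead` are hypothetical, and the sketch leaves out multithreading, the extra messages, and the memory-system changes the abstract mentions. It assumes a fixed per-message latency and a fixed per-message checkpointing cost.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Event:
    """A message delivery, ordered by arrival time, then by send order."""
    time: float
    seq: int
    dst: int = field(compare=False)
    payload: str = field(compare=False)


class Kernel:
    """Hypothetical event-driven kernel; not DERT's actual design."""

    def __init__(self, latency, ckpt_overhead=0.0):
        self.now = 0.0
        self.latency = latency              # assumed fixed network latency
        self.ckpt_overhead = ckpt_overhead  # assumed fixed per-message cost
        self._queue = []                    # min-heap of pending events
        self._seq = count()                 # tie-breaker for equal times
        self.delivered = []

    def send(self, dst, payload):
        # Each message pays the network latency plus the checkpointing overhead.
        arrival = self.now + self.latency + self.ckpt_overhead
        heapq.heappush(self._queue, Event(arrival, next(self._seq), dst, payload))

    def run(self, handler):
        # Pop events in time order and advance the simulated clock.
        while self._queue:
            ev = heapq.heappop(self._queue)
            self.now = ev.time
            self.delivered.append((ev.time, ev.dst, ev.payload))
            handler(self, ev)


def simulate_ping_pong(messages, latency, ckpt_overhead=0.0):
    """Two nodes (0 and 1) exchange `messages` messages; return the finish time."""
    kernel = Kernel(latency, ckpt_overhead)

    def handler(k, ev):
        # Reply to the other node until the message budget is used up.
        if len(k.delivered) < messages:
            k.send(1 - ev.dst, "pong" if ev.payload == "ping" else "ping")

    kernel.send(1, "ping")
    kernel.run(handler)
    return kernel.now
```

    Running `simulate_ping_pong` twice, once with `ckpt_overhead=0.0` and once with a nonzero value, gives the kind of parameter sweep the abstract describes: the same workload is replayed while one system parameter changes, and the difference in finish time is attributed to the recovery scheme.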