52 research outputs found

    Transparently Mixing Undo Logs and Software Reversibility for State Recovery in Optimistic PDES

    Get PDF
    The rollback operation is a fundamental building block to support the correct execution of a speculative Time Warp-based Parallel Discrete Event Simulation. In the literature, several solutions to reduce the execution cost of this operation have been proposed, either based on the creation of a checkpoint of previous simulation state images, or on the execution of negative copies of simulation events which are able to undo the updates on the state. In this paper, we explore the practical design and implementation of a state recoverability technique which allows to restore a previous simulation state either relying on checkpointing or on the reverse execution of the state updates occurred while processing events in forward mode. Differently from other proposals, we address the issue of executing backward updates in a fully-transparent and event granularity-independent way, by relying on static software instrumentation (targeting the x86 architecture and Linux systems) to generate at runtime reverse update code blocks (not to be confused with reverse events, proper of the reverse computing approach). These are able to undo the effects of a forward execution while minimizing the cost of the undo operation. We also present experimental results related to our implementation, which is released as free software and fully integrated into the open source ROOT-Sim (ROme OpTimistic Simulator) package. The experimental data support the viability and effectiveness of our proposal

    Optimizing memory management for optimistic simulation with reinforcement learning

    Get PDF
    Simulation is a powerful technique to explore complex scenarios and analyze systems related to a wide range of disciplines. To allow for an efficient exploitation of the available computing power, speculative Time Warp-based Parallel Discrete Event Simulation is universally recognized as a viable solution. In this context, the rollback operation is a fundamental building block to support a correct execution even when causality inconsistencies are a posteriori materialized. If this operation is supported via checkpoint/restore strategies, memory management plays a fundamental role to ensure high performance of the simulation run. With few exceptions, adaptive protocols targeting memory management for Time Warp-based simulations have been mostly based on a pre-defined analytic models of the system, expressed as a closed-form functions that map system's state to control parameters. The underlying assumption is that the model itself is optimal. In this paper, we present an approach that exploits reinforcement learning techniques. Rather than assuming an optimal control strategy, we seek to find the optimal strategy through parameter exploration. A value function that captures the history of system feedback is used, and no a-priori knowledge of the system is required. An experimental assessment of the viability of our proposal is also provided for a mobile cellular system simulation

    Fault-Tolerant Adaptive Parallel and Distributed Simulation

    Full text link
    Discrete Event Simulation is a widely used technique that is used to model and analyze complex systems in many fields of science and engineering. The increasingly large size of simulation models poses a serious computational challenge, since the time needed to run a simulation can be prohibitively large. For this reason, Parallel and Distributes Simulation techniques have been proposed to take advantage of multiple execution units which are found in multicore processors, cluster of workstations or HPC systems. The current generation of HPC systems includes hundreds of thousands of computing nodes and a vast amount of ancillary components. Despite improvements in manufacturing processes, failures of some components are frequent, and the situation will get worse as larger systems are built. In this paper we describe FT-GAIA, a software-based fault-tolerant extension of the GAIA/ART\`IS parallel simulation middleware. FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some protection against byzantine failures since synchronization messages are replicated as well, so that the receiving entity can identify and discard corrupted messages. We provide an experimental evaluation of FT-GAIA on a running prototype. Results show that a high degree of fault tolerance can be achieved, at the cost of a moderate increase in the computational load of the execution units.Comment: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2016

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    Full text link
    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

    A new approach to reversible computing with applications to speculative parallel simulation

    Get PDF
    In this thesis, we propose an innovative approach to reversible computing that shifts the focus from the operations to the memory outcome of a generic program. This choice allows us to overcome some typical challenges of "plain" reversible computing. Our methodology is to instrument a generic application with the help of an instrumentation tool, namely Hijacker, which we have redesigned and developed for the purpose. Through compile-time instrumentation, we enhance the program's code to keep track of the memory trace it produces until the end. Regardless of the complexity behind the generation of each computational step of the program, we can build inverse machine instructions just by inspecting the instruction that is attempting to write some value to memory. Therefore from this information, we craft an ad-hoc instruction that conveys this old value and the knowledge of where to replace it. This instruction will become part of a more comprehensive structure, namely the reverse window. Through this structure, we have sufficient information to cancel all the updates done by the generic program during its execution. In this writing, we will discuss the structure of the reverse window, as the building block for the whole reversing framework we designed and finally realized. Albeit we settle our solution in the specific context of the parallel discrete event simulation (PDES) adopting the Time Warp synchronization protocol, this framework paves the way for further general-purpose development and employment. We also present two additional innovative contributions coming from our innovative reversibility approach, both of them still embrace traditional state saving-based rollback strategy. The first contribution aims to harness the advantages of both the possible approaches. We implement the rollback operation combining state saving together with our reversible support through a mathematical model. This model enables the system to choose in autonomicity the best rollback strategy, by the mutable runtime dynamics of programs. The second contribution explores an orthogonal direction, still related to reversible computing aspects. In particular, we will address the problem of reversing shared libraries. Indeed, leading from their nature, shared objects are visible to the whole system and so does every possible external modification of their code. As a consequence, it is not possible to instrument them without affecting other unaware applications. We propose a different method to deal with the instrumentation of shared objects. All our innovative proposals have been assessed using the last generation of the open source ROOT-Sim PDES platform, where we integrated our solutions. ROOT-Sim is a C-based package implementing a general purpose simulation environment based on the Time Warp synchronization protocol

    Autonomic State Management for Optimistic Simulation Platforms

    Get PDF
    We present the design and implementation of an autonomic state manager (ASM) tailored for integration within optimistic parallel discrete event simulation (PDES) environments based on the C programming language and the executable and linkable format (ELF), and developed for execution on x8664 architectures. With ASM, the state of any logical process (LP), namely the individual (concurrent) simulation unit being part of the simulation model, is allowed to be scattered on dynamically allocated memory chunks managed via standard API (e.g., malloc/free). Also, the application programmer is not required to provide any serialization/deserialization module in order to take a checkpoint of the LP state, or to restore it in case a causality error occurs during the optimistic run, or to provide indications on which portions of the state are updated by event processing, so to allow incremental checkpointing. All these tasks are handled by ASM in a fully transparent manner via (A) runtime identification (with chunk-level granularity) of the memory map associated with the LP state, and (B) runtime tracking of the memory updates occurring within chunks belonging to the dynamic memory map. The co-existence of the incremental and non-incremental log/restore modes is achieved via dual versions of the same application code, transparently generated by ASM via compile/link time facilities. Also, the dynamic selection of the best suited log/restore mode is actuated by ASM on the basis of an innovative modeling/optimization approach which takes into account stability of each operating mode with respect to variations of the model/environmental execution parameters

    Near-optimal scheduling and decision-making models for reactive and proactive fault tolerance mechanisms

    Get PDF
    As High Performance Computing (HPC) systems increase in size to fulfill computational power demand, the chance of failure occurrences dramatically increases, resulting in potentially large amounts of lost computing time. Fault Tolerance (FT) mechanisms aim to mitigate the impact of failure occurrences to the running applications. However, the overhead of FT mechanisms increases proportionally to the HPC systems\u27 size. Therefore, challenges arise in handling the expensive overhead of FT mechanisms while minimizing the large amount of lost computing time due to failure occurrences. In this dissertation, a near-optimal scheduling model is built to determine when to invoke a hybrid checkpoint mechanism, by means of stochastic processes and calculus of variations. The obtained schedule minimizes the waste time caused by checkpoint mechanism and failure occurrences. Generally, the checkpoint/restart mechanisms periodically save application states and load the saved state, upon failure occurrences. Furthermore, to handle various FT mechanisms, an adaptive decision-making model has been developed to determine the best FT strategy to invoke at each decision point. The best mechanism at each decision point is selected among considered FT mechanisms to globally minimize the total waste time for an application execution by means of a dynamic programming approach. In addition, the model is adaptive to deal with changes in failure rate over time

    Cache-Aware Memory Manager for Optimistic Simulations

    Get PDF
    Parallel Discrete Event Simulation is a well known technique for executing complex general-purpose simulations where models are described as objects the interaction of which is expressed through the generation of impulsive events. In particular, Optimistic Simulation allows full exploitation of the available computational power, avoiding the need to compute safety properties for the events to be executed. Optimistic Simulation platforms internally rely on several data structures, which are meant to support operations aimed at ensuring correctness, inter-kernel communication and/or event scheduling. These housekeeping and management operations access them according to complex patterns, commonly su๏ฌ€ering from misuse of memory caching architectures. In particular, operations like log/restore access data structures on a periodic basis, producing the replacement of in-cache bu๏ฌ€ers related to the actual working set of the application logic, producing a non-negligible performance drop. In this work we propose generally-applicable design principles for a new memory management subsystem targeted at Optimistic Simulation platforms which can face this issue by wisely allocating memory bu๏ฌ€ers depending on their actual future access patterns, in order to enhance event-execution memory locality. Additionally, an application-transparent implementation within ROOT-Sim, an open-source generalpurpose optimistic simulation platform, is presented along with experimental results testing our proposal

    GPU ์—๋Ÿฌ ์•ˆ์ •์„ฑ ๋ณด์žฅ์„ ์œ„ํ•œ ์ปดํŒŒ์ผ๋Ÿฌ ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2020. 8. ์ด์žฌ์ง„.Due to semiconductor technology scaling and near-threshold voltage computing, soft error resilience has become more important. Nowadays, GPUs are widely used in high performance computing (HPC) because of its efficient parallel processing and modern GPUs designed for HPC use error correction code (ECC) to protect their storage including register files. However, adopting ECC in the register file imposes high area and energy overhead. To replace the expensive hardware cost of ECC, we propose Penny, a lightweight compiler-directed resilience scheme for GPU register file protection. We combine recent advances in idempotent recovery with low-cost error detection code. Our approach focuses on solving two important problems: 1. Can we guarantee correct error recovery using idempotent execution with error detection code? We show that when an error detection code is used with idempotence recovery, certain restrictions required by previous idempotent recovery schemes are no longer needed. We also propose a software-based scheme to prevent the checkpoint value from being overwritten before the end of the region where the value is required for correct recovery. 2. How do we reduce the execution overhead caused by checkpointing? In GPUs additional checkpointing store instructions inflicts considerably higher overhead compared to CPUs, due to its architectural characteristics, such as lack of store buffers. We propose a number of compiler optimizations techniques that significantly reduce the overhead.๋ฐ˜๋„์ฒด ๋ฏธ์„ธ๊ณต์ • ๊ธฐ์ˆ ์ด ๋ฐœ์ „ํ•˜๊ณ  ๋ฌธํ„ฑ์ „์•• ๊ทผ์ฒ˜ ์ปดํ“จํŒ…(near-threashold voltage computing)์ด ๋„์ž…๋จ์— ๋”ฐ๋ผ์„œ ์†Œํ”„ํŠธ ์—๋Ÿฌ๋กœ๋ถ€ํ„ฐ์˜ ๋ณต์›์ด ์ค‘์š”ํ•œ ๊ณผ์ œ๊ฐ€ ๋˜์—ˆ๋‹ค. ๊ฐ•๋ ฅํ•œ ๋ณ‘๋ ฌ ๊ณ„์‚ฐ ์„ฑ๋Šฅ์„ ์ง€๋‹Œ GPU๋Š” ๊ณ ์„ฑ๋Šฅ ์ปดํ“จํŒ…์—์„œ ์ค‘์š”ํ•œ ์œ„์น˜๋ฅผ ์ฐจ์ง€ํ•˜๊ฒŒ ๋˜์—ˆ๊ณ , ์Šˆํผ ์ปดํ“จํ„ฐ์—์„œ ์“ฐ์ด๋Š” GPU๋“ค์€ ์—๋Ÿฌ ๋ณต์› ์ฝ”๋“œ์ธ ECC๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ ˆ์ง€์Šคํ„ฐ ํŒŒ์ผ ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ๋“ฑ์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณดํ˜ธํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค. ํ•˜์ง€๋งŒ ๋ ˆ์ง€์Šคํ„ฐ ํŒŒ์ผ์— ECC๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์€ ํฐ ํ•˜๋“œ์›จ์–ด๋‚˜ ์—๋„ˆ์ง€ ๋น„์šฉ์„ ํ•„์š”๋กœ ํ•œ๋‹ค. ์ด๋Ÿฐ ๊ฐ’๋น„์‹ผ ECC์˜ ํ•˜๋“œ์›จ์–ด ๋น„์šฉ์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ปดํŒŒ์ผ๋Ÿฌ ๊ธฐ๋ฐ˜์˜ ์ €๋น„์šฉ GPU ๋ ˆ์ง€์Šคํ„ฐ ํŒŒ์ผ ๋ณต์› ๊ธฐ๋ฒ•์ธ Penny๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ด๋Š” ์ตœ์‹ ์˜ ๋ฉฑ๋“ฑ์„ฑ(idempotency) ๊ธฐ๋ฐ˜ ์—๋Ÿฌ ๋ณต์› ๊ธฐ๋ฒ•์„ ์ €๋น„์šฉ์˜ ์—๋Ÿฌ ๊ฒ€์ถœ ์ฝ”๋“œ(EDC)์™€ ๊ฒฐํ•ฉํ•œ ๊ฒƒ์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ๋‹ค์Œ ๋‘๊ฐ€์ง€ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐ์— ์ง‘์ค‘ํ•œ๋‹ค. 1. ์—๋Ÿฌ ๊ฒ€์ถœ ์ฝ”๋“œ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฉฑ๋“ฑ์„ฑ ๊ธฐ๋ฐ˜ ์—๋Ÿฌ ๋ณต์›์„ ์‚ฌ์šฉ์‹œ ์†Œํ”„ํŠธ ์—๋Ÿฌ๋กœ๋ถ€ํ„ฐ์˜ ์•ˆ์ „ํ•œ ๋ณต์›์„ ๋ณด์žฅํ•  ์ˆ˜ ์žˆ๋Š”๊ฐ€?} ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์—๋Ÿฌ ๊ฒ€์ถœ ์ฝ”๋“œ๊ฐ€ ๋ฉฑ๋“ฑ์„ฑ ๊ธฐ๋ฐ˜ ๋ณต์› ๊ธฐ์ˆ ๊ณผ ๊ฐ™์ด ์‚ฌ์šฉ๋˜์—ˆ์„ ๊ฒฝ์šฐ ๊ธฐ์กด์˜ ๋ณต์› ๊ธฐ๋ฒ•์—์„œ ํ•„์š”๋กœ ํ–ˆ๋˜ ์กฐ๊ฑด๋“ค ์—†์ด๋„ ์•ˆ์ „ํ•˜๊ฒŒ ์—๋Ÿฌ๋กœ๋ถ€ํ„ฐ ๋ณต์›ํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. 2. ์ฒดํฌํฌ์ธํŒ…์—๋“œ๋Š” ๋น„์šฉ์„ ์–ด๋–ป๊ฒŒ ์ ˆ๊ฐํ•  ์ˆ˜ ์žˆ๋Š”๊ฐ€?} GPU๋Š” ์Šคํ† ์–ด ๋ฒ„ํผ๊ฐ€ ์—†๋Š” ๋“ฑ ์•„ํ‚คํ…์ณ์ ์ธ ํŠน์„ฑ์œผ๋กœ ์ธํ•ด์„œ CPU์™€ ๋น„๊ตํ•˜์—ฌ ์ฒดํฌํฌ์ธํŠธ ๊ฐ’์„ ์ €์žฅํ•˜๋Š” ๋ฐ์— ํฐ ์˜ค๋ฒ„ํ—ค๋“œ๊ฐ€ ๋“ ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์–‘ํ•œ ์ปดํŒŒ์ผ๋Ÿฌ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์„ ํ†ตํ•˜์—ฌ ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ์ค„์ธ๋‹ค.1 Introduction 1 1.1 Why is Soft Error Resilience Important in GPUs 1 1.2 How can the ECC Overhead be Reduced 3 1.3 What are the Challenges 4 1.4 How do We Solve the Challenges 5 2 Comparison of Error Detection and Correction Coding Schemes for Register File Protection 7 2.1 Error Correction Codes and Error Detection Codes 8 2.2 Cost of Coding Schemes 9 2.3 Soft Error Frequency of GPUs 11 3 Idempotent Recovery and Challenges 13 3.1 Idempotent Execution 13 3.2 Previous Idempotent Schemes 13 3.2.1 De Kruijf's Idempotent Translation 14 3.2.2 Bolts's Idempotent Recovery 15 3.2.3 Comparison between Idempotent Schemes 15 3.3 Idempotent Recovery Process 17 3.4 Idempotent Recovery Challenges for GPUs 18 3.4.1 Checkpoint Overwriting 20 3.4.2 Performance Overhead 20 4 Correctness of Recovery 22 4.1 Proof of Safe Recovery 23 4.1.1 Prevention of Error Propagation 23 4.1.2 Proof of Correct State Recovery 24 4.1.3 Correctness in Multi-Threaded Execution 28 4.2 Preventing Checkpoint Overwriting 30 4.2.1 Register renaming 31 4.2.2 Storage Alternation by Checkpoint Coloring 33 4.2.3 Automatic Algorithm Selection 38 4.2.4 Future Works 38 5 Performance Optimizations 40 5.1 Compilation Phases of Penny 40 5.1.1 Region Formation 41 5.1.2 Bimodal Checkpoint Placement 41 5.1.3 Storage Alternation 42 5.1.4 Checkpoint Pruning 43 5.1.5 Storage Assignment 44 5.1.6 Code Generation and Low-level Optimizations 45 5.2 Cost Estimation Model 45 5.3 Region Formation 46 5.3.1 De Kruijf's Heuristic Region Formation 46 5.3.2 Region splitting and Region Stitching 47 5.3.3 Checkpoint-Cost Aware Optimal Region Formation 48 5.4 Bimodal Checkpoint Placement 52 5.5 Optimal Checkpoint Pruning 55 5.5.1 Bolt's Naive Pruning Algorithm and Overview of Penny's Optimal Pruning Algorithm 55 5.5.2 Phase 1: Collecting Global-Decision Independent Status 56 5.5.3 Phase2: Ordering and Finalizing Renaming Decisions 60 5.5.4 Effectiveness of Eliminating the Checkpoints 63 5.6 Automatic Checkpoint Storage Assignment 69 5.7 Low-Level Optimizations and Code Generation 70 6 Evaluation 74 6.1 Test Environment 74 6.1.1 GPU Architecture and Simulation Setup 74 6.1.2 Tested Applications 75 6.1.3 Register Assignment 76 6.2 Performance Evaluation 77 6.2.1 Overall Performance Overheads 77 6.2.2 Impact of Penny's Optimizations 78 6.2.3 Assigning Checkpoint Storage and Its Integrity 79 6.2.4 Impact of Optimal Checkpoint Pruning 80 6.2.5 Impact of Alias Analysis 81 6.3 Repurposing the Saved ECC Area 82 6.4 Energy Impact on Execution 83 6.5 Performance Overhead on Volta Architecture 85 6.6 Compilation Time 85 7 RelatedWorks 87 8 Conclusion and Future Works 89 8.1 Limitation and Future Work 90Docto
    • โ€ฆ
    corecore