3,317 research outputs found

    Efficient Deterministic Replay Using Complete Race Detection

    Data races can significantly affect the execution of multi-threaded programs. Hence, the outcomes of data races must be reproduced to deterministically replay a multi-threaded program. However, data races are concealed among the enormous number of memory operations in a program. Because data races are difficult to identify accurately, previous multi-threaded deterministic record/replay schemes for commodity multi-processor systems do not record data races directly. Consequently, they either record all shared memory operations, which causes a considerable slowdown of the production run, or record synchronization only, which requires significant effort during replay. Inspired by advances in data race detection, we propose RacX, an efficient software-only deterministic replay scheme for commodity multi-processor systems. The key insight of RacX is as follows: although it is NP-hard to accurately identify the existence of data races between a pair of memory operations, we can find all potential data races in a multi-threaded program, and our automatic false positive reduction techniques reduce the false positives to a small number. As a result, RacX can efficiently monitor all potential data races to deterministically replay a multi-threaded program. To evaluate RacX, we have carried out experiments on a number of well-known multi-threaded programs from the SPLASH-2 benchmark suite and on large-scale commercial programs. RacX can precisely reproduce production runs of these programs with value determinism. On average, RacX causes only about 1.21%, 1.89%, 2.20%, and 8.41% slowdown to the original run during recording (for 2-, 4-, 8- and 16-thread programs, respectively). The soundness, efficiency, scalability, and portability of RacX demonstrate its superiority.
    Comment: 18 pages, 7 figures
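
To make the record/replay idea in this abstract concrete, here is a minimal Python sketch. It is not RacX's mechanism: it funnels every access to the shared state through one lock and logs the thread order, which is closer to the expensive "record all shared memory operations" baseline the abstract describes. The names `OrderRecorder` and `run` are hypothetical and exist only for this illustration.

```python
import threading


class OrderRecorder:
    """Records the global order of shared accesses, or enforces a recorded one."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log is not None else []
        self.pos = 0  # index of the next log entry to replay
        self.cond = threading.Condition()

    def access(self, tid, op):
        with self.cond:
            if self.replaying:
                # Block until the log says it is this thread's turn.
                self.cond.wait_for(
                    lambda: self.pos < len(self.log) and self.log[self.pos] == tid
                )
                op()
                self.pos += 1
                self.cond.notify_all()
            else:
                # Recording run: note who got here first, then perform the access.
                self.log.append(tid)
                op()


def run(num_threads=4, steps=50, log=None):
    """Runs a racy computation; pass a previous log to replay it."""
    rec = OrderRecorder(log)
    state = {"x": 0}

    def worker(tid):
        for _ in range(steps):
            def op():
                # Order-sensitive update: the final value depends on interleaving.
                state["x"] = (state["x"] * 31 + tid) % 1_000_003
            rec.access(tid, op)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["x"], rec.log


recorded_value, recorded_log = run()
replayed_value, _ = run(log=recorded_log)
```

Replaying with the recorded log forces the same interleaving, so `replayed_value` matches `recorded_value`. A scheme in the spirit of the abstract would log only the accesses flagged as potential races rather than all of them.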

    OPR

    The ability to reproduce a parallel execution is desirable for debugging and program reliability purposes. In debugging (13), the programmer needs to manually step back in time, while for resilience (6) this is automatically performed by the application upon failure. To be useful, replay has to faithfully reproduce the original execution. For parallel programs the main challenge is inferring and maintaining the order of conflicting operations (data races). Deterministic record and replay (R&R) techniques have been developed for multithreaded shared memory programs (5), as well as distributed memory programs (14). Our main interest is techniques for large scale scientific (3; 4) programming models

    Building Resilient Cloud Over Unreliable Commodity Infrastructure

    Cloud Computing has emerged as a successful computing paradigm for efficiently utilizing managed compute infrastructure such as high speed rack-mounted servers, connected with high speed networking, and reliable storage. Usually such infrastructure is dedicated, physically secured and has reliable power and networking infrastructure. However, much of our idle compute capacity is present in unmanaged infrastructure like idle desktops, lab machines, physically distant server machines, and laptops. We present a scheme to utilize this idle compute capacity on a best-effort basis and provide high availability even in the face of failure of individual components or facilities. We run virtual machines on the commodity infrastructure and present a cloud interface to our end users. The primary challenge is to maintain availability in the presence of node failures, network failures, and power failures. We run multiple copies of a Virtual Machine (VM) redundantly on geographically dispersed physical machines to achieve availability. If one of the running copies of a VM fails, we seamlessly switch over to another running copy. We use Virtual Machine Record/Replay capability to implement this redundancy and switchover. So far, we have implemented VM Record/Replay for uniprocessor machines over Linux/KVM and are currently working on VM Record/Replay on shared-memory multiprocessor machines. We report initial experimental results based on our implementation.
    Comment: Oral presentation at IEEE "Cloud Computing for Emerging Markets", Oct. 11-12, 2012, Bangalore, India

    RepTFD: Replay Based Transient Fault Detection

    Advances in IC process technology make future chip multiprocessors (CMPs) more and more vulnerable to transient faults. To detect transient faults, previous core-level schemes provide redundancy for each core separately. As a result, they may leave undetected transient faults in the uncore parts, which occupy over 50% of the area of a modern CMP. This paper proposes RepTFD, the first core-level transient fault detection scheme with 100% coverage. Instead of providing redundancy for each core separately, RepTFD provides redundancy for a group of cores as a whole. Specifically, it replays the execution of the checked group of cores on a redundant group of cores. By comparing the execution results of the two groups of cores, all malignant transient faults can be caught. Moreover, RepTFD adopts a novel pending period based record-replay approach, which can greatly reduce the number of execution orders that need to be enforced in the replay run. Hence, RepTFD incurs only 4.76% performance overhead in comparison to normal execution without fault tolerance, according to our experiments on the RTL design of an industrial CMP named Godson-3. In addition, RepTFD occupies only about 0.83% of the area of Godson-3, while needing only trivial modifications to existing components of Godson-3.
    Comment: 22 pages, 11 figures

    DoubleTake: Fast and Precise Error Detection via Evidence-Based Dynamic Analysis

    This paper presents evidence-based dynamic analysis, an approach that enables lightweight analyses (under 5% overhead for these bugs), making it practical for the first time to perform these analyses in deployed settings. The key insight of evidence-based dynamic analysis is that for a class of errors, it is possible to ensure that evidence that they happened at some point in the past remains for later detection. Evidence-based dynamic analysis allows execution to proceed at nearly full speed until the end of an epoch (e.g., a heavyweight system call). It then examines program state to check for evidence that an error occurred at some time during that epoch. If so, it rolls back execution and re-executes the code with instrumentation activated to pinpoint the error. We present DoubleTake, a prototype evidence-based dynamic analysis framework. DoubleTake is practical and easy to deploy, requiring neither custom hardware, compiler, nor operating system support. We demonstrate DoubleTake's generality and efficiency by building dynamic analyses that find buffer overflows, memory use-after-free errors, and memory leaks. Our evaluation shows that DoubleTake is efficient, imposing just 4% overhead on average, making it the fastest such system to date. It is also precise: DoubleTake pinpoints the location of these errors to the exact line and memory addresses where they occur, providing valuable debugging information to programmers.
    Comment: Pre-print, accepted to appear at ICSE 201
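
The epoch, evidence check, and re-execution cycle described in this abstract can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration and not DoubleTake's implementation: the simulated `Heap`, the single canary byte placed after each allocation, and the helper `run_epoch` are all hypothetical, and "rolling back" is modelled simply by re-running the epoch on a fresh heap.

```python
CANARY = 0xCA  # marker byte placed right after every allocation


class OverflowDetected(Exception):
    def __init__(self, site, addr):
        super().__init__(f"overflow at {site}, address {addr}")
        self.site = site
        self.addr = addr


class Heap:
    """Toy bump allocator over a bytearray (illustrative only)."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.blocks = []      # (start, length) of each allocation
        self.top = 0
        self.checked = False  # per-write instrumentation is off by default

    def alloc(self, n):
        start = self.top
        self.mem[start + n] = CANARY
        self.blocks.append((start, n))
        self.top = start + n + 1
        return start

    def write(self, addr, value, site):
        if self.checked:
            # Slow path, used only during re-execution: check every write.
            for start, n in self.blocks:
                if addr == start + n:
                    raise OverflowDetected(site, addr)
        self.mem[addr] = value

    def canaries_intact(self):
        return all(self.mem[s + n] == CANARY for s, n in self.blocks)


def run_epoch(epoch, heap_size=64):
    """Runs an epoch uninstrumented; re-executes with checks if evidence is found."""
    heap = Heap(heap_size)
    epoch(heap)                   # fast run, no per-write checks
    if heap.canaries_intact():
        return None               # no evidence of an overflow
    replay = Heap(heap_size)      # "roll back" to the state at epoch start
    replay.checked = True
    try:
        epoch(replay)
    except OverflowDetected as err:
        return err.site, err.addr
    return None


def buggy_epoch(heap):
    buf = heap.alloc(4)
    for i in range(5):            # off-by-one: index 4 hits the canary
        heap.write(buf + i, i + 1, site=f"buggy_epoch: i={i}")


def correct_epoch(heap):
    buf = heap.alloc(4)
    for i in range(4):
        heap.write(buf + i, i + 1, site=f"correct_epoch: i={i}")


report = run_epoch(buggy_epoch)
```

The fast run pays only for one canary scan per epoch, while the per-write check runs only after evidence is found. One limitation of this toy version: an overflow that happens to write the canary value itself leaves no evidence.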

    Efficient System-Enforced Deterministic Parallelism

    Deterministic execution offers many benefits for debugging, fault tolerance, and security. Running parallel programs deterministically is usually difficult and costly, however - especially if we desire system-enforced determinism, ensuring precise repeatability of arbitrarily buggy or malicious software. Determinator is a novel operating system that enforces determinism on both multithreaded and multi-process computations. Determinator's kernel provides only single-threaded, "shared-nothing" address spaces interacting via deterministic synchronization. An untrusted user-level runtime uses distributed computing techniques to emulate familiar abstractions such as Unix processes, file systems, and shared memory multithreading. The system runs parallel applications deterministically both on multicore PCs and across nodes in a cluster. Coarse-grained parallel benchmarks perform and scale comparably to - sometimes better than - conventional systems, though determinism is costly for fine-grained parallel applications.
    Comment: 14 pages, 12 figures, 3 tables

    Dynamic Analysis of Embedded Software

    Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of these programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes dynamic analysis difficult. In addition, the instrumentation overhead of gathering execution information may change the execution of a program and lead to distorted analysis results, i.e., the probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis consists of three main parts. First, we discuss a deterministic replay framework that provides reproducible execution. Once a program execution is recorded, software instrumentation can be safely applied during replay without probe effect. Second, a discussion of the probe effect is presented and a simulation-based analysis is proposed to detect execution changes of a program caused by instrumentation overhead. The simulation-based analysis examines whether the recording instrumentation changes the original program execution. Lastly, the thesis discusses data race detection algorithms that help to remove data races for correctness of the replay and the simulation-based analysis. The focus is to make the detection efficient for C/C++ programs, and to increase scalability of the detection on multi-core machines.
    Dissertation/Thesis, Doctoral Dissertation, Computer Science 201

    Morpheus: Safe and Flexible Dynamic Updates for SDNs

    SDN controllers must be periodically modified to add features, improve performance, and fix bugs, but current techniques for implementing dynamic updates are inadequate. Simply halting old controllers and bringing up new ones can cause state to be lost, which often leads to incorrect behavior: e.g., if the state represents hosts blacklisted by a firewall, then traffic that should be blocked may be allowed to pass through. Techniques based on record and replay can reconstruct state automatically, but they are expensive to deploy and can lead to incorrect behavior. Problematic scenarios are especially likely to arise in distributed controllers and with semantics-altering updates. This paper presents a new approach to implementing dynamic controller updates based on explicit state transfer. Instead of attempting to infer state changes automatically (an approach that is expensive and fundamentally incomplete), our framework gives programmers effective tools for implementing correct updates that avoid major disruptions. We develop primitives that enable programmers to directly (and easily, in most cases) initialize the new controller's state as a function of old state, and we design protocols that ensure consistent behavior during the transition. We also present a prototype implementation called Morpheus, and evaluate its effectiveness on representative case studies.
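
The firewall example in this abstract suggests a small Python sketch of explicit state transfer, where the new controller's state is computed as a function of the old state. The classes `FirewallV1` and `FirewallV2` and the function `transfer_state` are hypothetical names chosen for illustration and do not reflect the Morpheus API.

```python
from dataclasses import dataclass, field


@dataclass
class FirewallV1:
    """Old controller version: a plain set of blacklisted hosts."""
    blacklist: set = field(default_factory=set)

    def allows(self, host):
        return host not in self.blacklist


@dataclass
class FirewallV2:
    """New controller version: also stores why each host is blocked."""
    blocked: dict = field(default_factory=dict)  # host -> reason

    def allows(self, host):
        return host not in self.blocked


def transfer_state(old: FirewallV1) -> FirewallV2:
    """Programmer-written transfer: builds the v2 state from the v1 state."""
    return FirewallV2(blocked={host: "carried over from v1" for host in old.blacklist})


old = FirewallV1(blacklist={"10.0.0.5"})

naive_restart = FirewallV2()   # halting and restarting loses the blacklist
updated = transfer_state(old)  # explicit transfer preserves it
```

After a naive restart, `naive_restart.allows("10.0.0.5")` is true, which is the incorrect behavior the abstract warns about. After the explicit transfer, `updated` still blocks the host.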

    Doctor of Philosophy

    A modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has the potential to change the way we analyze and debug software systems. Recorded once, the execution of the system becomes an independent artifact, which can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable the deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing the complete execution of the entire operating system with an overhead of several percent, on a realistic workload, and with minimal installation costs. To provide an intuitive interface for constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that enables an analysis algorithm to be programmed against the state of the recorded system in the familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay.

    iReplayer: In-situ and Identical Record-and-Replay for Multithreaded Applications

    Reproducing executions of multithreaded programs is very challenging due to many intrinsic and external non-deterministic factors. Existing RnR systems achieve significant progress in terms of performance overhead, but none targets the in-situ setting, in which replay occurs within the same process as the recording process. Also, most existing work cannot achieve identical replay, which may prevent the reproduction of some errors. This paper presents iReplayer, which aims to identically replay multithreaded programs in the original process (under the "in-situ" setting). The novel in-situ and identical replay of iReplayer makes it more likely to reproduce errors, and allows it to directly employ debugging mechanisms (e.g. watchpoints) to aid failure diagnosis. Currently, iReplayer only incurs 3% performance overhead on average, which allows it to be always enabled in the production environment. iReplayer enables a range of possibilities, and this paper presents three examples: two automatic tools for detecting buffer overflows and use-after-free bugs, and one interactive debugging tool that is integrated with GDB.
    Comment: 16 pages, 5 figures, to be published at PLDI'1