Efficient Deterministic Replay Using Complete Race Detection
Data races can significantly affect the executions of multi-threaded
programs. Hence, one has to reproduce the outcomes of data races to
deterministically replay a multi-threaded program. However, data races are
concealed in an enormous number of memory operations in a program. Due to the
difficulty of accurately identifying data races, previous multi-threaded
deterministic record/replay schemes for commodity multi-processor systems give
up on recording data races directly. Consequently, they either record all shared
memory operations, which significantly slows down the production run, or
record only the synchronization, which shifts significant effort to the
replay.
Inspired by advances in data race detection, we propose RacX, an efficient
software-only deterministic replay scheme for commodity multi-processor
systems. The key insight of RacX is as follows: although it is NP-hard to
accurately determine whether a given pair of memory operations races, we can
find all potential data races in a multi-threaded program, and our automatic
false positive reduction techniques cut the false positives among them down to
a small number. As a result, RacX can efficiently monitor all potential data
races to deterministically replay a multi-threaded program.
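To illustrate what recording the outcomes of races buys at replay time, here is a minimal Python sketch, assuming threads are modeled as generators that yield right before each access to a potentially racing location, and a toy scheduler that logs which thread ran at each such point. It is a generic order-log record/replay scheme, not RacX's implementation; the names `incrementer` and `run` are hypothetical.

```python
import random
from typing import Callable, Dict, Generator, List, Optional, Tuple

Shared = Dict[str, int]
# Hypothetical model: a "thread" is a generator that yields right before each
# access to a potentially racing shared location, handing control back.
Thread = Callable[[Shared], Generator[None, None, None]]


def incrementer(shared: Shared) -> Generator[None, None, None]:
    """Unsynchronized read-modify-write: a classic lost-update race."""
    for _ in range(3):
        yield                      # about to read a potentially racing location
        tmp = shared["x"]
        yield                      # about to write it back
        shared["x"] = tmp + 1


def run(threads: List[Thread], schedule: Optional[List[int]] = None,
        seed: int = 0) -> Tuple[int, List[int]]:
    """Record mode (schedule is None): pick the next thread at random and log it.
    Replay mode: follow a previously recorded schedule exactly."""
    rng = random.Random(seed)
    shared: Shared = {"x": 0}
    live = {tid: t(shared) for tid, t in enumerate(threads)}
    replay = iter(schedule) if schedule is not None else None
    log: List[int] = []
    while live:
        tid = next(replay) if replay is not None else rng.choice(sorted(live))
        log.append(tid)            # every scheduling decision is logged
        try:
            next(live[tid])        # run this thread up to its next racy access
        except StopIteration:
            del live[tid]          # thread finished; the step was still logged
    return shared["x"], log


if __name__ == "__main__":
    value, log = run([incrementer, incrementer], seed=7)
    replayed, _ = run([incrementer, incrementer], schedule=log)
    assert replayed == value       # the racy outcome is reproduced exactly
```

Logging a decision only at potentially racing accesses is what keeps such a log small; the toy simply treats every access to its counter as potentially racing.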
To evaluate RacX, we have carried out experiments on a number of well-known
multi-threaded programs from the SPLASH-2 benchmark suite and on large-scale
commercial programs. RacX can precisely reproduce production runs of these
programs with value determinism. On average, RacX slows down the original run
by only about 1.21%, 1.89%, 2.20%, and 8.41% during recording (for 2-, 4-, 8-
and 16-thread programs, respectively). The soundness, efficiency, scalability,
and portability of RacX demonstrate its superiority.
Comment: 18 pages, 7 figures
OPR
The ability to reproduce a parallel execution is desirable for debugging and program reliability purposes. In debugging (13), the programmer needs to manually step back in time, while for resilience (6) replay is performed automatically by the application upon failure. To be useful, replay has to faithfully reproduce the original execution. For parallel programs, the main challenge is inferring and maintaining the order of conflicting operations (data races). Deterministic record and replay (R&R) techniques have been developed for multithreaded shared memory programs (5), as well as distributed memory programs (14). Our main interest is in techniques for large-scale scientific (3; 4) programming models
Building Resilient Cloud Over Unreliable Commodity Infrastructure
Cloud Computing has emerged as a successful computing paradigm for
efficiently utilizing managed compute infrastructure such as high-speed
rack-mounted servers, connected with high-speed networking, and reliable
storage. Usually such infrastructure is dedicated, physically secured and has
reliable power and networking infrastructure. However, much of our idle compute
capacity is present in unmanaged infrastructure like idle desktops, lab
machines, physically distant server machines, and laptops. We present a scheme
to utilize this idle compute capacity on a best-effort basis and provide high
availability even in the face of failures of individual components or facilities.
We run virtual machines on the commodity infrastructure and present a cloud
interface to our end users. The primary challenge is to maintain availability
in the presence of node failures, network failures, and power failures. We run
multiple copies of a Virtual Machine (VM) redundantly on geographically
dispersed physical machines to achieve availability. If one of the running
copies of a VM fails, we seamlessly switchover to another running copy. We use
Virtual Machine Record/Replay capability to implement this redundancy and
switchover. So far, we have implemented VM Record/Replay for
uniprocessor machines over Linux/KVM and are currently working on VM
Record/Replay on shared-memory multiprocessor machines. We report initial
experimental results based on our implementation.
Comment: Oral presentation at IEEE "Cloud Computing for Emerging Markets",
Oct. 11-12, 2012, Bangalore, India
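The redundancy scheme relies on replay producing the same guest state from the same inputs. The Python sketch below assumes a toy deterministic "VM" whose only nondeterminism is its input events: the primary logs each event before executing it, and a backup replays the log to stay in step and can take over on failure. `ToyVM`, `Primary`, and `Backup` are hypothetical names; a real hypervisor must also record exactly when each asynchronous event (such as an interrupt) was delivered, which this toy omits.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[str, int]  # (kind, value): hypothetical nondeterministic input


@dataclass
class ToyVM:
    """Deterministic given its input sequence; stands in for a guest VM."""
    counter: int = 0
    history: List[int] = field(default_factory=list)

    def step(self, event: Event) -> None:
        kind, value = event
        if kind == "packet":
            self.counter += value
        elif kind == "timer":
            self.counter = (self.counter * 31 + value) % 1_000_003
        self.history.append(self.counter)


class Primary:
    def __init__(self) -> None:
        self.vm = ToyVM()
        self.log: List[Event] = []   # would be shipped to the backup over the network

    def handle(self, event: Event) -> None:
        self.log.append(event)       # record the input before executing it
        self.vm.step(event)


class Backup:
    def __init__(self) -> None:
        self.vm = ToyVM()
        self.applied = 0             # how much of the log has been replayed

    def catch_up(self, log: List[Event]) -> None:
        for event in log[self.applied:]:
            self.vm.step(event)      # replay: same inputs, same order, same state
        self.applied = len(log)
```

On switchover, the backup calls `catch_up` with whatever log it has received and then keeps handling new events itself, starting from a state identical to the primary's at the last logged event.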
RepTFD: Replay Based Transient Fault Detection
The advances in IC process make future chip multiprocessors (CMPs) more and
more vulnerable to transient faults. To detect transient faults, previous
core-level schemes provide redundancy for each core separately. As a result,
they may let transient faults in the uncore parts, which occupy over 50% of the
area of a modern CMP, escape detection. This paper proposes RepTFD, the
first core-level transient fault detection scheme with 100% coverage. Instead
of providing redundancy for each core separately, RepTFD provides redundancy
for a group of cores as a whole. To be specific, it replays the execution of
the checked group of cores on a redundant group of cores. By comparing the
execution results of the two groups of cores, all malignant transient
faults can be caught. Moreover, RepTFD adopts a novel pending period based
record-replay approach, which can greatly reduce the number of execution orders
that need to be enforced in the replay run. Hence, RepTFD incurs only 4.76%
performance overhead compared with normal execution without fault tolerance,
according to our experiments on the RTL design of an industrial CMP named
Godson-3. In addition, RepTFD occupies only about 0.83% of the area of
Godson-3 and requires only trivial modifications to existing components of
Godson-3.
Comment: 22 pages, 11 figures
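The detection principle (run once, replay on redundant cores under the recorded order, compare results) can be sketched in Python as an assumption-laden toy: two hypothetical "cores" apply order-sensitive updates to shared memory, the checked run records its interleaving, and the redundant run replays it, optionally with an injected single-bit flip. Unlike RepTFD's pending-period approach, this sketch enforces every recorded order rather than a reduced set; `run_group` and `results_match` are hypothetical names.

```python
import random
from typing import Dict, List, Optional, Tuple

Program = List[Tuple[str, int]]  # per-core list of (shared address, delta)


def run_group(programs: Dict[int, Program], order: Optional[List[int]] = None,
              seed: int = 0, fault_step: Optional[int] = None
              ) -> Tuple[Dict[str, int], List[int]]:
    """Execute all cores' programs; record the interleaving, or replay `order`."""
    rng = random.Random(seed)
    mem: Dict[str, int] = {}
    pcs = {core: 0 for core in programs}
    replay = iter(order) if order is not None else None
    log: List[int] = []
    step = 0
    while any(pcs[c] < len(programs[c]) for c in programs):
        if replay is not None:
            core = next(replay)
        else:
            ready = [c for c in sorted(programs) if pcs[c] < len(programs[c])]
            core = rng.choice(ready)
        log.append(core)
        addr, delta = programs[core][pcs[core]]
        value = mem.get(addr, 0) * 2 + delta    # order-sensitive update
        if step == fault_step:
            value ^= 1 << 3                     # injected single-bit transient fault
        mem[addr] = value
        pcs[core] += 1
        step += 1
    return mem, log


def results_match(programs: Dict[int, Program], seed: int,
                  fault_step: Optional[int] = None) -> bool:
    checked, order = run_group(programs, seed=seed)             # checked group
    redundant, _ = run_group(programs, order=order,             # redundant group
                             fault_step=fault_step)
    return checked == redundant    # a mismatch signals a transient fault
```

Because each later update doubles the difference a flipped bit introduces, a fault in this toy always reaches the final memory and is caught by the comparison; real hardware faults can also be masked, which is why the abstract speaks of malignant faults.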
DoubleTake: Fast and Precise Error Detection via Evidence-Based Dynamic Analysis
This paper presents evidence-based dynamic analysis, an approach that enables
lightweight analyses (under 5% overhead for the bugs it targets), making it
practical for the first time to perform these analyses in deployed settings. The key insight
of evidence-based dynamic analysis is that for a class of errors, it is
possible to ensure that evidence that they happened at some point in the past
remains for later detection. Evidence-based dynamic analysis allows execution
to proceed at nearly full speed until the end of an epoch (e.g., a heavyweight
system call). It then examines program state to check for evidence that an
error occurred at some time during that epoch. If so, it rolls back execution
and re-executes the code with instrumentation activated to pinpoint the error.
We present DoubleTake, a prototype evidence-based dynamic analysis framework.
DoubleTake is practical and easy to deploy, requiring neither custom hardware,
compiler, nor operating system support. We demonstrate DoubleTake's generality
and efficiency by building dynamic analyses that find buffer overflows, memory
use-after-free errors, and memory leaks. Our evaluation shows that DoubleTake
is efficient, imposing just 4% overhead on average, making it the fastest such
system to date. It is also precise: DoubleTake pinpoints the location of these
errors to the exact line and memory addresses where they occur, providing
valuable debugging information to programmers.
Comment: Pre-print, accepted to appear at ICSE 201
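The evidence-then-rollback idea for buffer overflows can be sketched in Python under simplifying assumptions: a toy heap places one canary byte after each object, an epoch is a list of raw byte writes executed without checks, the canaries are inspected at the epoch's end, and only if one is corrupted is the epoch rolled back and re-executed with a per-write check. `Heap` and `run_epoch` are hypothetical names, and the per-write check stands in for whatever instrumentation DoubleTake actually activates.

```python
from typing import List, Optional, Tuple

CANARY = 0xA5
Write = Tuple[int, int]  # (absolute address, byte value): hypothetical program action


class Heap:
    """Toy heap: each object is followed by a canary byte (the "evidence")."""

    def __init__(self, size: int = 64) -> None:
        self.mem = bytearray(size)
        self.top = 0
        self.objects: List[Tuple[int, int]] = []  # (start, length)

    def malloc(self, length: int) -> int:
        start = self.top
        self.mem[start + length] = CANARY
        self.objects.append((start, length))
        self.top = start + length + 1
        return start

    def corrupted_canaries(self) -> List[int]:
        return [s + n for s, n in self.objects if self.mem[s + n] != CANARY]


def run_epoch(heap: Heap, writes: List[Write]) -> Optional[Tuple[int, int]]:
    """Return (index of the overflowing write, address) or None if no evidence."""
    snapshot = bytes(heap.mem)                  # checkpoint at epoch start
    for addr, value in writes:                  # fast path: no per-write checks
        heap.mem[addr] = value
    if not heap.corrupted_canaries():           # epoch end: look for evidence
        return None
    heap.mem[:] = snapshot                      # roll back to the checkpoint ...
    canaries = {s + n for s, n in heap.objects}
    for i, (addr, value) in enumerate(writes):  # ... and re-execute, instrumented
        if addr in canaries:
            return i, addr                      # pinpoint the overflowing write
        heap.mem[addr] = value
    return None
```

An overflow that happens to write exactly the canary value leaves no evidence in this toy, so it would go undetected.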
Efficient System-Enforced Deterministic Parallelism
Deterministic execution offers many benefits for debugging, fault tolerance,
and security. Running parallel programs deterministically is usually difficult
and costly, however, especially if we desire system-enforced determinism,
ensuring precise repeatability of arbitrarily buggy or malicious software.
Determinator is a novel operating system that enforces determinism on both
multithreaded and multi-process computations. Determinator's kernel provides
only single-threaded, "shared-nothing" address spaces interacting via
deterministic synchronization. An untrusted user-level runtime uses distributed
computing techniques to emulate familiar abstractions such as Unix processes,
file systems, and shared memory multithreading. The system runs parallel
applications deterministically both on multicore PCs and across nodes in a
cluster. Coarse-grained parallel benchmarks perform and scale comparably to,
and sometimes better than, conventional systems, though determinism is costly
for fine-grained parallel applications.
Comment: 14 pages, 12 figures, 3 tables
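One way a user-level runtime can emulate shared memory on top of shared-nothing spaces is to give each child a private copy of the parent's memory and merge the children's changes at join time in a fixed order, treating two children that write the same location as a conflict. The Python sketch below illustrates that idea under the assumption that memory is a flat dictionary; `fork_join` and `WriteConflict` are hypothetical names, not Determinator's interfaces, and deletions are not handled.

```python
from typing import Callable, Dict, List

Memory = Dict[str, int]
Child = Callable[[Memory], None]


class WriteConflict(Exception):
    """Two children modified the same location between fork and join."""


def fork_join(shared: Memory, children: List[Child]) -> Memory:
    # Each child runs against its own private copy (shared-nothing).
    workspaces = []
    for child in children:
        private = dict(shared)
        child(private)
        workspaces.append(private)

    # Deterministic merge at join: apply each child's changes in child order.
    merged = dict(shared)
    writer: Dict[str, int] = {}
    for idx, private in enumerate(workspaces):
        for key, value in private.items():
            if key in shared and shared[key] == value:
                continue  # location unchanged by this child
            if key in writer:
                raise WriteConflict(
                    f"{key!r} written by children {writer[key]} and {idx}")
            writer[key] = idx
            merged[key] = value
    return merged
```

Because no child can observe another child's writes before the join, the merged result does not depend on how the children were scheduled; racy programs surface as explicit conflicts instead of nondeterministic outcomes.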
Dynamic Analysis of Embedded Software
Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of the programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes dynamic analysis difficult. In addition, the instrumentation overhead of gathering execution information may change the execution of a program and lead to distorted analysis results (the probe effect). This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis consists of three main parts. First, it presents a deterministic replay framework that provides reproducible execution. Once a program execution is recorded, software instrumentation can be safely applied during replay without a probe effect. Second, it discusses the probe effect and proposes a simulation-based analysis to detect execution changes caused by instrumentation overhead; this analysis examines whether the recording instrumentation changes the original program execution. Lastly, it discusses data race detection algorithms that help remove data races, which is necessary for the correctness of the replay and the simulation-based analysis. The focus is on making the detection efficient for C/C++ programs and on increasing its scalability on multi-core machines.
Dissertation/Thesis. Doctoral Dissertation, Computer Science 201
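Dynamic race detectors are commonly built on the happens-before relation. As a point of reference only (not the thesis's algorithm), here is a minimal Python sketch of a classic vector-clock detector over a recorded trace of `acq`/`rel`/`rd`/`wr` events; the event format and the name `detect_races` are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Event = Tuple[str, int, str]           # (op, thread id, lock or variable name)
Race = Tuple[int, str, int, str, int]  # (event index, op, thread, variable, other thread)
VC = DefaultDict[int, int]             # vector clock: thread id -> logical time


def new_vc() -> VC:
    return defaultdict(int)


def detect_races(trace: List[Event]) -> List[Race]:
    threads: DefaultDict[int, VC] = defaultdict(new_vc)
    locks: DefaultDict[str, VC] = defaultdict(new_vc)
    last_write: DefaultDict[str, VC] = defaultdict(new_vc)  # per-thread time of last write
    last_read: DefaultDict[str, VC] = defaultdict(new_vc)   # per-thread time of last read
    races: List[Race] = []
    for i, (op, t, name) in enumerate(trace):
        clock = threads[t]
        if clock[t] == 0:
            clock[t] = 1                  # start each thread's own component at 1
        if op == "acq":                   # acquire: inherit the last releaser's knowledge
            for u, c in locks[name].items():
                clock[u] = max(clock[u], c)
        elif op == "rel":                 # release: publish this clock, then advance
            locks[name] = defaultdict(int, clock)
            clock[t] += 1
        elif op in ("rd", "wr"):
            # A read conflicts with earlier writes; a write also with earlier reads.
            histories = ([last_write[name]] if op == "rd"
                         else [last_write[name], last_read[name]])
            racing = set()
            for history in histories:
                for u, c in history.items():
                    if u != t and c > clock[u]:  # u's access not ordered before this one
                        racing.add(u)
            races.extend((i, op, t, name, u) for u in sorted(racing))
            (last_read if op == "rd" else last_write)[name][t] = clock[t]
        else:
            raise ValueError(f"unknown op {op!r}")
    return races
```

Two writes to `x` from different threads with no lock in between are reported, while the same writes each protected by the same lock are not, because the release/acquire pair orders them.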
Morpheus: Safe and Flexible Dynamic Updates for SDNs
SDN controllers must be periodically modified to add features, improve
performance, and fix bugs, but current techniques for implementing dynamic
updates are inadequate. Simply halting old controllers and bringing up new ones
can cause state to be lost, which often leads to incorrect behavior; e.g., if
the state represents hosts blacklisted by a firewall, then traffic that should
be blocked may be allowed to pass through. Techniques based on record and
replay can reconstruct state automatically, but they are expensive to deploy
and can lead to incorrect behavior. Problematic scenarios are especially likely
to arise in distributed controllers and with semantics-altering updates.
This paper presents a new approach to implementing dynamic controller updates
based on explicit state transfer. Instead of attempting to infer state changes
automatically (an approach that is expensive and fundamentally incomplete), our
framework gives programmers effective tools for implementing correct updates
that avoid major disruptions. We develop primitives that enable programmers to
directly (and easily, in most cases) initialize the new controller's state as a
function of old state and we design protocols that ensure consistent behavior
during the transition. We also present a prototype implementation called
Morpheus and evaluate its effectiveness on representative case studies.
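Using the abstract's firewall example, a minimal Python sketch of explicit state transfer might look like this. `FirewallV1`, `FirewallV2`, and `transfer` are hypothetical names, not Morpheus primitives; the point is only that the programmer writes the new version's state as a function of the old one instead of restarting with empty state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FirewallV1:
    blacklist: List[str] = field(default_factory=list)

    def allows(self, host: str) -> bool:
        return host not in self.blacklist


@dataclass
class FirewallV2:
    # The new version also records why each host is blocked (a changed state layout).
    blocked: Dict[str, str] = field(default_factory=dict)

    def allows(self, host: str) -> bool:
        return host not in self.blocked


def transfer(old: FirewallV1) -> FirewallV2:
    """Programmer-written state transfer: new state as a function of old state."""
    return FirewallV2(blocked={host: "migrated from v1" for host in old.blacklist})
```

Halting the old controller and starting `FirewallV2()` from scratch would silently let every blacklisted host through; `transfer(old)` keeps them blocked under the new representation.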
Doctor of Philosophy dissertation
A modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has the potential to change the way we analyze and debug software systems. Once recorded, the execution of the system becomes an independent artifact that can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing the complete execution of the entire operating system with an overhead of a few percent on a realistic workload, and with minimal installation costs. To provide an intuitive interface for constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that allows an analysis algorithm to be programmed against the state of the recorded system in the familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay.
iReplayer: In-situ and Identical Record-and-Replay for Multithreaded Applications
Reproducing executions of multithreaded programs is very challenging due to
many intrinsic and external non-deterministic factors. Existing RnR systems
have made significant progress in reducing performance overhead, but none targets
the in-situ setting, in which replay occurs within the same process as the
recording process. Also, most existing work cannot achieve identical replay,
which may prevent the reproduction of some errors.
This paper presents iReplayer, which aims to identically replay multithreaded
programs in the original process (under the "in-situ" setting). The novel
in-situ and identical replay of iReplayer makes it more likely to reproduce
errors, and allows it to directly employ debugging mechanisms (e.g.,
watchpoints) to aid failure diagnosis. iReplayer currently incurs only 3%
performance overhead on average, which allows it to be always enabled in the
production environment. iReplayer enables a range of possibilities, and this
paper presents three examples: two automatic tools for detecting buffer
overflows and use-after-free bugs, and one interactive debugging tool that is
integrated with GDB.
Comment: 16 pages, 5 figures, to be published at PLDI'1
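The core of in-situ replay (restore a checkpoint inside the same process, then re-execute while feeding back the recorded results of nondeterministic calls) can be illustrated with a single-threaded Python toy. `Epoch`, `nondet`, and `work` are hypothetical names; iReplayer itself must also handle multiple threads, system calls, and memory allocation, none of which this sketch models.

```python
import random
import time
from typing import Any, Callable, Dict, List


class Epoch:
    """Toy in-process record/replay of one epoch (hypothetical API)."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state = state
        self.checkpoint = dict(state)   # shallow snapshot at epoch start
        self.log: List[Any] = []
        self.replaying = False
        self.cursor = 0

    def nondet(self, source: Callable[[], Any]) -> Any:
        """Route every nondeterministic call through here."""
        if self.replaying:
            value = self.log[self.cursor]   # replay: return the recorded result
            self.cursor += 1
            return value
        value = source()                    # record: call for real and log it
        self.log.append(value)
        return value

    def rollback(self) -> None:
        # Restore in place: same process, same objects, so debuggers stay attached.
        self.state.clear()
        self.state.update(self.checkpoint)
        self.replaying = True
        self.cursor = 0


def work(ep: Epoch) -> None:
    ep.state["t"] = ep.nondet(time.time)
    ep.state["r"] = ep.nondet(random.random)
    ep.state["sum"] = ep.state["sum"] + ep.state["r"]
```

Calling `work`, then `rollback`, then `work` again yields exactly the same state both times, which is what lets tools such as watchpoints be attached during the second, identical execution.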