2,447 research outputs found
Execution replay and debugging
As most parallel and distributed programs are internally non-deterministic --
consecutive runs with the same input might result in a different program flow
-- vanilla cyclic debugging techniques as such are useless. In order to use
cyclic debugging tools, we need a tool that records information about an
execution so that it can be replayed for debugging. Because recording
information interferes with the execution, we must limit the amount of
information and keep the processing of the information fast. This paper
contains a survey of existing execution replay techniques and tools.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003
Deterministic Consistency: A Programming Model for Shared Memory Parallelism
The difficulty of developing reliable parallel software is generating
interest in deterministic environments, where a given program and input can
yield only one possible result. Languages or type systems can enforce
determinism in new code, and runtime systems can impose synthetic schedules on
legacy parallel code. To parallelize existing serial code, however, we would
like a programming model that is naturally deterministic without language
restrictions or artificial scheduling. We propose "deterministic consistency",
a parallel programming model as easy to understand as the "parallel assignment"
construct in sequential languages such as Perl and JavaScript, where concurrent
threads always read their inputs before writing shared outputs. DC supports
common data- and task-parallel synchronization abstractions such as fork/join
and barriers, as well as non-hierarchical structures such as producer/consumer
pipelines and futures. A preliminary prototype suggests that software-only
implementations of DC can run applications written for popular parallel
environments such as OpenMP with low (<10%) overhead for some applications.Comment: 7 pages, 3 figure
OPR
The ability to reproduce a parallel execution is desirable for debugging and program reliability purposes. In debugging (13), the programmer needs to manually step back in time, while for resilience (6) this is automatically performed by the the application upon failure. To be useful, replay has to faithfully reproduce the original execution. For parallel programs the main challenge is inferring and maintaining the order of conflicting operations (data races). Deterministic record and replay (R&R) techniques have been developed for multithreaded shared memory programs (5), as well as distributed memory programs (14). Our main interest is techniques for large scale scientific (3; 4) programming models
Building Resilient Cloud Over Unreliable Commodity Infrastructure
Cloud Computing has emerged as a successful computing paradigm for
efficiently utilizing managed compute infrastructure such as high speed
rack-mounted servers, connected with high speed networking, and reliable
storage. Usually such infrastructure is dedicated, physically secured and has
reliable power and networking infrastructure. However, much of our idle compute
capacity is present in unmanaged infrastructure like idle desktops, lab
machines, physically distant server machines, and laptops. We present a scheme
to utilize this idle compute capacity on a best-effort basis and provide high
availability even in face of failure of individual components or facilities.
We run virtual machines on the commodity infrastructure and present a cloud
interface to our end users. The primary challenge is to maintain availability
in the presence of node failures, network failures, and power failures. We run
multiple copies of a Virtual Machine (VM) redundantly on geographically
dispersed physical machines to achieve availability. If one of the running
copies of a VM fails, we seamlessly switchover to another running copy. We use
Virtual Machine Record/Replay capability to implement this redundancy and
switchover. In current progress, we have implemented VM Record/Replay for
uniprocessor machines over Linux/KVM and are currently working on VM
Record/Replay on shared-memory multiprocessor machines. We report initial
experimental results based on our implementation.Comment: Oral presentation at IEEE "Cloud Computing for Emerging Markets",
Oct. 11-12, 2012, Bangalore, Indi
Non-intrusive on-the-fly data race detection using execution replay
This paper presents a practical solution for detecting data races in parallel
programs. The solution consists of a combination of execution replay (RecPlay)
with automatic on-the-fly data race detection. This combination enables us to
perform the data race detection on an unaltered execution (almost no probe
effect). Furthermore, the usage of multilevel bitmaps and snooped matrix clocks
limits the amount of memory used. As the record phase of RecPlay is highly
efficient, there is no need to switch it off, hereby eliminating the
possibility of Heisenbugs because tracing can be left on all the time.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AAdebug 2000), August 2000, Munich. cs.SE/001003
- …