3,049 research outputs found
A New Concurrent Checkpoint Mechanism for Embeded Multi-Core Systems
his paper presents a new transparent, incremental, concurrent checkpoint mechanism for embedded multi-core systems. It allows the checkpointed process (also called checkpointee) to continue running without stopping while checkpoints are set to a large extent. Through tracing TLB misses to block the accesses to target memory pages first time while dumping memory pages (the most time-consuming step when setting a checkpoint). At that time, a kernel thread, called checkpointer, copies the memory access target pages to the designated memory buffer for constructing a consistent state of the checkpointee, and then resumes the memory accesses. From the experimental results, in contrast to a traditional concurrent checkpoint system, the proposed mechanism reduces the downtime of the checkpointed process by more than 10.1 %. Moreover, the incremental checkpointing functionality has been implemented in this new concurrent checkpoint mechanism as well. Compared with full checkpointing, incremental checkpointing can reduce the checkpoint time more than 95.5 % and 89.2 % while the benchmark is the matrix multiplication at the checkpoint intervals of 10 seconds and 20 seconds, respectively
FADI: a fault-tolerant environment for open distributed computing
FADI is a complete programming environment that serves the reliable execution of distributed application programs. FADI encompasses all aspects of modern fault-tolerant distributed computing. The built-in user-transparent error detection mechanism covers processor node crashes and hardware transient failures. The mechanism also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism combined with a novel selective message logging technique delivers an efficient, low-overhead backup and recovery mechanism for distributed processes. FADI also provides means for remote automatic process allocation on the distributed system nodes
Out-Of-Place debugging: a debugging architecture to reduce debugging interference
Context. Recent studies show that developers spend most of their programming
time testing, verifying and debugging software. As applications become more and
more complex, developers demand more advanced debugging support to ease the
software development process.
Inquiry. Since the 70's many debugging solutions were introduced. Amongst
them, online debuggers provide a good insight on the conditions that led to a
bug, allowing inspection and interaction with the variables of the program.
However, most of the online debugging solutions introduce \textit{debugging
interference} to the execution of the program, i.e. pauses, latency, and
evaluation of code containing side-effects.
Approach. This paper investigates a novel debugging technique called
\outofplace debugging. The goal is to minimize the debugging interference
characteristic of online debugging while allowing online remote capabilities.
An \outofplace debugger transfers the program execution and application state
from the debugged application to the debugger application, both running in
different processes.
Knowledge. On the one hand, \outofplace debugging allows developers to debug
applications remotely, overcoming the need of physical access to the machine
where the debugged application is running. On the other hand, debugging happens
locally on the remote machine avoiding latency. That makes it suitable to be
deployed on a distributed system and handle the debugging of several processes
running in parallel.
Grounding. We implemented a concrete out-of-place debugger for the Pharo
Smalltalk programming language. We show that our approach is practical by
performing several benchmarks, comparing our approach with a classic remote
online debugger. We show that our prototype debugger outperforms by a 1000
times a traditional remote debugger in several scenarios. Moreover, we show
that the presence of our debugger does not impact the overall performance of an
application.
Importance. This work combines remote debugging with the debugging experience
of a local online debugger. Out-of-place debugging is the first online
debugging technique that can minimize debugging interference while debugging a
remote application. Yet, it still keeps the benefits of online debugging ( e.g.
step-by-step execution). This makes the technique suitable for modern
applications which are increasingly parallel, distributed and reactive to
streams of data from various sources like sensors, UI, network, etc
Recommended from our members
Record and Transplay: Partial Checkpointing for Replay Debugging
Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. Toward addressing this problem, we present Transplay, a system that captures application software bugs as they occur in production and deterministically reproduces them in a completely different environment, potentially running a different operating system, where the application, its binaries and other support data do not exist. Transplay introduces partial checkpointing, a new mechanism that provides two key properties. It efficiently captures the minimal state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries or the original execution environment. Transplay integrates with existing debuggers to provide facilities such as breakpoints and single-stepping to allow the user to examine the contents of variables and other program state at each source line of the application's replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with server applications such as the Apache web server show that Transplay can be used in production with modest recording overhead
Preliminary Design of the APIARY for VLSI Support of Knowledge-Based Systems
This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Office of Naval Research of the Department of Defense under Contract N00014-75-C-0522.Knowledge-based applications will require vastly increased computational resources to achieve their goals. We are working on the development of a VLSI Message Passing Architecture to meet this need. As a first step we present the preliminary design of the APIARY system in this paper. The APIARY is currently in an early stage of implementation at the MIT Artificial Intelligence Laboratory.MIT Artificial Intelligence Laboratory
Department of Defense Office of Naval Researc
- …