Search CORE

3,049 research outputs found

A New Concurrent Checkpoint Mechanism for Embeded Multi-Core Systems

Author: Liao Jianwei
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 10/08/2012
Field of study

his paper presents a new transparent, incremental, concurrent checkpoint mechanism for embedded multi-core systems. It allows the checkpointed process (also called checkpointee) to continue running without stopping while checkpoints are set to a large extent. Through tracing TLB misses to block the accesses to target memory pages first time while dumping memory pages (the most time-consuming step when setting a checkpoint). At that time, a kernel thread, called checkpointer, copies the memory access target pages to the designated memory buffer for constructing a consistent state of the checkpointee, and then resumes the memory accesses. From the experimental results, in contrast to a traditional concurrent checkpoint system, the proposed mechanism reduces the downtime of the checkpointed process by more than 10.1 %. Moreover, the incremental checkpointing functionality has been implemented in this new concurrent checkpoint mechanism as well. Compared with full checkpointing, incremental checkpointing can reduce the checkpoint time more than 95.5 % and 89.2 % while the benchmark is the matrix multiplication at the checkpoint intervals of 10 seconds and 20 seconds, respectively

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

FADI: a fault-tolerant environment for open distributed computing

Author: A. Bargiela
BARGIELA
ELNOZAHY
JANAKIRAMAN
LAMOTTE
OSMAN
OSMAN
PLANK
POWELL
RIBERIO
SENS
SILVA
T. Osman
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2000
Field of study

FADI is a complete programming environment that serves the reliable execution of distributed application programs. FADI encompasses all aspects of modern fault-tolerant distributed computing. The built-in user-transparent error detection mechanism covers processor node crashes and hardware transient failures. The mechanism also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism combined with a novel selective message logging technique delivers an efficient, low-overhead backup and recovery mechanism for distributed processes. FADI also provides means for remote automatic process allocation on the distributed system nodes

Crossref

Nottingham Trent Institutional Repository (IRep)

Out-Of-Place debugging: a debugging architecture to reduce debugging interference

Author: Boix Elisa Gonzalez
Marra Matteo
Polito Guillermo
Publication venue: 'Aspect-Oriented Software Association (AOSA)'
Publication date: 05/11/2018
Field of study

Context. Recent studies show that developers spend most of their programming time testing, verifying and debugging software. As applications become more and more complex, developers demand more advanced debugging support to ease the software development process. Inquiry. Since the 70's many debugging solutions were introduced. Amongst them, online debuggers provide a good insight on the conditions that led to a bug, allowing inspection and interaction with the variables of the program. However, most of the online debugging solutions introduce \textit{debugging interference} to the execution of the program, i.e. pauses, latency, and evaluation of code containing side-effects. Approach. This paper investigates a novel debugging technique called \outofplace debugging. The goal is to minimize the debugging interference characteristic of online debugging while allowing online remote capabilities. An \outofplace debugger transfers the program execution and application state from the debugged application to the debugger application, both running in different processes. Knowledge. On the one hand, \outofplace debugging allows developers to debug applications remotely, overcoming the need of physical access to the machine where the debugged application is running. On the other hand, debugging happens locally on the remote machine avoiding latency. That makes it suitable to be deployed on a distributed system and handle the debugging of several processes running in parallel. Grounding. We implemented a concrete out-of-place debugger for the Pharo Smalltalk programming language. We show that our approach is practical by performing several benchmarks, comparing our approach with a classic remote online debugger. We show that our prototype debugger outperforms by a 1000 times a traditional remote debugger in several scenarios. Moreover, we show that the presence of our debugger does not impact the overall performance of an application. Importance. This work combines remote debugging with the debugging experience of a local online debugger. Out-of-place debugging is the first online debugging technique that can minimize debugging interference while debugging a remote application. Yet, it still keeps the benefits of online debugging ( e.g. step-by-step execution). This makes the technique suitable for modern applications which are increasingly parallel, distributed and reactive to streams of data from various sources like sensors, UI, network, etc

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Recommended from our members

Record and Transplay: Partial Checkpointing for Replay Debugging

Author: Nieh Jason
Subhraveti Dinesh Kumar
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009
Field of study

Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. Toward addressing this problem, we present Transplay, a system that captures application software bugs as they occur in production and deterministically reproduces them in a completely different environment, potentially running a different operating system, where the application, its binaries and other support data do not exist. Transplay introduces partial checkpointing, a new mechanism that provides two key properties. It efficiently captures the minimal state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries or the original execution environment. Transplay integrates with existing debuggers to provide facilities such as breakpoints and single-stepping to allow the user to examine the contents of variables and other program state at each source line of the application's replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with server applications such as the Apache web server show that Transplay can be used in production with modest recording overhead

Columbia University Academic Commons

Preliminary Design of the APIARY for VLSI Support of Knowledge-Based Systems

Author: Hewitt Carl
Publication venue: MIT Artificial Intelligence Laboratory
Publication date: 01/06/1979
Field of study

This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Office of Naval Research of the Department of Defense under Contract N00014-75-C-0522.Knowledge-based applications will require vastly increased computational resources to achieve their goals. We are working on the development of a VLSI Message Passing Architecture to meet this need. As a first step we present the preliminary design of the APIARY system in this paper. The APIARY is currently in an early stage of implementation at the MIT Artificial Intelligence Laboratory.MIT Artificial Intelligence Laboratory Department of Defense Office of Naval Researc

DSpace@MIT