
299 research outputs found

CIC : an integrated approach to checkpointing in mobile agent systems

Author: Cao J
Wu W
Yang J
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/12/2014

Internet and Mobile Computing Lab (in Department of Computing). Refereed conference paper, 2006-2007. Version of Record.

PolyU Institutional Repository

An Optimizing Java Translation Framework for Automated Checkpointing and Strong Mobility

Author: Saini Arvind Kumar
Publication venue: LSU Digital Commons
Publication date: 21/01/2018

Long-running programs, e.g., in high-performance computing, need to write periodic checkpoints of their execution state to disk so that they can recover from node failures. Manually adding checkpointing code to an application, however, is very tedious. The mechanisms needed to write the execution state of a program to disk and restore it are similar to those needed to migrate a running thread or a mobile object. We have extended a source-to-source translation scheme that allows the migration of mobile Java objects with running threads, making it more general so that it can also be used for automated checkpointing. Our translation scheme allows serializable threads to be written to disk or migrated with a mobile agent to a remote machine. The translator generates code that maintains a serializable run-time stack for each thread as a Java data structure. Although this results in significant run-time overhead, it allows the checkpointing code to be generated automatically. We improved both the translation scheme and the locking mechanism needed to protect the run-time stack. Our experimental results demonstrate a speedup of the generated code over that of the original translator and show that the approach is feasible in practice.

Louisiana State University
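To make the idea of a serializable run-time stack concrete, here is a minimal sketch assuming a single translated method with one local variable. The names `Frame`, `ShadowStack`, `CheckpointSketch`, `sumUpTo`, and the `stopAfter` interruption are hypothetical illustrations, not the translator's actual generated code. The sketch only shows why keeping a method's locals and resume point in ordinary serializable objects lets standard Java serialization capture a computation mid-loop and resume it later.

```java
import java.io.*;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical activation record that translated code keeps instead of relying on JVM locals.
class Frame implements Serializable {
    private static final long serialVersionUID = 1L;
    final String method; // which method this frame belongs to
    int pc;              // resume point inside the method (here: the loop index)
    long acc;            // the method's only local variable in this example
    Frame(String method) { this.method = method; }
}

// Hypothetical per-thread serializable run-time stack.
class ShadowStack implements Serializable {
    private static final long serialVersionUID = 1L;
    final Deque<Frame> frames = new ArrayDeque<>();
}

public class CheckpointSketch {
    // Translated form of: long sum = 0; for (int i = 0; i < n; i++) sum += i;
    // Returns -1 when interrupted after `stopAfter` iterations, leaving its frame on the stack.
    static long sumUpTo(ShadowStack stack, int n, int stopAfter) {
        Frame f = stack.frames.peek(); // resume an existing frame if there is one
        if (f == null) {
            f = new Frame("sumUpTo");
            stack.frames.push(f);
        }
        int steps = 0;
        while (f.pc < n) {
            if (steps++ == stopAfter) return -1; // simulated failure point
            f.acc += f.pc;
            f.pc++;
        }
        stack.frames.pop(); // method finished: discard its frame
        return f.acc;
    }

    // Serialize the whole stack; a real system would write these bytes to disk.
    static byte[] checkpoint(ShadowStack s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        return bytes.toByteArray();
    }

    static ShadowStack restore(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ShadowStack) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ShadowStack stack = new ShadowStack();
        sumUpTo(stack, 10, 4);             // runs i = 0..3, then is interrupted
        byte[] saved = checkpoint(stack);
        ShadowStack resumed = restore(saved);
        System.out.println(sumUpTo(resumed, 10, Integer.MAX_VALUE)); // prints 45
    }
}
```

The cost the abstract mentions is visible here: every local access goes through a heap object rather than a JVM register or stack slot, which is the price of making the state serializable.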

Checkpoint placement algorithms for mobile agent system

Author: Cao J
Wu W
Yang J
Publication venue: IEEE Computer Society
Publication date: 11/12/2014

Refereed conference paper, 2006-2007. Version of Record.

CiteSeerX

PolyU Institutional Repository

Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

Author: D'Angelo Gabriele
Ferretti Stefano
Marzolla Moreno
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

This paper presents FT-GAIA, a software-based, fault-tolerant parallel and distributed simulation middleware. FT-GAIA has been designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any scientific or engineering field. PADS takes advantage of multiple execution units running on multicore processors, clusters of workstations, or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, must handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them across multiple execution nodes. This allows the simulation to tolerate crash failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures: interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance at the cost of a moderate increase in the computational load of the execution units.

Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Urbino

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
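The FT-GAIA abstract says receivers can discard corrupted messages because each message arrives in several replicated copies. A minimal sketch of one common voting rule follows, assuming each sender is replicated 2f + 1 times so that at most f copies can be faulty; the receiver then accepts a payload as soon as f + 1 identical copies have arrived. The class `ReplicaVote`, the method `accept`, and the string payloads are hypothetical, and the exact rule FT-GAIA uses is not stated in the abstract.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ReplicaVote {
    // Assumption: at most f of the received copies are corrupted (Byzantine).
    // A payload seen f + 1 times must therefore have come from at least one correct replica.
    static Optional<String> accept(List<String> copies, int f) {
        Map<String, Integer> counts = new HashMap<>();
        for (String payload : copies) {
            int c = counts.merge(payload, 1, Integer::sum); // count identical copies
            if (c >= f + 1) return Optional.of(payload);    // enough agreement: deliver
        }
        return Optional.empty(); // not enough agreeing copies (yet): keep waiting
    }

    public static void main(String[] args) {
        // f = 1: three replicas of the sender, one of which delivers a corrupted copy.
        List<String> copies = List.of("move(3,4)", "m0ve(9,9)", "move(3,4)");
        System.out.println(accept(copies, 1).orElse("none")); // prints move(3,4)
    }
}
```

Under crash failures alone, f = 0 applies and the first copy to arrive is accepted; the extra copies only matter when corrupted messages must be outvoted, which is where the moderate extra load reported in the abstract comes from.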

Efficient Parallel Application Execution on Opportunistic Desktop Grids

Author: Alfredo Goldman
Daniel Batista
Fabio Costa
Fabio Kon
Francisco Silva
Raphael Camargo
Publication venue: 'IntechOpen'
Publication date: 16/05/2012

IntechOpen

An Analysis of Failure Handling in Chameleon, A Framework for Supporting Cost-Effective Fault Tolerant Services

Author: Haakensen Erik Edward

The demand for low-cost, reliable computing is increasing. Most current fault-tolerant computing solutions are not very flexible, i.e., they cannot adapt to the reliability requirements of newly emerging applications in business, commerce, and manufacturing. It is important that users have a flexible, reliable platform that supports both critical and noncritical applications. Chameleon, under development at the Center for Reliable and High-Performance Computing at the University of Illinois, is a software framework for supporting cost-effective, adaptable, networked fault-tolerant services. This thesis details a simulation of fault injection, detection, and recovery in Chameleon. The simulation was written in C++ using the DEPEND simulation library. The results obtained from the simulation include the overhead incurred by the fault detection and recovery mechanisms supported by Chameleon, as well as information about fault scenarios from which Chameleon cannot recover. The results of the simulation showed that both critical and noncritical applications can be executed in the Chameleon environment with fairly small overhead. No single point of failure from which Chameleon could not recover was found, and Chameleon was also found to be capable of recovering from several multiple-failure scenarios.

NASA Technical Reports Server

Proxy Module for System on Mobile Devices (SyD) Middleware

Author: Gunawan Joseph
Publication venue: ScholarWorks @ Georgia State University
Publication date: 20/11/2008
Field of study

The number of mobile device users keeps growing, and these users expect to communicate constantly through their devices while they are on the move. There is therefore a need to provide disconnection tolerance for transactions on mobile device platforms, together with synchronization management. System on Mobile Devices (SyD) is taken as an example of such a platform. The thesis studies the existing SyD architecture, from its framework down to its kernel, and introduces a proxy module enhancement in SyD to handle disconnection tolerance, including synchronization. The SyD kernel has been extended to enable the proxy module, and SyDSync has been constructed for synchronization with the proxy. Timeouts have been studied for seamless proxy invocation. A camera application that tries to catch a stolen vehicle has been simulated to demonstrate the proxy module extension in practice.

ScholarWorks @ Georgia State University
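The SyD abstract combines three ideas: a call to a device that may be disconnected, a timeout that triggers a switch to a proxy, and a later synchronization step. A minimal sketch of that pattern follows, assuming the proxy simply queues requests while the device is unreachable and replays them in order on reconnection. The names `ProxySketch`, `Device`, `Proxy`, `invoke`, and `sync` are hypothetical and do not reproduce the actual SyD kernel or SyDSync interfaces; only `ExecutorService`, `Future.get(long, TimeUnit)`, and `Future.cancel` are standard Java APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.*;

public class ProxySketch {
    // Hypothetical remote device; handle() blocks or fails while the device is unreachable.
    interface Device { String handle(String request) throws Exception; }

    // Hypothetical proxy: records requests made while the device is disconnected.
    static class Proxy {
        final Queue<String> pending = new ArrayDeque<>();

        String accept(String request) {
            pending.add(request);
            return "queued:" + request;
        }

        // Synchronization on reconnection: replay queued requests in their original order.
        List<String> sync(Device device) throws Exception {
            List<String> results = new ArrayList<>();
            while (!pending.isEmpty()) results.add(device.handle(pending.poll()));
            return results;
        }
    }

    // Try the device first; if it does not answer within timeoutMs, fall back to the proxy.
    static String invoke(Device device, Proxy proxy, String request, long timeoutMs,
                         ExecutorService pool) throws InterruptedException {
        Future<String> reply = pool.submit(() -> device.handle(request));
        try {
            return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            reply.cancel(true);           // abandon (and interrupt) the pending device call
            return proxy.accept(request); // the caller proceeds as if the call succeeded
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Proxy proxy = new Proxy();
        Device offline = req -> { Thread.sleep(10_000); return "late"; };
        Device online = req -> "ok:" + req;
        System.out.println(invoke(offline, proxy, "snapshot", 100, pool)); // prints queued:snapshot
        System.out.println(proxy.sync(online));                             // prints [ok:snapshot]
        pool.shutdownNow();
    }
}
```

The timeout value is the key trade-off the thesis says it studied: too short and reachable devices are bypassed needlessly, too long and the caller stalls before the proxy takes over.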

Checkpoint Placement Algorithms for Mobile Agent System

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Crossref