139 research outputs found
CIC : an integrated approach to checkpointing in mobile agent systems
Internet and Mobile Computing Lab (in Department of Computing)Refereed conference paper2006-2007 > Academic research: refereed > Refereed conference paperVersion of RecordPublishe
The Mojave Compiler: Providing Language Primitives for Whole-Process Migration and Speculation for Distributed Applications
We present an approach for implementing language-level primitives for whole-process migration and speculative execution in a compiler and associated runtime environment. These primitives are exposed to the user through simple language constructs that do not require the user to manage process state explicitly. With migration and speculation we show how the user can quickly add persistent checkpoints to any large-scale distributed application that requires longevity in a faulty environment. We demonstrate the use of migration and speculation primitives for checkpointing in a canonical grid computation application, and analyze the results of this implementation
Checkpoint placement algorithms for mobile agent system
2006-2007 > Academic research: refereed > Refereed conference paperVersion of RecordPublishe
Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches
Background: Large-scale biological jobs on high-performance computing systems
require manual intervention if one or more computing cores on which they
execute fail. This places not only a cost on the maintenance of the job, but
also a cost on the time taken for reinstating the job and the risk of losing
data and execution accomplished by the job before it failed. Approaches which
can proactively detect computing core failures and take action to relocate the
computing core's job onto reliable cores can make a significant step towards
automating fault tolerance.
Method: This paper describes an experimental investigation into the use of
multi-agent approaches for fault tolerance. Two approaches are studied, the
first at the job level and the second at the core level. The approaches are
investigated for single core failure scenarios that can occur in the execution
of parallel reduction algorithms on computer clusters. A third approach is
proposed that incorporates multi-agent technology both at the job and core
level. Experiments are pursued in the context of genome searching, a popular
computational biology application.
Result: The key conclusion is that the approaches proposed are feasible for
automating fault tolerance in high-performance computing systems with minimal
human intervention. In a typical experiment in which the fault tolerance is
studied, centralised and decentralised checkpointing approaches on an average
add 90% to the actual time for executing the job. On the other hand, in the
same experiment the multi-agent approaches add only 10% to the overall
execution time.Comment: Computers in Biology and Medicin
Design and analysis of an efficient energy algorithm in wireless social sensor networks
Because mobile ad hoc networks have characteristics such as lack of center nodes, multi-hop routing and changeable topology, the existing checkpoint technologies for normal mobile networks cannot be applied well to mobile ad hoc networks. Considering the multi-frequency hierarchy structure of ad hoc networks, this paper proposes a hybrid checkpointing strategy which combines the techniques of synchronous checkpointing with asynchronous checkpointing, namely the checkpoints of mobile terminals in the same cluster remain synchronous, and the checkpoints in different clusters remain asynchronous. This strategy could not only avoid cascading rollback among the processes in the same cluster, but also avoid too many message transmissions among the processes in different clusters. What is more, it can reduce the communication delay. In order to assure the consistency of the global states, this paper discusses the correctness criteria of hybrid checkpointing, which includes the criteria of checkpoint taking, rollback recovery and indelibility. Based on the designed Intra-Cluster Checkpoint Dependence Graph and Inter-Cluster Checkpoint Dependence Graph, the elimination rules for different kinds of checkpoints are discussed, and the algorithms for the same cluster checkpoints, different cluster checkpoints, and rollback recovery are also given. Experimental results demonstrate the proposed hybrid checkpointing strategy is a preferable trade-off method, which not only synthetically takes all kinds of resource constraints of Ad hoc networks into account, but also outperforms the existing schemes in terms of the dependence to cluster heads, the recovery time compared to the pure synchronous, and the pure asynchronous checkpoint advantage. © 2017 by the authors. Licensee MDPI, Basel, Switzerland
Proxy Module for System on Mobile Devices (SyD) Middleware
Nowadays, users of mobile devices are growing. The users expect that they could communicate constantly using their mobile devices while they are also constantly moving. Therefore, there is a need to provide disconnection tolerance of transactions in the mobile devices’ platforms and its synchronization management. System on Mobile Devices (SyD) is taken as one of the examples of mobile devices’ platforms. The thesis studies the existing SyD architecture, from its framework into its kernel, and introduces the proxy module enhancement in SyD to handle disconnection tolerance, including its synchronization. SyD kernel has been extended for the purpose of enabling proxy module. SyDSync has been constructed for synchronization with the proxy. The timeout has been studied for seamless proxy invocation. A Camera application that tries to catch a stolen vehicle has been simulated for the practical purpose of using the proxy module extension
Survey of Autonomic Computing and Experiments on JMX-based Autonomic Features
Autonomic Computing (AC) aims at solving the problem of managing the rapidly-growing complexity of Information Technology systems, by creating self-managing systems. In this thesis, we have surveyed the progress of the AC field, and studied the requirements, models and architectures of AC. The commonly recognized AC requirements are four properties - self-configuring, self-healing, self-optimizing, and self-protecting. The recommended software architecture is the MAPE-K model containing four modules, namely - monitor, analyze, plan and execute, as well as the knowledge repository.
In the modern software marketplace, Java Management Extensions (JMX) has facilitated one function of the AC requirements - monitoring. Using JMX, we implemented a package that attempts to assist programming for AC features including socket management, logging, and recovery of distributed computation. In the experiments, we have not only realized the powerful Java capabilities that are unknown to many educators, we also illustrated the feasibility of learning AC in senior computer science courses
- …