139 research outputs found

    CIC : an integrated approach to checkpointing in mobile agent systems

    Get PDF
    Internet and Mobile Computing Lab (in Department of Computing)Refereed conference paper2006-2007 > Academic research: refereed > Refereed conference paperVersion of RecordPublishe

    The Mojave Compiler: Providing Language Primitives for Whole-Process Migration and Speculation for Distributed Applications

    Get PDF
    We present an approach for implementing language-level primitives for whole-process migration and speculative execution in a compiler and associated runtime environment. These primitives are exposed to the user through simple language constructs that do not require the user to manage process state explicitly. With migration and speculation we show how the user can quickly add persistent checkpoints to any large-scale distributed application that requires longevity in a faulty environment. We demonstrate the use of migration and speculation primitives for checkpointing in a canonical grid computation application, and analyze the results of this implementation

    Checkpoint placement algorithms for mobile agent system

    Get PDF
    2006-2007 > Academic research: refereed > Refereed conference paperVersion of RecordPublishe

    Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches

    Get PDF
    Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively detect computing core failures and take action to relocate the computing core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology both at the job and core level. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the approaches proposed are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which the fault tolerance is studied, centralised and decentralised checkpointing approaches on an average add 90% to the actual time for executing the job. On the other hand, in the same experiment the multi-agent approaches add only 10% to the overall execution time.Comment: Computers in Biology and Medicin

    Design and analysis of an efficient energy algorithm in wireless social sensor networks

    Get PDF
    Because mobile ad hoc networks have characteristics such as lack of center nodes, multi-hop routing and changeable topology, the existing checkpoint technologies for normal mobile networks cannot be applied well to mobile ad hoc networks. Considering the multi-frequency hierarchy structure of ad hoc networks, this paper proposes a hybrid checkpointing strategy which combines the techniques of synchronous checkpointing with asynchronous checkpointing, namely the checkpoints of mobile terminals in the same cluster remain synchronous, and the checkpoints in different clusters remain asynchronous. This strategy could not only avoid cascading rollback among the processes in the same cluster, but also avoid too many message transmissions among the processes in different clusters. What is more, it can reduce the communication delay. In order to assure the consistency of the global states, this paper discusses the correctness criteria of hybrid checkpointing, which includes the criteria of checkpoint taking, rollback recovery and indelibility. Based on the designed Intra-Cluster Checkpoint Dependence Graph and Inter-Cluster Checkpoint Dependence Graph, the elimination rules for different kinds of checkpoints are discussed, and the algorithms for the same cluster checkpoints, different cluster checkpoints, and rollback recovery are also given. Experimental results demonstrate the proposed hybrid checkpointing strategy is a preferable trade-off method, which not only synthetically takes all kinds of resource constraints of Ad hoc networks into account, but also outperforms the existing schemes in terms of the dependence to cluster heads, the recovery time compared to the pure synchronous, and the pure asynchronous checkpoint advantage. © 2017 by the authors. Licensee MDPI, Basel, Switzerland

    Proxy Module for System on Mobile Devices (SyD) Middleware

    Get PDF
    Nowadays, users of mobile devices are growing. The users expect that they could communicate constantly using their mobile devices while they are also constantly moving. Therefore, there is a need to provide disconnection tolerance of transactions in the mobile devices’ platforms and its synchronization management. System on Mobile Devices (SyD) is taken as one of the examples of mobile devices’ platforms. The thesis studies the existing SyD architecture, from its framework into its kernel, and introduces the proxy module enhancement in SyD to handle disconnection tolerance, including its synchronization. SyD kernel has been extended for the purpose of enabling proxy module. SyDSync has been constructed for synchronization with the proxy. The timeout has been studied for seamless proxy invocation. A Camera application that tries to catch a stolen vehicle has been simulated for the practical purpose of using the proxy module extension

    Checkpoint Placement Algorithms for Mobile Agent System

    Full text link

    Adaptive recovery for mobile environments

    Full text link

    Survey of Autonomic Computing and Experiments on JMX-based Autonomic Features

    Get PDF
    Autonomic Computing (AC) aims at solving the problem of managing the rapidly-growing complexity of Information Technology systems, by creating self-managing systems. In this thesis, we have surveyed the progress of the AC field, and studied the requirements, models and architectures of AC. The commonly recognized AC requirements are four properties - self-configuring, self-healing, self-optimizing, and self-protecting. The recommended software architecture is the MAPE-K model containing four modules, namely - monitor, analyze, plan and execute, as well as the knowledge repository. In the modern software marketplace, Java Management Extensions (JMX) has facilitated one function of the AC requirements - monitoring. Using JMX, we implemented a package that attempts to assist programming for AC features including socket management, logging, and recovery of distributed computation. In the experiments, we have not only realized the powerful Java capabilities that are unknown to many educators, we also illustrated the feasibility of learning AC in senior computer science courses
    corecore