7,386 research outputs found

    An approach to rollback recovery of collaborating mobile agents

    Fault tolerance is one of the main problems that must be resolved to improve the adoption of the agent computing paradigm. In this paper, we analyse the execution model of agent platforms and the significance of the faults affecting their constituent components on the reliable execution of agent-based applications, in order to develop a pragmatic framework for fault tolerance in agent systems. The developed framework deploys a communication-pairs independent checkpointing strategy to offer a low-cost, application-transparent model for reliable agent-based computing that covers all possible faults that might invalidate reliable agent execution, migration and communication, and maintains the exactly-once execution property.

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has been designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units running on multicore processors, clusters of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units. Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731
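
    The idea of discarding corrupted copies of a replicated message can be illustrated with a strict-majority filter. This is a minimal sketch and not FT-GAIA's actual interface: the function name `accept_majority` and the string payloads are hypothetical, and it assumes that every replica of a sender emits one copy of each message and that payloads are hashable.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def accept_majority(copies: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the payload that a strict majority of copies agree on.

    Hypothetical helper for illustration only. Returns None when no
    payload is backed by more than half of the received copies.
    """
    if not copies:
        return None
    # most_common(1) yields the single most frequent payload and its count.
    payload, count = Counter(copies).most_common(1)[0]
    # Accept only on a strict majority; otherwise the copies are inconclusive.
    return payload if count > len(copies) // 2 else None


if __name__ == "__main__":
    # Three replicas of the sender; one copy was corrupted in transit.
    print(accept_majority(["move:3", "move:3", "move:9"]))
    # No payload reaches a strict majority, so nothing is accepted.
    print(accept_majority(["a", "b", "c"]))
```

    Under this scheme, a receiver that gets 2f+1 copies can still recover the correct payload when at most f of them are corrupted, because the correct copies remain a strict majority.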

    On using the CAMA framework for developing open mobile fault tolerant agent systems

    The paper introduces the CAMA (Context-Aware Mobile Agents) framework intended for developing large-scale mobile applications using the agent paradigm. CAMA provides a powerful set of abstractions, a supporting middleware and an adaptation layer allowing developers to address the main characteristics of mobile applications: openness, asynchronous and anonymous communication, fault tolerance, and device mobility. It ensures recursive system structuring using location, scope, agent and role abstractions. CAMA supports system fault tolerance through exception handling and structured agent coordination. The applicability of the framework is demonstrated using an ambient lecture scenario, the first part of ongoing work on a series of ambient campus applications.

    On developing open mobile fault tolerant agent systems

    The paper introduces the CAMA (Context-Aware Mobile Agents) framework intended for developing large-scale mobile applications using the agent paradigm. CAMA provides a powerful set of abstractions, a supporting middleware and an adaptation layer allowing developers to address the main characteristics of mobile applications: openness, asynchronous and anonymous communication, fault tolerance, and device mobility. It ensures recursive system structuring using location, scope, agent, and role abstractions. CAMA supports system fault tolerance through exception handling and structured agent coordination within nested scopes. The applicability of the framework is demonstrated using an ambient lecture scenario, the first part of ongoing work on a series of ambient campus applications. This scenario is developed starting from a thorough definition of the traceable requirements, including the fault tolerance requirements. This is followed by the design phase, in which the CAMA abstractions are applied. In the implementation phase, the CAMA middleware services are used through a provided API. This work is part of the FP6 IST RODIN project on Rigorous Open Development Environment for Complex Systems.

    Reliable Fault Tolerance System for Service Composition in Mobile Ad Hoc Network

    Due to the rapid development of smart mobile devices, mobile applications are exploring the use of web services in MANETs to satisfy user needs. Complex user needs are satisfied by service composition, where a complex service is created by combining one or more atomic services. Service composition is a significant challenge in MANETs because of limited bandwidth, constrained energy sources, dynamic node movement, and frequent node failures. These constraints increase the failure rate of service composition. To overcome them, we propose the Reliable Fault Tolerant System for Service Composition in MANETs (RFTSC), which uses checkpointing for service composition in MANETs. We propose a fault policy for each type of fault that can occur in service composition. Failed services in the composition process are recovered locally by using the checkpointing system and discovered services that satisfy the QoS constraints. A Multi-Service Tree (MST) is proposed to recover failed services with O(1) time complexity. Simulation results show that the proposed approach is efficient compared with existing approaches.

    An Improved Approximate Consensus Algorithm in the Presence of Mobile Faults

    This paper explores the problem of reaching approximate consensus in synchronous point-to-point networks, where each pair of nodes is able to communicate with each other directly and reliably. We consider the mobile Byzantine fault model proposed by Garay '94 -- in the model, an omniscient adversary can corrupt up to f nodes in each round, and at the beginning of each round, faults may "move" in the system (i.e., different sets of nodes may become faulty in different rounds). Recent work by Bonomi et al. '16 proposed a simple iterative approximate consensus algorithm which requires at least 4f+1 nodes. This paper proposes a novel technique of using "confession" (a mechanism to allow others to ignore past behavior) and a variant of reliable broadcast to improve the fault-tolerance level. In particular, we present an approximate consensus algorithm that requires only ⌈7f/2⌉ + 1 nodes, an ⌊f/2⌋ improvement over the state-of-the-art algorithms. Moreover, we also show that the proposed algorithm is optimal within a family of round-based algorithms.
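
    The two node-count bounds quoted in this abstract can be compared numerically. The sketch below only evaluates the formulas 4f+1 and ⌈7f/2⌉ + 1 and checks that their difference equals ⌊f/2⌋; it does not implement either consensus algorithm, and the function names are hypothetical.

```python
def nodes_prior(f: int) -> int:
    """Nodes required by the earlier iterative algorithm: 4f + 1."""
    return 4 * f + 1


def nodes_improved(f: int) -> int:
    """Nodes required by the improved algorithm: ceil(7f/2) + 1.

    For a non-negative integer f, ceil(7f/2) equals (7f + 1) // 2,
    which avoids floating-point rounding.
    """
    return (7 * f + 1) // 2 + 1


def savings(f: int) -> int:
    """Difference between the two bounds for a given f."""
    return nodes_prior(f) - nodes_improved(f)


if __name__ == "__main__":
    for f in range(1, 7):
        print(f, nodes_prior(f), nodes_improved(f), savings(f))
```

    The difference is 4f - ⌈7f/2⌉ = ⌊f/2⌋, so the bounds coincide for f = 1 and the gap grows by one node for every two additional faults.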