5,501 research outputs found

    An approach to rollback recovery of collaborating mobile agents

    Get PDF
    Fault-tolerance is one of the main problems that must be resolved to improve the adoption of the agents' computing paradigm. In this paper, we analyse the execution model of agent platforms and the significance of the faults affecting their constituent components on the reliable execution of agent-based applications, in order to develop a pragmatic framework for agent systems fault-tolerance. The developed framework deploys a communication-pairs independent check pointing strategy to offer a low-cost, application-transparent model for reliable agent- based computing that covers all possible faults that might invalidate reliable agent execution, migration and communication and maintains the exactly-one execution property

    Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems

    Get PDF
    This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliablility are designed to be self-configuring to adapt automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an `expert system' that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group.Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS), Washington, DC, April, 200

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    Full text link
    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

    States in Process Calculi

    Full text link
    Formal reasoning about distributed algorithms (like Consensus) typically requires to analyze global states in a traditional state-based style. This is in contrast to the traditional action-based reasoning of process calculi. Nevertheless, we use domain-specific variants of the latter, as they are convenient modeling languages in which the local code of processes can be programmed explicitly, with the local state information usually managed via parameter lists of process constants. However, domain-specific process calculi are often equipped with (unlabeled) reduction semantics, building upon a rich and convenient notion of structural congruence. Unfortunately, the price for this convenience is that the analysis is cumbersome: the set of reachable states is modulo structural congruence, and the processes' state information is very hard to identify. We extract from congruence classes of reachable states individual state-informative representatives that we supply with a proper formal semantics. As a result, we can now freely switch between the process calculus terms and their representatives, and we can use the stateful representatives to perform assertional reasoning on process calculus models.Comment: In Proceedings EXPRESS/SOS 2014, arXiv:1408.127

    Multi-agent systems for power engineering applications - part 1 : Concepts, approaches and technical challenges

    Get PDF
    This is the first part of a 2-part paper that has arisen from the work of the IEEE Power Engineering Society's Multi-Agent Systems (MAS) Working Group. Part 1 of the paper examines the potential value of MAS technology to the power industry. In terms of contribution, it describes fundamental concepts and approaches within the field of multi-agent systems that are appropriate to power engineering applications. As well as presenting a comprehensive review of the meaningful power engineering applications for which MAS are being investigated, it also defines the technical issues which must be addressed in order to accelerate and facilitate the uptake of the technology within the power and energy sector. Part 2 of the paper explores the decisions inherent in engineering multi-agent systems for applications in the power and energy sector and offers guidance and recommendations on how MAS can be designed and implemented
    • 

    corecore