12,487 research outputs found
States in Process Calculi
Formal reasoning about distributed algorithms (like Consensus) typically
requires to analyze global states in a traditional state-based style. This is
in contrast to the traditional action-based reasoning of process calculi.
Nevertheless, we use domain-specific variants of the latter, as they are
convenient modeling languages in which the local code of processes can be
programmed explicitly, with the local state information usually managed via
parameter lists of process constants. However, domain-specific process calculi
are often equipped with (unlabeled) reduction semantics, building upon a rich
and convenient notion of structural congruence. Unfortunately, the price for
this convenience is that the analysis is cumbersome: the set of reachable
states is modulo structural congruence, and the processes' state information is
very hard to identify. We extract from congruence classes of reachable states
individual state-informative representatives that we supply with a proper
formal semantics. As a result, we can now freely switch between the process
calculus terms and their representatives, and we can use the stateful
representatives to perform assertional reasoning on process calculus models.Comment: In Proceedings EXPRESS/SOS 2014, arXiv:1408.127
Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems
This paper describes a comprehensive prototype of large-scale fault adaptive
embedded software developed for the proposed Fermilab BTeV high energy physics
experiment. Lightweight self-optimizing agents embedded within Level 1 of the
prototype are responsible for proactive and reactive monitoring and mitigation
based on specified layers of competence. The agents are self-protecting,
detecting cascading failures using a distributed approach. Adaptive,
reconfigurable, and mobile objects for reliablility are designed to be
self-configuring to adapt automatically to dynamically changing environments.
These objects provide a self-healing layer with the ability to discover,
diagnose, and react to discontinuities in real-time processing. A generic
modeling environment was developed to facilitate design and implementation of
hardware resource specifications, application data flow, and failure mitigation
strategies. Level 1 of the planned BTeV trigger system alone will consist of
2500 DSPs, so the number of components and intractable fault scenarios involved
make it impossible to design an `expert system' that applies traditional
centralized mitigative strategies based on rules capturing every possible
system state. Instead, a distributed reactive approach is implemented using the
tools and methodologies developed by the Real-Time Embedded Systems group.Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th
Annual IEEE International Conference and Workshop on the Engineering of
Computer Based Systems (ECBS), Washington, DC, April, 200
Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication
This paper presents FT-GAIA, a software-based fault-tolerant parallel and
distributed simulation middleware. FT-GAIA has being designed to reliably
handle Parallel And Distributed Simulation (PADS) models, which are needed to
properly simulate and analyze complex systems arising in any kind of scientific
or engineering field. PADS takes advantage of multiple execution units run in
multicore processors, cluster of workstations or HPC systems. However, large
computing systems, such as HPC systems that include hundreds of thousands of
computing nodes, have to handle frequent failures of some components. To cope
with this issue, FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some
protection against Byzantine failures, since interaction messages among the
simulated entities are replicated as well, so that the receiving entity can
identify and discard corrupted messages. Results from an analytical model and
from an experimental evaluation show that FT-GAIA provides a high degree of
fault tolerance, at the cost of a moderate increase in the computational load
of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731
Using mobility and exception handling to achieve mobile agents that survive server crash failures
Mobile agent technology, when designed and used effectively, can minimize bandwidth consumption and autonomously provide a snapshot of the current context of a distributed system. Protecting mobile agents from server crashes is a challenging issue, since developers normally have no control over remote servers. Server crash failures can leave replicas, instable storage, unavailable for an unknown time period. Furthermore, few systems have considered the need for using a fault tolerant protocol among a group of collaborating mobile agents. This thesis uses exception handling to protect mobile agents from server crash failures. An exception model is proposed for mobile agents and two exception handler designs are investigated. The first exists at the server that created the mobile agent and uses a timeout mechanism. The second, the mobile shadow scheme, migrates with the mobile agent and operates at the previous server visited by the mobile agent. A case study application has been developed to compare the performance of the two exception handler designs. Performance results demonstrate that although the second design is slower it offers the smaller trip time when handling a server crash. Furthermore, no modification of the server environment is necessary. This thesis shows that the mobile shadow exception handling scheme reduces complexity for a group of mobile agents to survive server crashes. The scheme deploys a replica that monitors the server occupied by the master, at each stage of the itinerary. The replica exists at the previous server visited in the itinerary. Consequently, each group member is a single fault tolerant entity with respect to server crash failures. Other schemes introduce greater complexity and performance overheads since, for each stage of the itinerary, a group of replicas is sent to servers that offer an equivalent service. In addition, future research is established for fault tolerance in groups of collaborating mobile agents
- …