16 research outputs found

    Self-Stabilization in the Distributed Systems of Finite State Machines

    The notion of self-stabilization was first proposed by Dijkstra in his classic 1974 paper. The paper defines a system as self-stabilizing if, starting from any state, possibly an illegitimate one, it automatically converges to a legitimate state in a finite amount of time and, once in a legitimate state, remains so unless it incurs a subsequent transient fault. Dijkstra limited his attention to a ring of finite-state machines and gave self-stabilizing solutions for it. In the years following its introduction, very few papers were published in this area. Once his proposal was recognized as a milestone in work on fault tolerance, the notion spread rapidly and many distributed-systems researchers turned their attention to it. The investigation and use of self-stabilization as an approach to fault-tolerant behavior under a model of transient failures for distributed systems is now undergoing a renaissance. A good number of works on self-stabilization in distributed systems have been proposed, most of them very recently. This report surveys all previous works available in the literature on self-stabilizing systems.
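
As an illustration of the definition above, the following is a minimal Python sketch of Dijkstra's K-state ring, the best-known of his solutions for a ring of finite-state machines. It is not taken from the surveyed report. The function names, the choice of K = n + 1, and the random central scheduler that picks one privileged machine per move are assumptions made for the example.

```python
import random


def privileged(states):
    """Return the indices of machines that currently hold a privilege."""
    n = len(states)
    result = []
    for i in range(n):
        if i == 0:
            # The distinguished machine is privileged when it matches its
            # left neighbour, which on a ring is the last machine.
            if states[0] == states[n - 1]:
                result.append(0)
        elif states[i] != states[i - 1]:
            # Every other machine is privileged when it differs from its
            # left neighbour.
            result.append(i)
    return result


def step(states, i, k):
    """Let privileged machine i make its move (states are counted modulo k)."""
    if i == 0:
        states[0] = (states[0] + 1) % k
    else:
        states[i] = states[i - 1]


def is_legitimate(states):
    """A configuration is legitimate when exactly one privilege exists."""
    return len(privileged(states)) == 1


def run_until_legitimate(states, k, max_moves, rng):
    """Apply moves chosen by an assumed random central scheduler.

    Returns the number of moves made before a legitimate configuration
    was reached (or max_moves if the limit was hit first).
    """
    moves = 0
    while not is_legitimate(states) and moves < max_moves:
        step(states, rng.choice(privileged(states)), k)
        moves += 1
    return moves


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 5, 6  # assumption for the example: K = n + 1 states per machine
    ring = [rng.randrange(k) for _ in range(n)]  # arbitrary initial state
    print("initial:", ring)
    print("moves to converge:", run_until_legitimate(ring, k, 1000, rng))
    print("legitimate:", ring)
```

Starting from an arbitrary assignment of states, the ring reaches a configuration with a single privilege, and each later move simply passes that privilege to the next machine. These are the convergence and closure properties stated in the definition.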

    Self-stabilizing virtual synchrony

    Virtual synchrony (VS) is an important abstraction that has proven extremely useful when implemented over asynchronous, typically large, message-passing distributed systems. Fault-tolerant design is critical for the success of such implementations, since large distributed systems can be highly available as long as they do not depend on the full operational status of every system participant. Self-stabilizing systems can tolerate transient faults that drive the system to an arbitrary, unpredictable configuration. Such systems automatically regain consistency from any such configuration and then produce the desired system behavior, ensuring it for a practically infinite number of successive steps, e.g., 2^64 steps. We present a new multi-purpose self-stabilizing counter algorithm establishing an efficient, practically unbounded counter that can directly yield a self-stabilizing Multiple-Writer Multiple-Reader (MWMR) register emulation. We use our counter algorithm, together with a self-stabilizing group membership service and a self-stabilizing multicast service, to devise the first practically stabilizing VS algorithm and a self-stabilizing VS-based emulation of state machine replication (SMR). As we base the SMR implementation on VS rather than consensus, the system progresses in more extreme asynchronous settings than consensus-based SMR.
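
To give a feel for why a 64-bit step count (2^64) can be treated as practically infinite, here is a small back-of-the-envelope calculation in Python. The rate of one step per nanosecond is an assumption chosen for illustration and does not come from the abstract; the function name is hypothetical as well.

```python
# Assumed step rate, for illustration only (not stated in the abstract).
STEPS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(bits, steps_per_second=STEPS_PER_SECOND):
    """Years needed to take 2**bits steps at the given constant rate."""
    return (2 ** bits) / steps_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"64-bit counter: about {years_to_exhaust(64):.1f} years")
    print(f"32-bit counter: about {years_to_exhaust(32) * SECONDS_PER_YEAR:.1f} seconds")
```

Under that assumed rate a 64-bit counter lasts several centuries, whereas a 32-bit counter would wrap within seconds, which is why only the former is reasonably called practically unbounded.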

    Acheminement de messages instantanément stabilisant pour arbres couvrants (Snap-Stabilizing Message Forwarding for Spanning Trees)

    We present a snap-stabilizing message-forwarding protocol for tree-shaped spanning structures of networks. Our protocol uses the information provided by a self-stabilizing routing-table computation algorithm that relies on this structure. Because the protocol is snap-stabilizing, every message sent after the faults is delivered to its destination, even when the routing tables have not yet stabilized. Our algorithm has the advantage that the number of buffers is independent of any global network parameter such as the number of nodes or the diameter. Indeed, we show that the problem can be solved using a constant number of buffers per communication link of the spanning structure. This property allows the protocol to scale.

    Bounded Protocols for Efficient Reliable Message Transmission

    In the reliable message transmission problem (RMTP), processors communicate by exchanging messages, but the channel that connects two processors is subject to message loss, duplication, and reordering. Previous work focused on proposing protocols for asynchronous systems in which message size is finite and sequence numbers are bounded. However, if the channel can duplicate messages, lose messages, and arbitrarily reorder them, the problem is unsolvable. In this thesis, we consider a strengthening of the asynchronous model in which reordering of messages is bounded. In this model, we develop two efficient protocols that solve the RMTP: (1) when messages may be duplicated but not lost, and (2) when messages may be both duplicated and lost. This result contrasts with the impossibility of such an algorithm when reordering is unbounded. Our protocols have the pleasing property that no messages need to be sent from the receiver to the sender.

    Self-stabilizing sorting algorithms

    A distributed system consists of a set of machines that do not share a global memory. Depending on the connectivity of the network, each machine gets a partial view of the global state. Transient failures in one area of the network may go unnoticed in other areas and may cause the system to enter an illegal global state. However, if the system were self-stabilizing, it would be guaranteed that, regardless of the current state, the system would recover to a legal configuration in a finite number of moves. The traditional way of creating reliable systems is to make components redundant. Self-stabilization allows systems to be fault tolerant through software as well. This is an evolving paradigm in the design of robust distributed systems. The ability to recover spontaneously from an arbitrary state makes self-stabilizing systems immune to transient failures or perturbations in the system state, such as changes in network topology. This thesis presents an O(nh) fault-tolerant distributed sorting algorithm for a tree network, where n is the number of nodes in the system and h is the height of the tree. Fault tolerance is achieved using Dijkstra's paradigm of self-stabilization, a method of non-masking fault tolerance that embeds the fault tolerance within the algorithm. Varghese's counter flushing method is used to achieve synchronization among the processes in the system. In the distributed sorting problem, each node is given a value and an id, both of which are non-corruptible. The idea is to have each node take a specific value based on its id. The algorithm handles transient faults by weeding out false information in the system. Nodes can start with completely false information concerning the values and ids of the system, yet the intended behavior is still achieved. Nodes are also allowed to crash and re-enter the system later, and new nodes are allowed to enter the system.