
    Fast and Compact Distributed Verification and Self-Stabilization of a DFS Tree

    We present algorithms for distributed verification and silent-stabilization of a DFS (Depth First Search) spanning tree of a connected network. Computing and maintaining such a DFS tree is an important task, e.g., for constructing efficient routing schemes. Our algorithm improves upon previous work in several ways. Comparable previous work has space and time complexities of $O(n\log \Delta)$ bits per node and $O(nD)$, respectively, where $\Delta$ is the highest degree of a node, $n$ is the number of nodes, and $D$ is the diameter of the network. In contrast, our algorithm has a space complexity of $O(\log n)$ bits per node, which is optimal for silent-stabilizing spanning trees, and runs in $O(n)$ time. In addition, our solution is modular, since it uses the distributed verification algorithm as an independent subtask of the overall solution. The verification algorithm can be used as a standalone task or as a subtask in another algorithm. To demonstrate how simply efficient DFS algorithms can be constructed with this modular approach, we also present a (non-silent) self-stabilizing DFS token circulation algorithm for general networks based on our silent-stabilizing DFS tree. The complexities of this token circulation algorithm are comparable to the known ones.
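The abstract does not spell out the verification scheme. As a rough illustration of the kind of property a DFS-tree verifier can check locally, the sketch below relies on a standard graph fact rather than on the paper: a rooted spanning tree is a DFS tree of a connected undirected graph exactly when every non-tree edge joins an ancestor and a descendant. Each node carries a hypothetical label (preorder index, subtree size), which fits in O(log n) bits, and a node checks only its own and its neighbors' state. The function names and the labeling are assumptions for this sketch, not the authors' algorithm; here `assign_labels` plays the role of an honest prover.

```python
from collections import defaultdict


def assign_labels(n, parent, root):
    """Hypothetical prover: give each node (preorder index, subtree size)."""
    children = defaultdict(list)
    for v in range(n):
        if v != root:
            children[parent[v]].append(v)
    pre, size, order = [0] * n, [1] * n, []
    stack = [root]
    while stack:  # iterative preorder traversal of the tree
        v = stack.pop()
        pre[v] = len(order)
        order.append(v)
        stack.extend(reversed(children[v]))
    for v in reversed(order):  # descendants are finished before their ancestors
        if v != root:
            size[parent[v]] += size[v]
    return pre, size


def is_ancestor(pre, size, a, b):
    """True if a is an ancestor of b (or a == b): b's index lies in a's interval."""
    return pre[a] <= pre[b] < pre[a] + size[a]


def local_check(v, adj, parent, pre, size):
    """Check run by node v using only its own and its neighbors' state."""
    if parent[v] != -1 and parent[v] not in adj[v]:
        return False  # parent pointer must follow a real edge
    kids = sorted((u for u in adj[v] if parent[u] == v), key=lambda u: pre[u])
    nxt = pre[v] + 1
    for c in kids:  # children's intervals must tile v's interval exactly
        if pre[c] != nxt:
            return False
        nxt += size[c]
    if nxt != pre[v] + size[v]:
        return False
    for u in adj[v]:  # every non-tree edge must be an ancestor/descendant edge
        tree_edge = u == parent[v] or v == parent[u]
        if not tree_edge and not (is_ancestor(pre, size, u, v)
                                  or is_ancestor(pre, size, v, u)):
            return False
    return True


def verify_dfs_tree(n, edges, parent, root):
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    pre, size = assign_labels(n, parent, root)
    root_ok = parent[root] == -1 and pre[root] == 0 and size[root] == n
    return root_ok and all(local_check(v, adj, parent, pre, size) for v in range(n))


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(verify_dfs_tree(4, cycle, [-1, 0, 1, 2], 0))  # True: path 0-1-2-3
    print(verify_dfs_tree(4, cycle, [-1, 0, 1, 0], 0))  # False: 2-3 is a cross edge
```

In a distributed setting each node would run `local_check` on its own, and the labels could come from a possibly corrupted state; any node whose check fails raises the alarm.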

    Stabilizing Inter-Domain Routing in the Internet

    This paper reports the first self-stabilizing Border Gateway Protocol (BGP). BGP is the standard inter-domain routing protocol in the Internet. Self-stabilization is a technique for tolerating arbitrary transient faults. Routing instability in the Internet can arise from errors in configuring the routing data structures or the routing policies, from transient physical and data link problems, from software bugs, and from memory corruption. This instability can increase network latency, slow down the convergence of the routing data structures, and even partition networks. Most previous studies concentrated on routing policies to achieve BGP convergence, while ignoring the oscillations caused by transient faults. The purpose of self-stabilizing BGP is to solve the routing instability problem when this instability results from transient failures. The self-stabilizing BGP presented here provides a way to detect and automatically recover from this type of fault. Our protocol is combined with an existing protocol so that it is also resilient to policy conflicts.
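The abstract does not describe the protocol's mechanism. The sketch below is a heavily simplified, hypothetical path-vector model (shortest-path preference only, no BGP policies, synchronous rounds) meant only to illustrate the general self-stabilization idea: each node recomputes its route solely from its neighbors' current advertisements and discards advertisements that loop, exceed n nodes, or do not end at the destination, so arbitrarily corrupted routes are eventually flushed out. It is not the paper's protocol, and the names `step` and `stabilize` are invented for illustration.

```python
def step(adj, dest, routes, n):
    """One synchronous round: every node recomputes its route from neighbors only."""
    new = {}
    for v in adj:
        if v == dest:
            new[v] = (dest,)  # the destination always advertises itself
            continue
        best = None
        for u in adj[v]:
            r = routes.get(u)
            # Reject missing, empty, looping, overlong, or mis-terminated paths.
            if not r or v in r or r[-1] != dest or len(r) + 1 > n:
                continue
            cand = (v,) + r
            if best is None or (len(cand), cand) < (len(best), best):
                best = cand  # prefer the shortest path, break ties by node ids
        new[v] = best
    return new


def stabilize(adj, dest, routes, rounds):
    for _ in range(rounds):
        routes = step(adj, dest, routes, len(adj))
    return routes


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    # Arbitrary corrupted state; node 9 does not even exist.
    corrupted = {0: (0, 9, 3), 1: (1, 0, 2, 3), 2: (2,), 3: (3, 1)}
    print(stabilize(adj, 3, corrupted, 2 * len(adj)))
    # {0: (0, 1, 3), 1: (1, 3), 2: (2, 3), 3: (3,)}
```

Because no node ever reuses its own stored route, each round replaces one more hop of every advertised path with a real edge, and the length cap bounds how long a fabricated suffix can survive.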

    Fast and compact self-stabilizing verification, computation, and fault detection of an MST

    This paper demonstrates the usefulness of distributed local verification of proofs as a tool for the design of self-stabilizing algorithms. In particular, it introduces a somewhat generalized notion of distributed local proofs and uses it to improve the time complexity significantly while maintaining space optimality. As a result, we show that optimizing the memory size carries at most a small cost in terms of time, in the context of Minimum Spanning Tree (MST). That is, we present algorithms that are both time and space efficient for both constructing an MST and verifying it. This involves several parts that may be considered contributions in themselves. First, we generalize the notion of local proofs, trading off time complexity for memory efficiency. This adds a dimension to the study of distributed local proofs, which has been gaining attention recently. Specifically, we design a (self-stabilizing) proof labeling scheme that is memory optimal (i.e., $O(\log n)$ bits per node) and whose time complexity is $O(\log^2 n)$ in synchronous networks, or $O(\Delta \log^3 n)$ in asynchronous ones, where $\Delta$ is the maximum degree of nodes. This answers an open problem posed by Awerbuch and Varghese (FOCS 1991). We also show that $\Omega(\log n)$ time is necessary, even in synchronous networks. Another property is that if $f$ faults occurred, then, within the required detection time above, they are detected by some node in the $O(f\log n)$ locality of each of the faults. Second, we show how to enhance a known transformer that makes input/output algorithms self-stabilizing. It now takes as input an efficient construction algorithm and an efficient self-stabilizing proof labeling scheme, and produces an efficient self-stabilizing algorithm.
When used for MST, the transformer produces a memory-optimal self-stabilizing algorithm whose time complexity, namely $O(n)$, is significantly better even than that of previous algorithms. (The time complexity of previous MST algorithms that used $\Omega(\log^2 n)$ memory bits per node was $O(n^2)$, and the time for optimal-space algorithms was $O(n|E|)$.) Inherited from our proof labeling scheme, our self-stabilizing MST construction algorithm also has the following two properties: (1) if faults occur after the construction has ended, they are detected by some nodes within $O(\log^2 n)$ time in synchronous networks, or within $O(\Delta \log^3 n)$ time in asynchronous ones, and (2) if $f$ faults occurred, then, within the required detection time above, they are detected within the $O(f\log n)$ locality of each of the faults. We also show how to improve the above two properties, at the expense of some increase in memory.
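The proof labeling scheme itself is not described in the abstract. For orientation, the property that any MST verifier must ultimately certify can be checked sequentially with the classical cycle property: a spanning tree T is a minimum spanning tree if and only if every non-tree edge (u, v) weighs at least as much as every tree edge on the path between u and v in T. The sketch below checks this directly; it is a centralized illustration, not the paper's distributed O(log n)-bit scheme, and `is_mst` is a hypothetical name.

```python
from collections import deque


def is_mst(n, edges, tree):
    """edges: list of (u, v, w); tree: set of indices into edges forming T."""
    if len(tree) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for i in tree:
        u, v, w = edges[i]
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Root T at node 0 and record parent, depth, and weight of the parent edge.
    parent, depth, up_w = [-1] * n, [0] * n, [0] * n
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for y, w in adj[x]:
            if not seen[y]:
                seen[y] = True
                parent[y], depth[y], up_w[y] = x, depth[x] + 1, w
                queue.append(y)
    if not all(seen):  # n - 1 edges that do not connect all nodes
        return False

    def path_max(a, b):
        """Heaviest edge on the tree path between a and b (walk up to the LCA)."""
        best = float("-inf")
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a
            best = max(best, up_w[a])
            a = parent[a]
        return best

    # Cycle property: no non-tree edge may be lighter than the path it closes.
    return all(w >= path_max(u, v)
               for i, (u, v, w) in enumerate(edges) if i not in tree)


if __name__ == "__main__":
    triangle = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
    print(is_mst(3, triangle, {0, 1}))  # True
    print(is_mst(3, triangle, {0, 2}))  # False: edge (1, 2, 2) beats the path max 3
```

A distributed verifier cannot afford to walk tree paths like `path_max` does; proof labels exist precisely so that each node can confirm this condition from its neighbors' labels alone.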

    Performance Evaluation of Self-stabilizing Algorithms by Probabilistic Model Checking

    A self-stabilizing protocol is one that, starting from any arbitrary initial state, recovers to legitimate states in a finite number of steps and, once it has stabilized to a set of legitimate states, remains there unless perturbed by transient faults. The traditional methods for evaluating the performance of a self-stabilizing algorithm are usually based on worst-case computational complexity analysis. Another commonly used method for evaluating these algorithms is simulation, which assumes the system starts from a particular initial state. Here, we argue that these traditional methods have shortcomings and do not give enough insight into the behavior of the system. Moreover, they do not provide a sound basis for comparison. We propose a novel method for evaluating self-stabilizing algorithms, based on probabilistic model checking and computation of the expected number of recovery steps. We run experiments on several case studies, and the results indicate that we can gain insight into the faults and their structure in the protocol. Next, we explain the difficulty of designing a self-stabilizing algorithm for a system and show that it is impossible to do so for some classes of protocols. This has led to relaxations of the definition of self-stabilization, one of which is weak-stabilization. A weak-stabilizing protocol guarantees only the existence of a recovery path from an arbitrary initial configuration. Thus, some executions may contain cycles or remain within connected components of non-legitimate states. Since a weak-stabilizing algorithm may get stuck in such components forever, weak-stabilizing protocols cannot be evaluated by traditional and existing methods. We calculate the expected number of recovery steps to evaluate weak-stabilization.
However, since this does not give enough intuition about the structure of faults, we apply a graph-theoretic formula to estimate the weak-stabilizing algorithm's performance. This formula is based on the number of cycles and their reachability. Based on the observations made while evaluating these protocols, we suggest algorithms, called state encoding, for improving the performance of the algorithms. State encoding works by changing the bit mapping of the system's states. The aim is to make states that recover in fewer steps more likely to occur. There are three algorithms: the first is based on betweenness centrality, a measure of how central a node is within a graph; the second is based on a feedback arc set, a set of arcs whose removal makes a graph acyclic; and the third is based on the length of the shortest recovery path for each state. The other problem investigated here is state space explosion in model checking. Like traditional model checking, probabilistic model checking suffers from state space explosion, i.e., the number of states grows exponentially with the number of components in the distributed system. Abstraction methods, briefly described here, are designed to combat this problem. We argue that they are not efficient enough and that there is still no sufficient abstraction method that works for systems with an arbitrary number of processes. We also propose a new approach for evaluating an abstraction function. Then, based on the intuition gained, we propose a new abstraction algorithm designed exclusively for verifying reachability properties. After running experiments on a case study, we compare the results of our algorithm with those obtained by existing methods. The results support our claim that our method is more efficient and precise.
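In Markov-chain terms, the expected number of recovery steps described above is the expected hitting time of the set of legitimate states. A minimal sketch, assuming the protocol's behavior under a randomized scheduler has already been abstracted into a small hypothetical transition matrix `P`: the expected steps t from the non-legitimate states solve (I - Q) t = 1, where Q restricts P to the non-legitimate states. A probabilistic model checker such as PRISM evaluates this kind of expected-reward property on far larger models; this toy solver, whose name is invented here, only illustrates the computation.

```python
from fractions import Fraction


def expected_recovery_steps(P, legit):
    """P[i][j]: probability of moving from state i to j; legit: set of legitimate states.

    Solves (I - Q) t = 1 over the non-legitimate states by Gauss-Jordan elimination.
    Assumes every state reaches `legit` with probability 1 (otherwise I - Q is singular).
    """
    bad = [s for s in range(len(P)) if s not in legit]
    k = len(bad)
    # Augmented matrix [I - Q | 1], kept exact with Fractions.
    A = [[Fraction(int(r == c)) - Fraction(P[bad[r]][bad[c]]) for c in range(k)]
         + [Fraction(1)] for r in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return {s: A[i][k] for i, s in enumerate(bad)}


if __name__ == "__main__":
    half = Fraction(1, 2)
    P = [[0, half, half],  # state 0: recovers, or falls back into state 1
         [1, 0, 0],        # state 1: always returns to state 0
         [0, 0, 1]]        # state 2: legitimate (closed)
    print(expected_recovery_steps(P, {2}))  # {0: Fraction(3, 1), 1: Fraction(4, 1)}
```

Exact fractions keep small examples readable; for realistic state spaces an iterative or sparse solver would be the practical choice.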

    Study of the Reliability of Self-Converging Algorithms Against Soft Errors (Etude de la fiabilité des algorithmes self-convergeants face aux soft-erreurs)

    This thesis is devoted to the study of the robustness/sensitivity of a self-converging algorithm with respect to SEUs (single-event upsets). These phenomena, also called bit-flips, can modify the content of memory elements as a result of silicon ionization caused by the impact of a charged particle. This study may have a significant impact given the ongoing miniaturization that will soon provide circuits with hundreds to thousands of processing cores on a single chip, which will require the cores to communicate efficiently and robustly. In this context, so-called self-converging algorithms can be used to make communication between cores reliable without external intervention. A fault-injection study of the robustness of the algorithm was performed; the algorithm was initially executed by a LEON3 processor implemented in the FPGA embedded in a specific test platform. Preliminary fault-injection campaigns using a state-of-the-art method called CEU (Code Emulated Upset) showed that the algorithm has some sensitivity to SEUs. To cope with this, the software was modified and fault-tolerance techniques were implemented in software in the program implementing the self-converging algorithm. Fault-injection experiments were then performed to demonstrate the robustness of the modified algorithm to SEUs and to reveal its potential weak points (its "Achilles' heels"). The impact of SEUs was also explored on a hardware implementation of the self-converging algorithm in an FPGA. This approach was evaluated through fault-injection experiments at the RTL level of the circuit. The results obtained with this approach show a significant improvement in the robustness of the algorithm compared with its software version.
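The abstract does not name the software fault-tolerance techniques that were applied. As one common example of such a technique (an assumption here, not necessarily the one used in the thesis), the sketch below protects a 32-bit variable with triple modular redundancy (TMR): the value is stored three times and read back through a bitwise majority vote. A CEU-style campaign is mimicked by flipping each bit of one copy in turn; all helper names are hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def flip_bit(word, bit):
    """Emulate a single-event upset: invert one bit of a WIDTH-bit word."""
    return (word ^ (1 << bit)) & MASK


def tmr_read(a, b, c):
    """Bitwise majority vote: each result bit is the value held by >= 2 copies."""
    return (a & b) | (a & c) | (b & c)


def injection_campaign(value):
    """Flip every bit of one copy in turn; count reads that come back wrong."""
    unprotected_errors = 0
    tmr_errors = 0
    for bit in range(WIDTH):
        faulty = flip_bit(value, bit)
        if faulty != value:  # a single unprotected copy is always corrupted
            unprotected_errors += 1
        copies = [value, value, value]
        copies[bit % 3] = faulty  # the upset hits one of the three copies
        if tmr_read(*copies) != value:
            tmr_errors += 1
    return unprotected_errors, tmr_errors


if __name__ == "__main__":
    print(injection_campaign(0xDEADBEEF))  # (32, 0)
```

TMR masks any single upset per word, but two upsets hitting the same bit position in two different copies would defeat the vote, which is one reason campaigns also probe multi-bit faults.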

    The Fault Span of Crash Failures

    A crashing network protocol is an asynchronous protocol whose memory does not survive crashes. We show that a crashing network protocol that works over unreliable links can be driven to arbitrary global states, where each node is in a state reached in some (possibly different) run, and each link has an arbitrary mixture of packets sent in (possibly different) runs. Our theorem considerably generalizes an earlier result, due to Fekete et al., which states that there is no correct crashing Data Link Protocol. For example, we prove that there are no correct crashing protocols for token passing or for many other resource allocation protocols, such as k-exclusion and the drinking and dining philosophers problems. We further characterize the states reachable through crash failures when links are reliable non-FIFO or reliable FIFO. We show that with reliable non-FIFO links, any acyclic subset of nodes and links can be driven to arbitrary states. We show that with reliable FIFO links, only nodes..
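The theorem is proved in general terms. As a small, hypothetical illustration of the phenomenon it generalizes, the sketch below models the receiver of the alternating-bit protocol with volatile state: after a crash the receiver is back in its initial state (a state from a different run) while an old packet from the current run is still in the link, so the same payload is delivered twice. This toy model is an assumption for illustration, not the construction used in the paper.

```python
class AlternatingBitReceiver:
    """Receiver side of the alternating-bit protocol; all state is volatile."""

    def __init__(self):
        self.expected = 0    # sequence bit of the next new packet
        self.delivered = []  # output already handed to the user (not undone by a crash)

    def receive(self, bit, payload):
        if bit == self.expected:  # new packet: deliver it and flip the bit
            self.delivered.append(payload)
            self.expected ^= 1
        return bit  # acknowledge the received bit either way

    def crash(self):
        self.expected = 0  # memory does not survive the crash


if __name__ == "__main__":
    rx = AlternatingBitReceiver()
    rx.receive(0, "a")   # first copy of packet "a" arrives and is delivered
    rx.crash()           # receiver restarts from its initial state
    rx.receive(0, "a")   # a delayed retransmission of "a" is still in the link
    print(rx.delivered)  # ['a', 'a']: duplicate delivery
```

Any fixed restart state runs into the same problem, since an unreliable link can always hold a stale packet that matches it.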