

Automated Analysis and Optimization of Distributed Self-Stabilizing Algorithms

Author: Aflaki Saba
Publication venue: University of Waterloo
Publication date: 17/08/2015

Self-stabilization [2] is a versatile technique for recovering from erroneous behavior caused by transient faults or wrong initialization. A system is self-stabilizing if (1) starting from an arbitrary initial state, it automatically reaches a set of legitimate states in a finite number of steps, and (2) it remains in legitimate states in the absence of faults. Weak-stabilization [3] and probabilistic-stabilization [4] were later introduced in the literature to address the resource consumption of self-stabilizing algorithms and to cope with impossibility results. Since a system perturbed by faults may deviate from correct behavior for a finite amount of time, it is paramount to minimize this time as much as possible, especially in robotics and networking. This type of fault tolerance is called non-masking because the faulty behavior is not completely masked from the user [1]. Designing correct stabilizing algorithms can be tedious, and requiring them to satisfy average recovery time constraints (e.g., for performance guarantees) further complicates the process. Therefore, an automatic technique that takes the specification of the desired system as input and synthesizes as output a stabilizing algorithm with minimum (or otherwise bounded) average recovery time is both useful and challenging. In this thesis, our main focus is on designing automated techniques, based on model checking and synthesis, to optimize the average recovery time of stabilizing systems. First, we prove that synthesizing weak-stabilizing distributed programs from scratch and repairing stabilizing algorithms with average recovery time constraints are NP-complete in the state space of the program. To cope with this complexity, we propose a polynomial-time heuristic that, compared to existing stabilizing algorithms, provides lower average recovery time for many of our case studies.
Second, we study the problem of fine-tuning probabilistic-stabilizing systems to improve their performance. We take advantage of the two properties of self-stabilizing algorithms to model them as absorbing discrete-time Markov chains. This reduces the computation of average recovery time to finding the weighted sum of the elements in the inverse of a matrix. Finally, we study the impact of scheduling policies on the recovery time of stabilizing systems. In particular, we propose a method to augment self-stabilizing programs with k-central and k-bounded schedulers to study different factors, such as the geographical distance of processes and the achievable level of parallelism.
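The matrix reduction described above can be sketched as follows. This is a minimal exact-arithmetic illustration, not the thesis's implementation, and it assumes the transition probabilities among illegitimate (transient) states are already available as a matrix `Q` and the starting probabilities of those states as `init` (both names are hypothetical). Because legitimate states are absorbing, the fundamental matrix N = (I - Q)^-1 holds the expected number of visits to each transient state; its row sums are the expected recovery steps from each state, and weighting those sums by `init` gives the average recovery time. Legitimate starting states need zero steps, so they simply contribute nothing.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inversion over exact rationals (matrix must be nonsingular)."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def average_recovery_time(Q, init):
    """Q[i][j]: probability of moving from transient state i to transient state j.
    init[i]: probability of starting in transient state i.
    Returns sum_i init[i] * sum_j N[i][j], where N = (I - Q)^-1."""
    n = len(Q)
    i_minus_q = [[Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
                 for i in range(n)]
    N = invert(i_minus_q)  # fundamental matrix of the absorbing chain
    return sum(Fraction(init[i]) * sum(N[i]) for i in range(n))
```

For instance, with two illegitimate states where the first moves to the second or recovers with probability 1/2 each and the second always recovers in one step, the expected recovery steps are 3/2 and 1, so a uniform start gives 5/4.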

University of Waterloo's Institutional Repository

Performance Evaluation of Self-stabilizing Algorithms by Probabilistic Model Checking

Author: Fallahi Narges
Publication venue: University of Waterloo
Publication date: 01/01/2014

A self-stabilizing protocol is one that, starting from any arbitrary initial state, recovers to legitimate states in a finite number of steps and, once it stabilizes to a set of legitimate states, remains there unless it is perturbed by transient faults. Traditional methods for evaluating the performance of a self-stabilizing algorithm are usually based on worst-case computational complexity analysis. Another commonly used method is simulation, which assumes the system starts from a particular initial state. Here, it is argued that these traditional methods have shortcomings and do not give enough insight into the behavior of the system. Moreover, they do not provide a sound basis for comparison. We propose a novel method for evaluating self-stabilizing algorithms, based on probabilistic model checking and the computation of the expected number of recovery steps. We run experiments on several case studies, and the results indicate that the method gives insight into the faults and their structure in the protocol. Next, we explain the difficulty of designing a self-stabilizing algorithm for a system and show that it is impossible to do so for some classes of protocols. This has led to relaxations of the definition of self-stabilization, one of which is weak-stabilization. A weak-stabilizing protocol only guarantees the existence of a recovery path from an arbitrary initial configuration, so some executions may contain cycles or connected components of illegitimate states. Since a weak-stabilizing algorithm may get stuck in such components forever, weak-stabilizing protocols cannot be evaluated by traditional, existing methods. We calculate the expected number of recovery steps to evaluate weak-stabilization.
However, since this metric alone does not give enough intuition about the structure of faults, we apply a graph-theoretic formula, based on the number of cycles and their reachability, to estimate the performance of a weak-stabilizing algorithm. Based on our observations from evaluating these protocols, we propose algorithms, called state encoding, for improving the performance of the algorithms. State encoding works by changing the bit mapping of the states of the system, with the aim of making states with faster recovery more likely to occur. We present three such algorithms: the first is based on betweenness centrality, a measure of the centrality of a node within a graph; the second is based on the feedback arc set, a set of arcs whose removal makes a graph acyclic; and the third is based on the length of the shortest recovery path from each state. The other problem investigated here is state space explosion in model checking. Like traditional model checking, probabilistic model checking suffers from state space explosion, i.e., the number of states grows exponentially in the number of components of the distributed system. Abstraction methods, which are briefly described here, are designed to combat this problem. We argue that they are not efficient enough and that there is still no sufficient abstraction method that works for systems with an arbitrary number of processes. We also propose a new approach for evaluating an abstraction function. Then, based on the intuition gained, we propose a new abstraction algorithm designed exclusively for verifying reachability properties. After running experiments on a case study, we compare the results of our algorithm with those obtained by existing methods. The results support our claim that our method is more efficient and precise.
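Two of the graph quantities discussed above, weak-stabilization (every state has some path to a legitimate state) and the length of the shortest recovery path from each state, can be computed with a breadth-first search run backwards from the legitimate states. The following is a minimal sketch, not one of the thesis's state-encoding algorithms; it assumes the protocol's state graph is given as a hypothetical dictionary `successors` that maps every state to a list of its successor states.

```python
from collections import deque


def shortest_recovery_lengths(successors, legitimate):
    """Shortest number of steps from each state to any legitimate state.
    States that cannot reach a legitimate state are absent from the result."""
    predecessors = {s: [] for s in successors}
    for s, succs in successors.items():
        for t in succs:
            predecessors[t].append(s)
    dist = {s: 0 for s in legitimate}
    queue = deque(legitimate)
    while queue:  # BFS on the reversed graph, starting from legitimate states
        t = queue.popleft()
        for s in predecessors[t]:
            if s not in dist:
                dist[s] = dist[t] + 1
                queue.append(s)
    return dist


def is_weakly_stabilizing(successors, legitimate):
    """Closure: legitimate states only step to legitimate states.
    Weak convergence: every state has at least one path to a legitimate state."""
    closed = all(t in legitimate for s in legitimate for t in successors[s])
    reachable = shortest_recovery_lengths(successors, legitimate)
    return closed and len(reachable) == len(successors)
```

A state with a self-loop, such as `'a'` in `{'L': ['L'], 'a': ['a', 'b'], 'b': ['L']}`, may stay illegitimate forever, yet the graph is still weakly stabilizing because a recovery path of length 2 exists.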

University of Waterloo's Institutional Repository

Rigorous Performance Evaluation of Self-Stabilization Using Probabilistic Model Checking

Authors: Bonakdarpour Borzoo, Fallahi Narges, Tixeuil Sébastien
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

We propose a new metric for effectively and accurately evaluating the performance of self-stabilizing algorithms. Self-stabilization is a versatile category of fault tolerance that guarantees system recovery to normal behavior within a finite number of steps when the state of the system is perturbed by transient faults (or, equivalently, when the initial state of the system can be arbitrary). The performance of self-stabilizing algorithms is conventionally characterized in the literature by asymptotic computational complexity. We argue that such a characterization is too abstract and does not accurately reflect the realities of deploying a distributed algorithm in practice. Our new metric for characterizing the performance of self-stabilizing algorithms is the expected mean recovery time. The metric has several crucial features. First, it accurately captures the average-case speed of recovery. Second, we show that our evaluation method can effectively incorporate several other parameters that are important in practice but have no place in asymptotic complexity analysis, such as the type of distributed scheduler, the likelihood of fault occurrence, the impact of faults on the speed of recovery, and the network topology. We use a deep analysis technique, namely probabilistic model checking, to rigorously compute the proposed metric. All our claims are backed by detailed case studies and experiments.
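To illustrate how a scheduler and a fault distribution enter this metric, the sketch below computes the expected recovery time of Dijkstra's K-state token ring. The protocol is a familiar stand-in chosen for illustration only and is not claimed to be one of the paper's case studies. It assumes a randomized central scheduler that picks one enabled process uniformly at random; `N`, `K`, and `weight` are hypothetical parameters, with `weight` standing in for the likelihood that a fault produces each configuration. The paper computes its metric with a probabilistic model checker; this hand-written recursion is valid only because, under a central scheduler, this protocol has no cycles among illegitimate states.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

N, K = 3, 3  # hypothetical: 3 processes on the ring, counters range over 0..K-1


def privileged(x):
    """Indices of processes whose guard is enabled in configuration x."""
    procs = [0] if x[0] == x[-1] else []  # process 0 holds the token if x0 == x[N-1]
    return procs + [i for i in range(1, len(x)) if x[i] != x[i - 1]]


def fire(x, i):
    """Successor configuration after process i executes its action."""
    y = list(x)
    y[i] = (x[0] + 1) % K if i == 0 else x[i - 1]
    return tuple(y)


def legitimate(x):
    return len(privileged(x)) == 1  # exactly one token in the ring


@lru_cache(maxsize=None)
def expected_recovery(x):
    """Expected steps to reach a legitimate state from x under a uniformly
    random central scheduler (exact, via memoized recursion on an acyclic graph)."""
    if legitimate(x):
        return Fraction(0)
    moves = privileged(x)
    return 1 + sum(expected_recovery(fire(x, i)) for i in moves) / len(moves)


def average_recovery(weight=lambda x: 1):
    """Weighted mean of expected recovery time over all K**N initial configurations."""
    states = list(product(range(K), repeat=N))
    total = sum(weight(s) for s in states)
    return sum(weight(s) * expected_recovery(s) for s in states) / total
```

For example, `expected_recovery((1, 0, 1))` is 4/3: all three processes are enabled, two of the three moves reach a legitimate state immediately, and the third leads to a state that needs exactly one more step. Changing the scheduler or `weight` changes the value, which is the kind of sensitivity that asymptotic complexity does not capture.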

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

On the Limits and Practice of Automatically Designing Self-Stabilization

Author: Klinkhamer Alex
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2016

A protocol is said to be self-stabilizing when the distributed system executing it is guaranteed to recover from any fault that does not cause permanent damage. Designing such protocols is hard because they must recover from all possible states; therefore, we investigate how feasible it is to synthesize them automatically. We show that synthesizing stabilization on a fixed topology is NP-complete in the number of system states. When a solution is found, we further show that verifying its correctness on a general topology (with any number of processes) is undecidable, even for very simple unidirectional rings. Despite these negative results, we develop an algorithm to synthesize a self-stabilizing protocol given its desired topology, legitimate states, and behavior. By analogy to shadow puppetry, where a puppeteer may design a complex puppet to cast a desired shadow, a protocol may need to be designed in a complex way that does not even resemble its specification. Our shadow/puppet synthesis algorithm addresses this concern and, using a complete backtracking search, has automatically designed four new self-stabilizing protocols with minimal process space requirements: 2-state maximal matching on bidirectional rings, 5-state token passing on unidirectional rings, 3-state token passing on bidirectional chains, and 4-state orientation on daisy chains.
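On a fixed topology with a fixed number of processes, by contrast with the undecidable general case, correctness can be checked by exhaustive exploration of all states. The following minimal sketch uses Dijkstra's K-state token ring as a hypothetical stand-in (it is not one of the synthesized protocols listed above) and assumes a central daemon, where exactly one enabled process moves per step. Closure means legitimate states only step to legitimate states; convergence means there are no deadlocks and no cycles among illegitimate states, which is checked here with Kahn's topological-sort algorithm.

```python
from itertools import product


def privileged(x):
    """Processes enabled in configuration x of Dijkstra's K-state ring."""
    procs = [0] if x[0] == x[-1] else []
    return procs + [i for i in range(1, len(x)) if x[i] != x[i - 1]]


def successors(x, k):
    """One successor per enabled process (central daemon: one move per step)."""
    result = []
    for i in privileged(x):
        new_value = (x[0] + 1) % k if i == 0 else x[i - 1]
        result.append(x[:i] + (new_value,) + x[i + 1:])
    return result


def verify_self_stabilization(n, k):
    """Exhaustively check closure and convergence for n processes and k values."""
    states = list(product(range(k), repeat=n))
    graph = {s: successors(s, k) for s in states}
    legit = {s for s in states if len(privileged(s)) == 1}  # exactly one token
    closed = all(t in legit for s in legit for t in graph[s])
    deadlock_free = all(graph[s] for s in states)
    # Convergence: the subgraph of illegitimate states must be acyclic.
    indegree = {s: 0 for s in states if s not in legit}
    for s in indegree:
        for t in graph[s]:
            if t in indegree:
                indegree[t] += 1
    ready = [s for s, d in indegree.items() if d == 0]
    removed = 0
    while ready:  # Kahn's algorithm: repeatedly drop states with no incoming edges
        s = ready.pop()
        removed += 1
        for t in graph[s]:
            if t in indegree:
                indegree[t] -= 1
                if indegree[t] == 0:
                    ready.append(t)
    return closed and deadlock_free and removed == len(indegree)
```

Because the check enumerates all k**n configurations, its cost grows exponentially with n, and a positive answer for one ring size says nothing about other sizes, which is why verification on a general topology is a separate problem.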

Michigan Technological University

Automated Synthesis of Timed and Distributed Fault-Tolerant Systems

Author: Faghihekhorasani Fathiyeh
Publication venue: University of Waterloo
Publication date: 01/01/2015

This dissertation concentrates on the automated synthesis and repair of fault-tolerant systems. In particular, given the required specification of the system, our goal is to synthesize a fault-tolerant system or to repair an existing one. We study this problem for two classes of systems: timed and distributed. In the context of timed systems, we focus on the efficient synthesis of fault-tolerant timed models from their fault-intolerant versions. Although the complexity of the synthesis problem is known to be polynomial in the size of the time-abstract bisimulation of the input model, the state of the art lacked synthesis algorithms that can be implemented efficiently. This is partly because synthesis is a challenging problem in general, and its complexity is significantly magnified for timed systems. We propose an algorithm that takes as input a timed automaton, a set of fault actions, and a set of safety and bounded-time response properties, and uses a space-efficient symbolic representation of the timed automaton (called the zone graph) to synthesize a fault-tolerant timed automaton as output. The output automaton satisfies strict phased recovery: it is guaranteed to behave like the input model in the absence of faults, and in the presence of faults, recovery is achieved in two phases, each satisfying certain safety and timing constraints. In the context of distributed systems, we study the problem of synthesizing fault-tolerant systems from their intolerant versions when the number of processes is unknown. To synthesize a distributed fault-tolerant protocol that works for any number of processes, we use counter abstraction, which lets us perform the synthesis on a finite-state abstract model. Applying our proposed algorithm, we successfully synthesized a fault-tolerant distributed agreement protocol in the presence of Byzantine faults.
Although the synthesis problem is known to be NP-complete in the state space of the input protocol in the non-parameterized setting (due to the partial observability of processes), our parameterized algorithm synthesizes a solution for a problem as complex as Byzantine agreement in less than two minutes. A system may reach a bad state due to wrong initialization or fault occurrence. Self-stabilizing systems are one of the well-known types of distributed fault-tolerant systems: they converge to their legitimate states starting from any state and, if no fault occurs, stay in legitimate states thereafter. We propose an automated, sound, and complete method to synthesize self-stabilizing systems starting from the desired topology and type of the system. Our method is based on SMT solving, where the desired specification of the system is formulated as SMT constraints. We used the Alloy solver to implement our method and successfully synthesized several well-known self-stabilizing algorithms. We extend our method to support a type of stabilization called ideal-stabilization, as well as the case where the set of legitimate states is not explicitly known. Quantitative metrics such as recovery time are crucial for self-stabilizing systems used in practice (e.g., in networking applications). One such metric is the average recovery time. Our automated synthesis method generates a solution that respects the desired system specification, but it does not take any quantitative metric into account. We study the problem of repairing self-stabilizing systems (where only the removal of transitions is allowed) to satisfy quantitative constraints. The metric under study is the average recovery time, which characterizes the performance of stabilizing programs. We show that this repair problem is NP-complete in the state space of the given system.
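Counter abstraction exploits the symmetry of interchangeable processes: a global configuration is represented by how many processes occupy each local state rather than by which processes do. The sketch below is a minimal illustration under a stated assumption: counts are saturated at a hypothetical `cutoff` (so "at least `cutoff`" becomes a single abstract value), which is one common way to obtain a finite abstract model whose size does not depend on the number of processes. The local state names are hypothetical, and the dissertation's exact abstraction may differ.

```python
from collections import Counter


def counter_abstraction(config, local_states, cutoff=2):
    """Map a configuration of interchangeable processes to a tuple giving, for each
    local state, how many processes are in it, saturated at `cutoff` ("many")."""
    counts = Counter(config)
    return tuple(min(counts[q], cutoff) for q in local_states)


def abstract_state_space_size(local_states, cutoff=2):
    """Number of possible abstract states: independent of the number of processes."""
    return (cutoff + 1) ** len(local_states)
```

With local states `("idle", "busy", "done")`, the configurations `("idle", "idle", "busy")` and `("busy", "idle", "idle")` map to the same abstract state `(2, 1, 0)`, and five idle processes are indistinguishable from two, so at most 27 abstract states exist for any number of processes.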

University of Waterloo's Institutional Repository