2 research outputs found
Automated Analysis and Optimization of Distributed Self-Stabilizing Algorithms
Self-stabilization [2] is a versatile technique for recovery from erroneous behavior due to transient
faults or wrong initialization. A system is self-stabilizing if (1) starting from an arbitrary
initial state it can automatically reach a set of legitimate states in a finite number of steps and (2)
it remains in legitimate states in the absence of faults. Weak-stabilization [3] and probabilistic-stabilization
[4] were later introduced in the literature to deal with resource consumption of
self-stabilizing algorithms and impossibility results. Since the system perturbed by fault may
deviate from correct behavior for a finite amount of time, it is paramount to minimize this time
as much as possible, especially in the domain of robotics and networking. This type of fault
tolerance is called non-masking because the faulty behavior is not completely masked from the
user [1].
Designing correct stabilizing algorithms can be tedious. Designing such algorithms that
satisfy certain average recovery time constraints (e.g., for performance guarantees) adds further
complications to this process. Therefore, developing an automatic technique that takes as input
the specification of the desired system, and synthesizes as output a stabilizing algorithm with
minimum (or other upper bound) average recovery time is useful and challenging. In this thesis,
our main focus is on designing automated techniques to optimize the average recovery time of
stabilizing systems using model checking and synthesis techniques.
First, we prove that synthesizing weak-stabilizing distributed programs from scratch and repairing
stabilizing algorithms with average recovery time constraints are NP-complete in the
state-space of the program. To cope with this complexity, we propose a polynomial-time heuristic
that compared to existing stabilizing algorithms, provides lower average recovery time for
many of our case studies.
Second, we study the problem of fine tuning of probabilistic-stabilizing systems to improve
their performance. We take advantage of the two properties of self-stabilizing algorithms to
model them as absorbing discrete-time Markov chains. This will reduce the computation of
average recovery time to finding the weighted sum of elements in the inverse of a matrix.
Finally, we study the impact of scheduling policies on recovery time of stabilizing systems.
We, in particular, propose a method to augment self-stabilizing programs with k-central and k-bounded
schedulers to study dierent factors, such as geographical distance of processes and the
achievable level of parallelism
Automated Synthesis of Timed and Distributed Fault-Tolerant Systems
This dissertation concentrates on the problem of automated synthesis and repair of fault-tolerant systems. In particular, given the required specification of the system, our goal is to synthesize a fault-tolerant system, or repair an existing one. We study this problem for two classes of timed and distributed systems.
In the context of timed systems, we focus on efficient synthesis of fault-tolerant timed models from their fault-intolerant version. Although the complexity of the synthesis problem is known to be polynomial time in the size of the time-abstract bisimulation of the input model, the state of the art lacked synthesis
algorithms that can be efficiently implemented. This is in part due to the fact that synthesis is in general a
challenging problem and its complexity is significantly magnified in the context of timed systems. We
propose an algorithm that takes a timed automaton, a set of fault actions, and a set of safety and bounded-time response properties as input, and utilizes a space-efficient symbolic representation of the timed
automaton (called the zone graph) to synthesize a fault-tolerant timed automaton as output. The output
automaton satisfies strict phased recovery, where it is guaranteed that the output model behaves similarly
to the input model in the absence of faults and in the presence of faults, fault recovery is achieved in two
phases, each satisfying certain safety and timing constraints.
In the context of distributed systems, we study the problem of synthesizing fault-tolerant systems from their
intolerant versions, when the number of processes is unknown. To synthesize a distributed fault-tolerant
protocol that works for systems with any number of processes, we use counter abstraction. Using this
abstraction, we deal with a finite-state abstract model to do the synthesis. Applying our proposed algorithm,
we successfully synthesized a fault-tolerant distributed agreement protocol in the presence of Byzantine fault. Although the synthesis problem is known to be NP-complete in the state space of the input
protocol (due to partial observability of processes) in the non-parameterized setting, our parameterized
algorithm manages to synthesize a solution for a complex problem such as Byzantine agreement within less than two minutes.
A system may reach a bad state due to wrong initialization or fault occurrence. One of the well-known
types of distributed fault-tolerant systems are self-stabilizing systems. These are the systems that converge
to their legitimate states starting from any state, and if no fault occurs, stay in legitimate states thereafter.
We propose an automated sound and complete method to synthesize self-stabilizing systems starting from
the desired topology and type of the system. Our proposed method is based on SMT-solving, where the
desired specification of the system is formulated as SMT constraints. We used the Alloy solver to
implement our method, and successfully synthesized some of the well-known self-stabilizing algorithms.
We extend our method to support a type of stabilizing algorithm called ideal-stabilization, and also the case
when the set of legitimate states is not explicitly known.
Quantitative metrics such as recovery time are crucial in self-stabilizing systems when used in practice
(such as in networking applications). One of these metrics is the average recovery time. Our automated
method for synthesizing self-stabilizing systems generate some solution that respects the desired system
specification, but it does not take into account any quantitative metrics. We study the problem of repairing
self-stabilizing systems (where only removal of transitions is allowed) to satisfy quantitative limitations.
The metric under study is average recovery time, which characterizes the performance of stabilizing
programs. We show that the repair problem is NP-complete in the state space of the given system