Starting a Dialog between Model Checking and Fault-tolerant Distributed Algorithms
Fault-tolerant distributed algorithms are central to building reliable
spatially distributed systems. Unfortunately, the lack of a canonical precise
framework for fault-tolerant algorithms is an obstacle to both verification
and deployment. In this paper, we introduce a new domain-specific framework to
capture the behavior of fault-tolerant distributed algorithms in an adequate
and precise way. At the center of our framework is a parameterized system model
where control flow automata are used for process specification. To account for
the specific features and properties of fault-tolerant distributed algorithms
for message-passing systems, our control flow automata are extended to model
threshold guards as well as the inherent non-determinism stemming from
asynchronous communication, interleavings of steps, and faulty processes.
We demonstrate the adequacy of our framework in a representative case study
where we formalize a family of well-known fault-tolerant broadcasting
algorithms under a variety of failure assumptions. Our case study is supported
by model checking experiments with safety and liveness specifications for a
fixed number of processes. In the experiments, we systematically varied the
assumptions on both the resilience condition and the failure model. In all
cases, our experimental results agreed with the theoretical results predicted in
the distributed algorithms literature. This gives clear evidence for the
adequacy of our model.
In a companion paper, we address the new model checking techniques
necessary for parametric verification of the distributed algorithms captured in
our framework.
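
As a rough illustration of what a threshold guard in a control flow automaton might look like, the following minimal sketch may help. It is not the formalization used in the paper: the parameter names n and t, the resilience condition n > 3t, the thresholds t + 1 and n - t, and the location names are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    n: int  # total number of processes (assumed parameter name)
    t: int  # assumed upper bound on the number of faulty processes


def resilience_holds(p: Params) -> bool:
    # Illustrative resilience condition n > 3t; an assumption of this sketch.
    return p.n > 3 * p.t


def guard_low(received: int, p: Params) -> bool:
    # Threshold guard: at least t + 1 messages have been received.
    return received >= p.t + 1


def guard_high(received: int, p: Params) -> bool:
    # Threshold guard: at least n - t messages have been received.
    return received >= p.n - p.t


def step(location: str, received: int, p: Params) -> str:
    """Take one step of a toy control flow automaton.

    The hypothetical locations are "idle", "sent", and "accepted".
    A transition is taken only if its threshold guard holds.
    """
    if location == "idle" and guard_low(received, p):
        return "sent"
    if location == "sent" and guard_high(received, p):
        return "accepted"
    return location  # no guard enabled: stay in the current location
```

With Params(n=4, t=1), a process in "idle" moves to "sent" once it has received two messages, and from "sent" to "accepted" once it has received three. Varying n and t against resilience_holds mirrors, in a very simplified way, the idea of checking a fixed number of processes under different resilience conditions.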