LIPIcs

Author: Dragoi Cezara
Henzinger Thomas A
Zufferey Damien
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2015

Fault-tolerant distributed algorithms play an important role in many critical and high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. Nonetheless, there is surprisingly little language and verification support for building distributed systems based on fault-tolerant algorithms. In this paper, we present some of the challenges that a designer has to overcome to implement a fault-tolerant distributed system. We then review different models that have been proposed to reason about distributed algorithms and sketch how such a model can form the basis for a domain-specific programming language. Adopting a high-level programming model can simplify the programmer's life and make the code amenable to automated verification, while still compiling to efficiently executable code. We conclude by summarizing the current status of an ongoing language design and implementation project that is based on this idea.
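The abstract does not show the language itself. As a hedged sketch, assume a round is written as a `send` function plus an `update` function that only sees the messages delivered in that round, and that delivery is described by heard-of sets. The names `Proc`, `OneThirdRule`, `send`, `update` and `run` are hypothetical. The algorithm is OneThirdRule, a consensus algorithm from the Heard-Of literature (Charron-Bost and Schiper), chosen purely as an illustration; it is not claimed to be the example used in this paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Proc:
    x: int                           # current estimate
    decision: Optional[int] = None   # set at most once


class OneThirdRule:
    """One round written as a (send, update) pair, the shape a round-based DSL might expose."""

    def __init__(self, n: int) -> None:
        self.n = n

    def send(self, p: Proc) -> int:
        # Every process broadcasts its current estimate.
        return p.x

    def update(self, p: Proc, received: List[int]) -> None:
        # Act only if more than 2n/3 messages were heard in this round.
        if 3 * len(received) <= 2 * self.n:
            return
        counts = Counter(received)
        top = max(counts.values())
        # Adopt the smallest among the most frequently received values.
        p.x = min(v for v, c in counts.items() if c == top)
        # Decide v if more than 2n/3 of the received values equal v.
        for v, c in counts.items():
            if 3 * c > 2 * self.n and p.decision is None:
                p.decision = v


def run(logic: OneThirdRule, procs: List[Proc], ho: List[List[Set[int]]]) -> List[Proc]:
    """ho[r][q] is the heard-of set of process q in round r: the senders q hears from."""
    for ho_r in ho:
        msgs = [logic.send(p) for p in procs]  # all sends of the round happen first
        for q, p in enumerate(procs):
            logic.update(p, [msgs[s] for s in sorted(ho_r[q])])
    return procs
```

For example, with four processes starting from `[1, 2, 2, 3]` and two rounds in which everyone hears everyone, all processes adopt 2 in the first round and decide 2 in the second. Because `update` never sees messages from other rounds, each round can be reasoned about in isolation, which is the kind of structure that makes automated verification more tractable.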

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

IST Austria: PubRep (Institute of Science and Technology)

Hal-Diderot

An Automata-Theoretic Approach to the Verification of Distributed Algorithms

Author: Aiswarya C.
Bollig Benedikt
Gastin Paul
Publication date: 01/01/2015

We introduce an automata-theoretic method for the verification of distributed algorithms running on ring networks. In a distributed algorithm, an arbitrary number of processes cooperate to achieve a common goal (e.g., elect a leader). Processes have unique identifiers (pids) from an infinite, totally ordered domain. An algorithm proceeds in synchronous rounds, each round allowing a process to perform a bounded sequence of actions such as sending or receiving a pid, storing it in some register, and comparing register contents with respect to the associated total order. An algorithm is supposed to be correct independently of the number of processes. To specify correctness properties, we introduce a logic that can reason about processes and pids. For leader election, for instance, it can state that, at the end of an execution, each process stores the maximum pid in some dedicated register. Since the verification of distributed algorithms is undecidable, we propose an underapproximation technique, which bounds the number of rounds. This is an appealing approach, as the number of rounds needed by a distributed algorithm to conclude is often exponentially smaller than the number of processes. We provide an automata-theoretic solution, reducing model checking to emptiness for alternating two-way automata on words. Overall, we show that round-bounded verification of distributed algorithms over rings is PSPACE-complete.
Comment: 26 pages, 6 figures
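To make round-bounded checking concrete, here is a brute-force toy, not the paper's automata-theoretic procedure. It uses a hypothetical max-forwarding algorithm on a unidirectional ring (`run_ring`), the leader-election property from the abstract (`holds`), and a search over small rings for a counterexample within a fixed number of rounds (`find_counterexample`). Two assumptions: enumerating permutations of 1..n stands in for all pid assignments, which is adequate only because the toy algorithm does nothing but compare pids; and ring sizes are capped at `max_n`, whereas the paper's decision procedure covers rings of every size.

```python
from itertools import permutations
from typing import List, Optional, Sequence


def run_ring(pids: Sequence[int], rounds: int) -> List[int]:
    """Toy algorithm: in each synchronous round, process i receives the register of its
    left neighbour (i - 1 mod n) and keeps the larger of the two pids."""
    n = len(pids)
    reg = list(pids)
    for _ in range(rounds):
        incoming = [reg[(i - 1) % n] for i in range(n)]  # all sends use start-of-round values
        reg = [max(r, m) for r, m in zip(reg, incoming)]
    return reg


def holds(pids: Sequence[int], reg: Sequence[int]) -> bool:
    # Leader-election property: every process stores the maximum pid.
    return all(r == max(pids) for r in reg)


def find_counterexample(rounds: int, max_n: int) -> Optional[List[int]]:
    """Search rings of size 1..max_n for a pid assignment that violates the
    property after exactly `rounds` rounds; return None if there is none."""
    for n in range(1, max_n + 1):
        for pids in permutations(range(1, n + 1)):
            if not holds(pids, run_ring(pids, rounds)):
                return list(pids)
    return None
```

This toy needs n - 1 rounds on a ring of n processes, so it is not one of the few-round algorithms the abstract has in mind. With a bound of two rounds, every ring of size at most 3 passes, while `find_counterexample(2, 4)` reports `[1, 2, 3, 4]`: pid 4 has not yet reached the process holding pid 3.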

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

Testing consensus implementations using communication closure

Author: Dragoi Cezara
Enea Constantin
Majumdar Rupak
Niksic Filip
Ozkan Burcu Kulahcioglu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/10/2021

Large-scale production distributed systems are difficult to design and test. Correctness must be ensured when processes run asynchronously, at arbitrary rates relative to each other, and in the presence of failures, e.g., process crashes or message losses. These conditions create a huge space of executions that is difficult to explore in a principled way. Current testing techniques focus on systematic or randomized exploration of all executions of an implementation while treating the implemented algorithms as black boxes. On the other hand, proofs of correctness of many of the underlying algorithms often exploit semantic properties that reduce reasoning about correctness to a subset of behaviors. For example, the communication-closure property, used in many proofs of distributed consensus algorithms, shows that every asynchronous execution of the algorithm is equivalent to a lossy synchronous execution, thus reducing the burden of proof to only that subset. In a lossy synchronous execution, processes execute in lock-step rounds, and messages are either received in the same round or lost forever; such executions form a small subset of all asynchronous ones. We formulate the communication-closure hypothesis, which states that bugs in implementations of distributed consensus algorithms will already manifest in lossy synchronous executions, and present a testing algorithm based on this hypothesis. We prioritize the search space based on a bound on the number of failures in the execution and the rate at which these failures are recovered. We show that a random testing algorithm based on sampling lossy synchronous executions can empirically find a number of bugs, including previously unknown ones, in production distributed systems such as ZooKeeper, Cassandra, and Ratis, and also produce more understandable bug traces.
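A minimal sketch of random testing over lossy synchronous executions, assuming a hypothetical, deliberately buggy one-round protocol (`run_round`) rather than any of the systems named above. The failure bound is modelled only as a maximum number of lost messages, drawn uniformly between 0 and `max_drops` (an assumption); the recovery-rate dimension of the paper's prioritization is not modelled. `sample_lost` and `random_test` are illustrative names.

```python
import random
from typing import List, Optional, Set, Tuple

Link = Tuple[int, int]  # (sender, receiver)


def run_round(values: List[int], lost: Set[Link]) -> List[int]:
    """One lock-step round of a toy protocol: every process broadcasts its value and
    decides the minimum value it received. A message on a link in `lost` is dropped
    for good; it is never delivered in a later round."""
    n = len(values)
    return [
        min(values[src] for src in range(n) if src == dst or (src, dst) not in lost)
        for dst in range(n)
    ]


def sample_lost(n: int, max_drops: int, rng: random.Random) -> Set[Link]:
    # Pick between 0 and max_drops distinct links; a process always hears itself.
    links = [(s, d) for s in range(n) for d in range(n) if s != d]
    k = rng.randint(0, min(max_drops, len(links)))
    return set(rng.sample(links, k))


def random_test(values: List[int], max_drops: int, samples: int,
                seed: int = 0) -> Optional[List[Link]]:
    """Sample lossy synchronous executions; return the lost links of the first
    execution that violates agreement, or None if no violation is found."""
    rng = random.Random(seed)
    for _ in range(samples):
        lost = sample_lost(len(values), max_drops, rng)
        if len(set(run_round(values, lost))) > 1:  # processes decided differently
            return sorted(lost)
    return None
```

With no losses allowed, every sample satisfies agreement; allowing a single lost message, the sampler soon finds a run in which the message carrying the minimum value to one process is dropped. The resulting trace is just one lost link, which loosely echoes the abstract's point that such traces are easier to understand.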

INRIA a CCSD electronic archive server

HAL Descartes

Asphalion: trustworthy shielding against Byzantine faults

Author: Rahli Vincent
Veríssimo Paulo Jorge Esteves
Vukotic Ivana
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/10/2019

University of Birmingham Research Portal

A Reduction Theorem for the Verification of Round-Based Distributed Algorithms

Author: A. Valmari
D. Peled
L. Lamport
N.A. Lynch
P. Godefroid
T. Tsuchiya
Y. Yu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We consider the verification of algorithms expressed in the Heard-Of Model, a round-based computational model for fault-tolerant distributed computing. Rounds in this model are communication-closed, and we show that every execution recording individual events corresponds to a coarser-grained execution based on global rounds such that the local views of all processes are identical in the two executions. This result helps us to substantially mitigate state-space explosion and verify Consensus algorithms using standard model checking techniques.
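The reordering idea can be illustrated on a toy event trace; this is a sketch of the intuition, not the paper's proof or its model-checking setup. Assume (hypothetically) that an event is encoded as `(process, round, phase, message id)`, with the Heard-Of model's per-round state transition folded into the receive phase. Stably sorting the trace by `(round, phase)` yields a round-by-round execution.

```python
from typing import List, Tuple

SEND, RECV = 0, 1
Event = Tuple[str, int, int, str]  # (process, round, phase, message id)


def local_view(trace: List[Event], p: str) -> List[Event]:
    # The events that process p itself observes, in order.
    return [e for e in trace if e[0] == p]


def is_communication_closed(trace: List[Event]) -> bool:
    # Every message is received in the same round in which it was sent.
    sent_in = {msg: rnd for _, rnd, phase, msg in trace if phase == SEND}
    return all(sent_in.get(msg) == rnd for _, rnd, phase, msg in trace if phase == RECV)


def per_process_monotone(trace: List[Event]) -> bool:
    # Each process goes through (round, phase) pairs in non-decreasing order.
    last = {}
    for p, rnd, phase, _ in trace:
        if (rnd, phase) < last.get(p, (rnd, phase)):
            return False
        last[p] = (rnd, phase)
    return True


def to_round_based(trace: List[Event]) -> List[Event]:
    # Stable sort: all round-1 sends, then all round-1 receives, then round 2, ...
    return sorted(trace, key=lambda e: (e[1], e[2]))


# An asynchronous interleaving in which q starts round 2 before p has finished round 1.
async_trace: List[Event] = [
    ("p", 1, SEND, "a"), ("q", 1, SEND, "b"),
    ("q", 1, RECV, "a"),
    ("q", 2, SEND, "c"),
    ("p", 1, RECV, "b"),
    ("p", 2, SEND, "d"), ("p", 2, RECV, "c"), ("q", 2, RECV, "d"),
]
```

Because `sorted` is stable and each process's own events already appear in non-decreasing `(round, phase)` order, the reordering never swaps two events of the same process, so `local_view` of `p` and `q` is the same in `async_trace` and in `to_round_based(async_trace)`. Communication closure is what ensures that, in the reordered trace, every receive still follows the send of its message.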

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

HAL-Polytechnique