A Short Counterexample Property for Safety and Liveness Verification of Fault-tolerant Distributed Algorithms

Author: Abdulla P. A.
Baier C.
Bouajjani A.
Charron-Bost B.
Clarke E.
Cohen E.
De Moura L.
Emerson E. A.
Esparza J.
Fisman D.
Konnov I.
Kroening D.
Ongaro D.
Pnueli A.
Rahli V.
Vardi M. Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/11/2016

Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards. We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques.

Comment: 16 pages, 11 pages appendi
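The majority example of a threshold guard mentioned in the abstract can be illustrated with a minimal Python sketch. The function names `majority_guard` and `first_enabling_ack` are hypothetical and chosen here for illustration only; they are not taken from the paper or from ByMC.

```python
def majority_guard(acks_received: int, n: int) -> bool:
    """Threshold guard: true once acknowledgments from a strict
    majority of the n processes have been received."""
    return 2 * acks_received > n


def first_enabling_ack(n: int) -> int:
    """Smallest number of acknowledgments that enables the guard
    in a system of n processes (n is a parameter, as in the abstract)."""
    acks = 0
    while not majority_guard(acks, n):
        acks += 1  # one more acknowledgment arrives
    return acks


if __name__ == "__main__":
    # The guard is parameterized: the same code works for any n.
    for n in (3, 4, 5, 10):
        print(n, first_enabling_ack(n))
```

The guard depends only on a counter and the parameter n, which is what makes a parameterized treatment (for all n at once) conceivable.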

arXiv.org e-Print Archive

Crossref

Rapid Recovery for Systems with Scarce Faults

Author: Huang Chung-Hao
Peled Doron
Schewe Sven
Wang Farn
Publication venue: 'Open Publishing Association'
Publication date: 01/10/2012

Our goal is to achieve a high degree of fault tolerance through the control of safety-critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller who tries to establish a correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety-critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.

Comment: In Proceedings GandALF 2012, arXiv:1210.202

arXiv.org e-Print Archive

Directory of Open Access Journals

Communication Efficient Checking of Big Data Operations

Author: Hübschle-Schneider Lorenz
Sanders Peter
Publication date: 01/01/2018

We propose fast probabilistic algorithms with low (i.e., sublinear in the input size) communication volume to check the correctness of operations in Big Data processing frameworks and distributed databases. Our checkers cover many of the commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. An experimental evaluation of our implementation in Thrill (Bingmann et al., 2016) confirms the low overhead and high failure detection rate predicted by theoretical analysis.
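To give a flavor of what a low-communication probabilistic checker for sorting can look like, here is a minimal Python sketch. It uses a standard polynomial-identity fingerprint of a multiset and is not claimed to be the construction from the paper. The names `multiset_fingerprint`, `globally_sorted`, and `check_sort` are hypothetical, and the sketch assumes the elements are integers in the range [0, P).

```python
import random

P = (1 << 61) - 1  # Mersenne prime used as the modulus (an assumption of this sketch)


def multiset_fingerprint(items, r):
    """Evaluate prod (r - x) mod P over the items. Two different multisets
    of size n agree at a random r with probability at most n / P."""
    acc = 1
    for x in items:
        acc = acc * ((r - x) % P) % P
    return acc


def globally_sorted(parts):
    """Check that the concatenation of all parts is non-decreasing.
    In a distributed setting only the boundary elements of each part
    would need to be exchanged; here everything is checked locally."""
    prev = None
    for part in parts:
        for x in part:
            if prev is not None and x < prev:
                return False
            prev = x
    return True


def check_sort(input_parts, output_parts, rng=random):
    """Accept iff the output is sorted and (with high probability) a
    permutation of the input. Each part contributes one number, so the
    data exchanged is one fingerprint per part."""
    r = rng.randrange(P)  # shared random evaluation point
    fp_in = 1
    for part in input_parts:
        fp_in = fp_in * multiset_fingerprint(part, r) % P
    fp_out = 1
    for part in output_parts:
        fp_out = fp_out * multiset_fingerprint(part, r) % P
    return fp_in == fp_out and globally_sorted(output_parts)


if __name__ == "__main__":
    data = [[3, 1], [2, 5]]
    print(check_sort(data, [[1, 2], [3, 5]]))  # a correct sort is always accepted
```

A correct result is always accepted; an incorrect one slips through only if the random point happens to be a root of the difference of the two polynomials, which has probability at most n / P.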

arXiv.org e-Print Archive

Crossref

KITopen

Verification and Synthesis of Symmetric Uni-Rings for Leads-To Properties

Author: abdulla
conchon
ebnenasir
finkbeiner
gascón
ghilardi
grinchtein
lazic
matthews
mcmillan
mirzaie
varghese
wolper
Publication date: 20/05/2019

This paper investigates the verification and synthesis of parameterized protocols that satisfy leadsto properties $R \leadsto Q$ on symmetric unidirectional rings (a.k.a. uni-rings) of deterministic and constant-space processes under no fairness and interleaving semantics, where $R$ and $Q$ are global state predicates. First, we show that verifying $R \leadsto Q$ for parameterized protocols on symmetric uni-rings is undecidable, even for deterministic and constant-space processes, and conjunctive state predicates. Then, we show that surprisingly synthesizing symmetric uni-ring protocols that satisfy $R \leadsto Q$ is actually decidable. We identify necessary and sufficient conditions for the decidability of synthesis based on which we devise a sound and complete polynomial-time algorithm that takes the predicates $R$ and $Q$, and automatically generates a parameterized protocol that satisfies $R \leadsto Q$ for unbounded (but finite) ring sizes. Moreover, we present some decidability results for cases where leadsto is required from multiple distinct $R$ predicates to different $Q$ predicates. To demonstrate the practicality of our synthesis method, we synthesize some parameterized protocols, including agreement and parity protocols.

arXiv.org e-Print Archive

Michigan Technological University

Crossref

An Outline of a Proposed System that Learns from Experts How to Discharge Proof Obligations Automatically

Author: Bundy Alan
Grov Gudmund
Jones Cliff B.
Publication date: 01/01/2009

Edinburgh Research Explorer

Rigorous Design of Fault-Tolerant Transactions for Replicated Database Systems using Event B

Author: Butler Michael
Yadav Divakar
Publication venue: Lecture Notes in Computer Science, Springer, 2006
Publication date: 01/01/2006

System availability is improved by the replication of data objects in a distributed database system. However, during updates, the complexity of keeping replicas identical arises due to failures of sites and race conditions among conflicting transactions. Fault tolerance and reliability are key issues to be addressed in the design and architecture of these systems. Event B is a formal technique which provides a framework for developing mathematical models of distributed systems by rigorous description of the problem, gradually introducing solutions in refinement steps, and verification of solutions by discharge of proof obligations. In this paper, we present a formal development of a distributed system using Event B that ensures atomic commitment of distributed transactions consisting of communicating transaction components at participating sites. This formal approach carries the development of the system from an initial abstract specification of transactional updates on a one copy database to a detailed design containing replicated databases in refinement. Through refinement we verify that the design of the replicated database conforms to the one copy database abstraction.

Southampton (e-Prints Soton)