Verifying Safety Properties With the TLA+ Proof System
TLAPS, the TLA+ proof system, is a platform for the development and
mechanical verification of TLA+ proofs written in a declarative style requiring
little background beyond elementary mathematics. The language supports
hierarchical and non-linear proof construction and verification, and it is
independent of any verification tool or strategy. A Proof Manager uses backend
verifiers such as theorem provers, proof assistants, SMT solvers, and decision
procedures to check TLA+ proofs. This paper documents the first public release
of TLAPS, distributed with a BSD-like license. It handles almost all the
non-temporal part of TLA+ as well as the temporal reasoning needed to prove
standard safety properties, in particular invariance and step simulation, but
not liveness properties.
The Impact of RDMA on Agreement
Remote Direct Memory Access (RDMA) is becoming widely available in data
centers. This technology allows a process to directly read and write the memory
of a remote host, with a mechanism to control access permissions. In this
paper, we study the fundamental power of these capabilities. We consider the
well-known problem of achieving consensus despite failures, and find that RDMA
can improve the inherent trade-off in distributed computing between failure
resilience and performance. Specifically, we show that RDMA allows algorithms
that simultaneously achieve high resilience and high performance, while
traditional algorithms had to choose one or another. With Byzantine failures,
we give an algorithm that only requires $n \geq 2f_P + 1$ processes (where $f_P$
is the maximum number of faulty processes) and decides in two (network)
delays in common executions. With crash failures, we give an algorithm that
only requires $n \geq f_P + 1$ processes and also decides in two delays. Both
algorithms tolerate a minority of memory failures inherent to RDMA, and they
provide safety in asynchronous systems and liveness with standard additional
assumptions.
Comment: Full version of PODC'19 paper, strengthened broadcast algorithm
Design and Analysis of a Logless Dynamic Reconfiguration Protocol
Distributed replication systems based on the replicated state machine model
have become ubiquitous as the foundation of modern database systems. To ensure
availability in the presence of faults, these systems must be able to
dynamically replace failed nodes with healthy ones via dynamic reconfiguration.
MongoDB is a document oriented database with a distributed replication
mechanism derived from the Raft protocol. In this paper, we present
MongoRaftReconfig, a novel dynamic reconfiguration protocol for the MongoDB
replication system. MongoRaftReconfig utilizes a logless approach to managing
configuration state and decouples the processing of configuration changes from
the main database operation log. The protocol's design was influenced by
engineering constraints faced when attempting to redesign an unsafe, legacy
reconfiguration mechanism that existed previously in MongoDB. We provide a
safety proof of MongoRaftReconfig, along with a formal specification in TLA+.
To our knowledge, this is the first published safety proof and formal
specification of a reconfiguration protocol for a Raft-based system. We also
present results from model checking its safety properties on finite protocol
instances. Finally, we discuss the conceptual novelties of MongoRaftReconfig,
how it can be understood as an optimized and generalized version of the single
server reconfiguration algorithm of Raft, and present an experimental
evaluation of how its optimizations can provide performance benefits for
reconfigurations.
Comment: 35 pages, 2 figures
Programming Language Abstractions for Modularly Verified Distributed Systems
Distributed systems are rarely developed as monolithic programs. Instead, like any software, these systems may consist of multiple program components, which are then compiled separately and linked together. Modern systems also incorporate various services interacting with each other and with client applications. However, state-of-the-art verification tools focus predominantly on verifying standalone, closed-world protocols or systems, thus failing to account for the compositional nature of distributed systems. For example, standalone verification has the drawback that when protocols and their optimized implementations evolve, one must re-verify the entire system from scratch, instead of leveraging compositionality to contain the reverification effort.
In this paper, we focus on the challenge of modular verification of distributed systems with respect to both high-level protocol invariants and low-level implementation safety properties. We argue that the missing link between the two is a programming paradigm that would allow one to reason about both high-level distributed protocols and low-level implementation primitives in a single verification-friendly framework. Such a link would make it possible to reap the benefits of both the vast body of research in distributed computing, focused on modular protocol decomposition and consistency properties, and the recent advances in program verification, which enable the construction of provably correct system implementations. To showcase the modular verification challenges, we present some typical scenarios of decomposition between a distributed protocol and its implementations. We then describe our ongoing research agenda, in which we are attempting to address the outlined problems by providing a typing discipline and a set of domain-specific primitives for specifying, implementing, and verifying distributed systems. Our approach, mechanized within a proof assistant, provides the means of decomposition necessary for modular proofs about distributed protocols and systems.