156 research outputs found

    Verifying Safety Properties With the TLA+ Proof System

    Get PDF
    TLAPS, the TLA+ proof system, is a platform for the development and mechanical verification of TLA+ proofs written in a declarative style requiring little background beyond elementary mathematics. The language supports hierarchical and non-linear proof construction and verification, and it is independent of any verification tool or strategy. A Proof Manager uses backend verifiers such as theorem provers, proof assistants, SMT solvers, and decision procedures to check TLA+ proofs. This paper documents the first public release of TLAPS, distributed with a BSD-like license. It handles almost all the non-temporal part of TLA+ as well as the temporal reasoning needed to prove standard safety properties, in particular invariance and step simulation, but not liveness properties

    The Impact of RDMA on Agreement

    Full text link
    Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires n≥2fP+1n \geq 2f_P + 1 processes (where fPf_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires n≥fP+1n \geq f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.Comment: Full version of PODC'19 paper, strengthened broadcast algorith

    Design and Analysis of a Logless Dynamic Reconfiguration Protocol

    Get PDF
    Distributed replication systems based on the replicated state machine model have become ubiquitous as the foundation of modern database systems. To ensure availability in the presence of faults, these systems must be able to dynamically replace failed nodes with healthy ones via dynamic reconfiguration. MongoDB is a document oriented database with a distributed replication mechanism derived from the Raft protocol. In this paper, we present MongoRaftReconfig, a novel dynamic reconfiguration protocol for the MongoDB replication system. MongoRaftReconfig utilizes a logless approach to managing configuration state and decouples the processing of configuration changes from the main database operation log. The protocol's design was influenced by engineering constraints faced when attempting to redesign an unsafe, legacy reconfiguration mechanism that existed previously in MongoDB. We provide a safety proof of MongoRaftReconfig, along with a formal specification in TLA+. To our knowledge, this is the first published safety proof and formal specification of a reconfiguration protocol for a Raft-based system. We also present results from model checking its safety properties on finite protocol instances. Finally, we discuss the conceptual novelties of MongoRaftReconfig, how it can be understood as an optimized and generalized version of the single server reconfiguration algorithm of Raft, and present an experimental evaluation of how its optimizations can provide performance benefits for reconfigurations.Comment: 35 pages, 2 figure

    Programming Language Abstractions for Modularly Verified Distributed Systems

    Get PDF
    Distributed systems are rarely developed as monolithic programs. Instead, like any software, these systems may consist of multiple program components, which are then compiled separately and linked together. Modern systems also incorporate various services interacting with each other and with client applications. However, state-of-the-art verification tools focus predominantly on verifying standalone, closed-world protocols or systems, thus failing to account for the compositional nature of distributed systems. For example, standalone verification has the drawback that when protocols and their optimized implementations evolve, one must re-verify the entire system from scratch, instead of leveraging compositionality to contain the reverification effort. In this paper, we focus on the challenge of modular verification of distributed systems with respect to high-level protocol invariants as well as for low-level implementation safety properties. We argue that the missing link between the two is a programming paradigm that would allow one to reason about both high-level distributed protocols and low-level implementation primitives in a single verification-friendly framework. Such a link would make it possible to reap the benefits from both the vast body of research in distributed computing, focused on modular protocol decomposition and consistency properties, as well as from the recent advances in program verification, enabling construction of provably correct systems implementations. To showcase the modular verification challenges, we present some typical scenarios of decomposition between a distributed protocol and its implementations. We then describe our ongoing research agenda, in which we are attempting to address the outlined problems by providing a typing discipline and a set of domain-specific primitives for specifying, implementing and verifying distributed systems. Our approach, mechanized within a proof assistant, provides the means of decomposition necessary for modular proofs about distributed protocols and systems
    • …
    corecore