    Executable Denotational Semantics With Interaction Trees

    Interaction trees are a representation of effectful and reactive systems designed to be implemented in a proof assistant such as Coq. They are equipped with a rich algebra of combinators to construct recursive and effectful computations and to reason about them equationally. Interaction trees are also an executable structure, notably via extraction, which enables testing and directly developing executable programs in Coq. To demonstrate the usefulness of interaction trees, two applications are presented. First, I develop a novel approach to verify a compiler from a simple imperative language to assembly, by proving a semantic preservation theorem which is termination-sensitive, using an equational proof. Second, I present a framework of concurrent objects, inheriting the modularity, compositionality, and executability of interaction trees. Leveraging that framework, I formally prove the correctness of a transactionally predicated map, using a novel approach to reason about objects combining the notions of linearizability and strict serializability, two well-known correctness conditions for concurrent objects.

    C4: Verified Transactional Objects

    A framework for Verified Transactional Objects in Coq.
    - Formalization of concurrent objects, linearizability, strict serializability, and associated proof techniques.
    - Verified linearizable concurrent hash map
    - Verified strictly serializable TML
    - Verified strictly serializable transaction-predicated map

    The Fence Complexity of Persistent Sets

    This thesis studies fence complexity of concurrent sets in a non-volatile shared memory model. I consider the case where CPU registers and cache memory remain volatile while main memory is non-volatile. Flush instructions are required to force shared state to be written back to non-volatile memory. These flush instructions must be accompanied by the use of expensive fence instructions to enforce ordering among such flushes. Collectively I refer to a flush and a fence as a psync. In this model the system can crash at any time. When the system crashes the contents of volatile memory are lost. I consider lock-free implementations of the set abstract data type and the safety properties of strict linearizability and durable linearizability. Strict linearizability forces crashed operations to take effect before the crash or not take effect at all; the weaker property of durable linearizability enforces this requirement only for operations that have completed prior to the crash event. In this thesis, I consider classes of strict linearizable implementations that guarantee operations take effect at or before the point when the operation is persisted. I prove two lower bounds for lock-free implementations of the set abstract data type. First, I prove that it is impossible to implement strict linearizable lock-free sets in which read-only (or search) operations do not flush or fence. Second, I prove that for any durable-linearizable lock-free set there must exist an execution in which some process must perform at least one redundant psync as part of an update operation. I also present several implementations of persistent concurrent lock-free sets. I evaluate these implementations against existing persistent sets. This evaluation exposes the impact of algorithmic design and safety properties on psync complexity in practice as well as the cost of recovering the data structure following a system crash

    Abortable Linearizable Modules

    We define the Abortable Linearizable Module automaton (ALM for short) and prove its key composition property using the IOA theory of HOLCF. The ALM is at the heart of the Speculative Linearizability framework. This framework simplifies devising correct speculative algorithms by enabling their decomposition into independent modules that can be analyzed and proved correct in isolation. It is particularly useful when working in a distributed environment, where the need to tolerate faults and asynchrony has made current monolithic protocols so intricate that it is no longer tractable to check their correctness. Our theory contains a typical example of a refinement proof in the I/O-automata framework of Lynch and Tuttle

    On the cost of composing shared-memory algorithms

    Decades of research in distributed computing have led to a variety of perspectives on what it means for a concurrent algorithm to be efficient, depending on model assumptions, progress guarantees, and complexity metrics. It is therefore natural to ask whether one could compose algorithms that perform efficiently under different conditions, so that the composition preserves the performance of the original components when their conditions are met. In this paper, we evaluate the cost of composing shared-memory algorithms. First, we formally define the notion of safely composable algorithms and we show that every sequential type has a safely composable implementation, as long as enough state is transferred between modules. Since such generic implementations are inherently expensive, we present a more general light-weight specification that allows the designer to transfer very little state between modules, by taking advantage of the semantics of the implemented object. Using this framework, we implement a composed long-lived test-and-set object, with the property that each of its modules is asymptotically optimal with respect to the progress condition it ensures, while the entire implementation only uses objects with consensus number at most two. Thus, we show that the overhead of composition can be negligible in the case of some important shared-memory abstractions

    Abstractions for asynchronous distributed computing with malicious players

    In modern distributed systems, failures are the norm rather than the exception. In many cases, these failures are not benign. Settings such as the Internet might incur malicious (also called Byzantine or arbitrary) behavior and asynchrony. As a result, and perhaps not surprisingly, research on asynchronous Byzantine fault-tolerant (BFT) distributed systems is flourishing. Tolerating arbitrary behavior and asynchrony calls for very sophisticated algorithms. This is in particular the case with BFT solutions that aim to provide properties such as: (a) optimal resilience, i.e., tolerating as many Byzantine failures as possible and (b) optimal performance with respect to some relevant complexity metric. Most BFT algorithms are built from scratch or by modifying existing solutions in a non-modular manner, which often renders these algorithms difficult to understand and, consequently, impedes their wider adoption. We attribute this complexity to the lack of a sufficient number of adequate abstractions for asynchronous BFT distributed computing. The motivation of this thesis is to propose reusable abstractions for devising asynchronous BFT distributed algorithms that are optimally resilient and/or have optimal complexity, with a strong focus on one of the most important complexity metrics: time complexity (or latency). The abstractions proposed in this thesis are devised with three fundamental distributed applications in mind: (a) read/write storage (also called register), (b) consensus and (c) state machine replication (SMR). We demonstrate how to use our abstractions in these applications to devise asynchronous BFT algorithms that feature the best complexity among all algorithms we know of, in addition to optimal resilience. First, we introduce the notion of a refined quorum system (RQS) of some set S as a set of three classes of subsets (quorums) of S: first class quorums are also second class quorums, themselves being also third class quorums.
First class quorums have large intersections with all other quorums, second class quorums typically have smaller intersections with those of the third class; the latter simply correspond to traditional quorums. The refined quorum system abstraction helps design algorithms that tolerate contention (process concurrency), arbitrarily long periods of asynchrony and the largest possible number of failures, but perform fast if few failures occur, the system is synchronous and there is no contention, i.e., under conditions that are assumed to be frequent in practice. In other words, RQS helps combine optimal resilience and optimal best-case time complexity. Intuitively, under uncontended and synchronous conditions, a distributed object implementation would expedite an operation if a quorum of the first class is accessed, then degrade gracefully depending on whether a quorum of the second or the third class is accessed. Our notion of RQS is devised assuming a general adversary structure, and this basically allows algorithms relying on RQS to relax the assumption of independent process failures. We illustrate the power of refined quorums by introducing two new optimal BFT atomic object implementations: an atomic storage algorithm and a consensus algorithm. Our second abstraction is a novel timestamping mechanism called high resolution timestamps (HRts), which can be seen as a variation of matrix clocks. Roughly speaking, a high resolution timestamp contains a matrix of local timestamps of (a subset of) processes as seen by (a subset of) other processes. Complementary to RQS, HRts simplify the design of BFT distributed algorithms that combine optimal resilience and worst-case time complexity. We apply high-resolution timestamps to design read/write storage algorithms in which HRts are used to detect and filter out Byzantine processes, which paves the path to the first BFT storage algorithms that combine optimal resilience with optimal worst-case time complexity.
Finally, we introduce ABsTRACT (Abortable Byzantine faulT-toleRant stAte maChine replicaTion), a generic abstraction that simplifies the notoriously difficult task of developing BFT state machine replication algorithms. ABsTRACT resembles BFT-SMR and it can be used to make any shared service Byzantine fault-tolerant, with one exception: it may sometimes abort a client request. The non-triviality condition under which ABsTRACT cannot abort is a generic parameter. We view a BFT-SMR algorithm as a composition of instances of ABsTRACT, each instance developed and analyzed independently. To illustrate our approach, we describe two new optimally resilient BFT algorithms. The first, which makes use of our refined quorums, has the lowest time complexity among all BFT-SMR algorithms we know of, in synchronous periods that are free from contention and failures. The second algorithm has the highest peak throughput in failure-free and synchronous periods; this algorithm argues for general applicability of ABsTRACT in developing BFT shared services that feature optimal complexity, beyond the time complexity metric.
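
The nesting of the three quorum classes described in this abstract can be illustrated with a small sketch. It is a toy illustration only: the size thresholds (3, 4 and 5 out of 5 processes) and the helper names `subsets_of_size_at_least` and `min_intersection` are assumptions made here, and the thesis defines refined quorums over a general adversary structure rather than by subset size.

```python
from itertools import combinations


def subsets_of_size_at_least(universe, k):
    """All subsets of `universe` with at least k elements, as frozensets."""
    items = sorted(universe)
    return {
        frozenset(c)
        for r in range(k, len(items) + 1)
        for c in combinations(items, r)
    }


def min_intersection(class_a, class_b):
    """Smallest overlap between any quorum of class_a and any of class_b."""
    return min(len(a & b) for a in class_a for b in class_b)


# Hypothetical system of five processes; thresholds are illustrative only.
S = {1, 2, 3, 4, 5}
third = subsets_of_size_at_least(S, 3)   # traditional (majority) quorums
second = subsets_of_size_at_least(S, 4)  # also third class quorums
first = subsets_of_size_at_least(S, 5)   # also second class quorums

# Class nesting: first class is contained in second, second in third.
nested = first <= second <= third
```

In this toy system, a first class quorum shares more members with any other quorum than a second class quorum does, which in turn shares more than two arbitrary third class quorums do. That ordering mirrors the "large" versus "smaller" intersections mentioned in the abstract.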

    Replication of non-deterministic objects

    This thesis discusses replication of non-deterministic objects in distributed systems to achieve fault tolerance against crash failures. The objects replicated are the virtual nodes of a distributed application. Replication is viewed as an issue that is to be dealt with only during the configuration of a distributed application and that should not affect the development of the application. Hence, replication of virtual nodes should be transparent to the application. Like all measures to achieve fault tolerance, replication introduces redundancy in the system. Not surprisingly, the main difficulty is guaranteeing the consistency of all replicas such that they behave in the same way as if the object was not replicated (replication transparency). This is further complicated if active objects (like virtual nodes) are replicated, and these objects themselves can be clients of still further objects in the distributed application. The problems of replication of active non-deterministic objects are analyzed in the context of distributed Ada 95 applications. The ISO standard for Ada 95 defines a model for distributed execution based on remote procedure calls (RPC). Virtual nodes in Ada 95 use this as their sole communication paradigm, but they may contain tasks to execute activities concurrently, thus making the execution potentially non-deterministic due to implicit timing dependencies. Such non-determinism cannot be avoided by choosing deterministic tasking policies. I present two different approaches to maintain replica consistency despite this non-determinism. In a first approach, I consider the run-time support of Ada 95 as a black box (except for the part handling remote communications). This corresponds to a non-deterministic computation model. I show that replication of non-deterministic virtual nodes requires that remote procedure calls are implemented as nested transactions. 
Unfortunately, effects of failures are not local to the replicas of a virtual node: when a failure occurs, nested remote calls made to other virtual nodes must be undone. Also, using transactional semantics for RPCs necessitates a compromise regarding transparency: the application must identify global state for it cannot be determined reliably in an automatic way. Further study reveals that this approach cannot be implemented in a transparent way at all because the consistency criterion of Ada 95 (linearizability) is much weaker than that of transactions (serializability). An execution of remote procedure calls as transactions may thus lead to incompatibilities with the semantics of the programming language. If remotely called subprograms on a replicated virtual node perform partial operations, i.e., entry calls on global protected objects, deadlocks that cannot be broken can occur in certain cases. Such deadlocks do not occur when the virtual node is not replicated. The transactional semantics of RPCs must therefore be exposed to the application. A second approach is based on a piecewise deterministic computation model, i.e., the execution of a virtual node is seen as a sequence of deterministic state intervals. Whenever a non-deterministic event occurs, a new state interval is started. I study replica organization under this computation model (semi-active replication). In this model, all non-deterministic decisions are made on one distinguished replica (the leader), while all other replicas (the followers) are forced to follow the same sequence of non-deterministic events. I show that it suffices to synchronize the followers with the leader upon each observable event, i.e., when the leader sends a message to some other virtual node. It is not necessary to synchronize upon each and every non-deterministic event — which would incur a prohibitively high overhead. 
Non-deterministic events occurring on the leader between observable events are logged and sent to the followers just before the leader executes an observable event. Consequently, it is guaranteed that the followers will reach the same state as the leader, and thus the effects of failures remain mostly local to the replicas. A prototype implementation called RAPIDS (Replicated Ada Partitions In Distributed Systems) serves as a proof of concept for this second approach, demonstrating its feasibility. RAPIDS is an Ada 95 implementation of a replication manager for semi-active replication for the GNAT development system for Ada 95. It is entirely contained within the run-time support and hence largely transparent for the application
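
The leader/follower synchronization rule described in this abstract can be sketched as follows. This is a minimal single-process model, not the RAPIDS implementation: the class names `Replica` and `Leader`, the integer state, and the use of a random number as the "non-deterministic event" are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A replica whose state evolves deterministically given each event."""
    state: int = 0

    def apply(self, event: int) -> None:
        # Deterministic transition: same events in the same order
        # always produce the same state.
        self.state = self.state * 31 + event


@dataclass
class Leader(Replica):
    """The distinguished replica that makes all non-deterministic decisions."""
    followers: list = field(default_factory=list)
    log: list = field(default_factory=list)  # events since the last sync
    rng: random.Random = field(default_factory=random.Random)

    def nondeterministic_step(self) -> None:
        # The leader decides the outcome and only logs it; no sync yet.
        event = self.rng.randrange(100)
        self.apply(event)
        self.log.append(event)

    def observable_event(self) -> int:
        # Just before a message leaves the leader, ship the logged events
        # so every follower replays them and reaches the leader's state.
        for follower in self.followers:
            for event in self.log:
                follower.apply(event)
        self.log.clear()
        return self.state  # stands in for the outgoing message
```

A short usage example: build `Leader(followers=[Replica(), Replica()])`, call `nondeterministic_step()` several times, then call `observable_event()`. The followers stay at their initial state until the observable event, after which their state equals the leader's. Batching the log this way is what avoids synchronizing on every individual non-deterministic event.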
