48 research outputs found
Automated Detection of Serializability Violations Under Weak Consistency
While a number of weak consistency mechanisms have been developed in recent years to improve performance and ensure availability in distributed, replicated systems, ensuring the correctness of transactional applications running on top of such systems remains a difficult and important problem. Serializability is a well-understood correctness criterion for transactional programs; understanding whether applications are serializable when executed in a weakly-consistent environment, however remains a challenging exercise. In this work, we combine a dependency graph-based characterization of serializability and leverage the framework of abstract executions to develop a fully-automated approach for statically finding bounded serializability violations under any weak consistency model. We reduce the problem of serializability to satisfiability of a formula in First-Order Logic (FOL), which allows us to harness the power of existing SMT solvers. We provide rules to automatically construct the FOL encoding from programs written in SQL (allowing loops and conditionals) and express consistency specifications as FOL formula. In addition to detecting bounded serializability violations, we also provide two orthogonal schemes to reason about unbounded executions by providing sufficient conditions (again, in the form of FOL formulae) whose satisfiability implies the absence of anomalies in any arbitrary execution. We have applied the proposed technique on TPC-C, a real-world database program with complex application logic, and were able to discover anomalies under Parallel Snapshot Isolation (PSI), and verify serializability for unbounded executions under Snapshot Isolation (SI), two consistency mechanisms substantially weaker than serializability
Distributed Multi-writer Multi-reader Atomic Register with Optimistically Fast Read and Write
A distributed multi-writer multi-reader (MWMR) atomic register is an
important primitive that enables a wide range of distributed algorithms. Hence,
improving its performance can have large-scale consequences. Since the seminal
work of ABD emulation in the message-passing networks [JACM '95], many
researchers study fast implementations of atomic registers under various
conditions. "Fast" means that a read or a write can be completed with 1
round-trip time (RTT), by contacting a simple majority. In this work, we
explore an atomic register with optimal resilience and "optimistically fast"
read and write operations. That is, both operations can be fast if there is no
concurrent write.
This paper has three contributions: (i) We present Gus, the emulation of an
MWMR atomic register with optimal resilience and optimistically fast reads and
writes when there are up to 5 nodes; (ii) We show that when there are > 5
nodes, it is impossible to emulate an MWMR atomic register with both
properties; and (iii) We implement Gus in the framework of EPaxos and Gryff,
and show that Gus provides lower tail latency than state-of-the-art systems
such as EPaxos, Gryff, Giza, and Tempo under various workloads in the context
of geo-replicated object storage systems
A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules
System-level emulators have been used extensively for system design,
debugging and evaluation. They work by providing a system-level virtual machine
to support a guest operating system (OS) running on a platform with the same or
different native OS that uses the same or different instruction-set
architecture. For such system-level emulation, dynamic binary translation (DBT)
is one of the core technologies. A recently proposed learning-based DBT
approach has shown a significantly improved performance with a higher quality
of translated code using automatically learned translation rules. However, it
has only been applied to user-level emulation, and not yet to system-level
emulation. In this paper, we explore the feasibility of applying this approach
to improve system-level emulation, and use QEMU to build a prototype. ... To
achieve better performance, we leverage several optimizations that include
coordination overhead reduction to reduce the overhead of each coordination,
and coordination elimination and code scheduling to reduce the coordination
frequency. Experimental results show that it can achieve an average of 1.36X
speedup over QEMU 6.1 with negligible coordination overhead in the system
emulation mode using SPEC CINT2006 as application benchmarks and 1.15X on
real-world applications.Comment: 10 pages, 19 figures, to be published in International Symposium on
Code Generation and Optimization (CGO) 202
Non-Uniform Replication
Replication is a key technique in the design of efficient and reliable distributed systems. As information grows, it becomes difficult or even impossible to store all information at every replica. A common approach to deal with this problem is to rely on partial replication, where each replica maintains only a part of the total system information. As a consequence, a remote replica might need to be contacted for computing the reply to some given query, which leads to high latency costs particularly in geo-replicated settings. In this work, we introduce the concept of non- uniform replication, where each replica stores only part of the information, but where all replicas store enough information to answer every query. We apply this concept to eventual consistency and conflict-free replicated data types. We show that this model can address useful problems and present two data types that solve such problems. Our evaluation shows that non-uniform replication is more efficient than traditional replication, using less storage space and network bandwidth
Aurora: Leaderless State-Machine Replication with High Throughput
State-machine replication (SMR) allows a state machine to be replicated across a set of replicas and handle clients\u27 requests as a single machine. Most existing SMR protocols are leader-based, i.e., requiring a leader to order requests and coordinate the protocol. This design places a disproportionately high load on the leader, inevitably impairing the scalability. If the leader fails, a complex and bug-prone fail-over protocol is needed to switch to a new leader. An adversary can also exploit the fail-over protocol to slow down the protocol.
In this paper, we propose a crash-fault tolerant SMR named Aurora, with the following properties:
• Leaderless: it does not require a leader, hence completely get rid of the fail-over protocol.
• Scalable: it can scale to a large number of replicas.
• Robust: it behaves well even under a poor network connection.
We provide a full-fledged implementation of Aurora and systematically evaluate its performance. Our benchmark results show that Aurora achieves a peak throughput of around two million TPS, up to 8.7 higher than the state-of-the-art leaderless SMR
Efficient Multi-Word Compare and Swap
Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes required to implement MCAS.
We first prove in this paper that it is impossible to "pack" the information required to perform a k-word CAS (k-CAS) in less than k locations to be CASed. Then we present the first algorithm that requires k+1 CASes per call to k-CAS in the common uncontended case. We implement our algorithm and show that it outperforms a state-of-the-art baseline in a variety of benchmarks in most considered workloads. We also present a durably linearizable (persistent memory friendly) version of our MCAS algorithm using only 2 persistence fences per call, while still only requiring k+1 CASes per k-CAS