Search CORE

1,390 research outputs found

PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication

Author: Didona Diego
Spirovska Kristina
Zwaenepoel Willy
Publication venue
Publication date: 25/02/2019
Field of study

Geo-replicated data platforms are at the backbone of several large-scale online services. Transactional Causal Consistency (TCC) is an attractive consistency level for building such platforms. TCC avoids many anomalies of eventual consistency, eschews the synchronization costs of strong consistency, and supports interactive read-write transactions. Partial replication is another attractive design choice for building geo-replicated platforms, as it increases the storage capacity and reduces update propagation costs. This paper presents PaRiS, the first TCC system that supports partial replication and implements non-blocking parallel read operations, whose latency is paramount for the performance of read-intensive applications. PaRiS relies on a novel protocol to track dependencies, called Universal Stable Time (UST). By means of a lightweight background gossip process, UST identifies a snapshot of the data that has been installed by every DC in the system. Hence, transactions can consistently read from such a snapshot on any server in any replication site without having to block. Moreover, PaRiS requires only one timestamp to track dependencies and define transactional snapshots, thereby achieving resource efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment composed of up to 10 replication sites. We show that PaRiS scales well with the number of DCs and partitions, while being able to handle larger data-sets than existing solutions that assume full replication. We also demonstrate a performance gain of non-blocking reads vs. a blocking alternative (up to 1.47x higher throughput with 5.91x lower latency for read-dominated workloads and up to 1.46x higher throughput with 20.56x lower latency for write-heavy workloads)

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Detection of global state predicates

Author: Marzullo Keith
Neiger Gil
Publication venue
Publication date
Field of study

The problem addressed here arises in the context of Meta: how can a set of processes monitor the state of a distributed application in a consistent manner? For example, consider the simple distributed application as shown here. Each of the three processes in the application has a light, and the control processes would each like to take an action when some specified subset of the lights are on. The application processes are instrumented with stubs that determine when the process turns its lights on or off. This information is disseminated to the control processes, each of which then determines when its condition of interest is met. Meta is built on top of the ISIS toolkit, and so we first built the sensor dissemination mechanism using atomic broadcast. Atomic broadcast guarantees that all recipients receive the messages in the same order and that this order is consistent with causality. Unfortunately, the control processes are somewhat limited in what they can deduce when they find that their condition of interest holds

NASA Technical Reports Server

A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

Author: Treaster Michael
Publication venue
Publication date: 31/12/2004
Field of study

Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques.Comment: 11 page

arXiv.org e-Print Archive

CiteSeerX

Global state predicates in rough real-time

Author: Mayo Jean Ann
Publication venue: W&M ScholarWorks
Publication date: 01/01/1997
Field of study

Distributed systems are characterized by the fact that the constituent processes have neither common memory nor a common system clock. These processes communicate solely via message passing. While providing a number of benefits such as increased reliability, increased computational power, and geographic dispersion, this architecture significantly complicates many of the tasks of software development and verification, including evaluation of the program state. In the case of distributed systems, the program state is comprised of the local states of the constituent processes, as well as the state of the channels between processes, and is called the global state.;With no common system clock, many distributed system protocols rely on the global ordering of local process events imposed by the message passing that occurs between processes. This leads to a partial global ordering of local process events, which can then be used to determine which process states could (or could not) have occurred simultaneously.;Traditional predicate evaluation protocols evaluate predicates on the global state of a distributed computation using consistent global states. This evaluation is complicated by the fact that the event ordering imposed by message passing is only partial. A complete history of the global states that occurred during an execution cannot always be constructed. This introduces inefficiency into predicate detection protocols and prohibits detection of certain predicates.;This dissertation explores the use of this rough global time base for global state predicate evaluation within distributed systems. By structuring the evaluation on the assumption that a global time base exists, we can develop simple and efficient protocols for both stable and unstable predicate evaluation. Further, we can evaluate certain predicates which are not easily evaluated using consistent global states. We demonstrate these advantages by developing protocols for detection of distributed termination, distributed deadlock detection, and detection of certain unstable predicates as they occur. as the global time base is rough, we can only detect unstable predicates which remain true for a sufficient duration. We additionally develop several formalizations which assist the protocol developer in dealing with the fact that the global time base is not perfect. We demonstrate the application of these formalizations within the protocols that we develop

College of William & Mary: W&M Publish