
    Incremental Consistency Guarantees for Replicated Objects

    Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary---fast, possibly inconsistent---result, as well as a final---consistent---result that arrives later. We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%. Comment: 16 total pages, 12 figures. OSDI'16 (to appear)
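    As a rough illustration of the incremental-consistency idea described above, the Python sketch below (with invented names, not the paper's actual Correctables API) delivers a fast preliminary value first and a strongly consistent final value later, and lets the caller speculate on the preliminary value, redoing the work only when the two disagree.

```python
import threading

class Correctable:
    """One read refined over time: a fast preliminary value, then a final one.
    Illustrative sketch only; not the Correctables API from the paper."""
    def __init__(self):
        self._final_ready = threading.Event()
        self.preliminary = None
        self.final = None

    def deliver_preliminary(self, value):
        # Called when a weakly consistent replica answers first.
        self.preliminary = value

    def deliver_final(self, value):
        # Called when the strongly consistent result arrives later.
        self.final = value
        self._final_ready.set()

    def wait_final(self, timeout=None):
        self._final_ready.wait(timeout)
        return self.final

def speculative_read(correctable, work):
    """Speculate on the preliminary value; redo the work only if it was wrong."""
    speculated_on = correctable.preliminary
    result = work(speculated_on)        # optimistic work on the fast value
    final = correctable.wait_final()
    if final != speculated_on:          # preliminary was inconsistent: recompute
        result = work(final)
    return result
```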

    A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

    Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques. Comment: 11 pages

    Replica determinism and flexible scheduling in hard real-time dependable systems

    Fault-tolerant real-time systems are typically based on active replication where replicated entities are required to deliver their outputs in an identical order within a given time interval. Distributed scheduling of replicated tasks, however, violates this requirement if on-line scheduling, preemptive scheduling, or scheduling of dissimilar replicated task sets is employed. This problem of inconsistent task outputs has been solved previously by coordinating the decisions of the local schedulers such that replicated tasks are executed in an identical order. Global coordination results either in an extremely high communication effort to agree on each schedule decision or in an overly restrictive execution model where on-line scheduling, arbitrary preemptions, and nonidentically replicated task sets are not allowed. To overcome these restrictions, a new method, called timed messages, is introduced. Timed messages guarantee deterministic operation by presenting consistent message versions to the replicated tasks. This approach is based on simulated common knowledge and a sparse time base. Timed messages are very effective since they neither require communication between the local schedulers nor do they restrict the use of on-line flexible scheduling, preemptions, and nonidentically replicated task sets.
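    A minimal sketch of the timed-message idea summarized above, assuming a sparse time base with a fixed granularity and a known worst-case delivery delay (both values and all names are invented for illustration): a message version becomes visible only once its agreed validity instant has passed, so every replica reading at the same tick sees the same version.

```python
from dataclasses import dataclass, field

GRANULE = 10  # sparse time base granularity in ms (assumed value)

@dataclass
class TimedMessage:
    value: object
    send_time: int                 # local send timestamp in ms
    validity: int = field(init=False)

    def __post_init__(self):
        # Round up to the next sparse tick beyond a worst-case delivery delay,
        # so every node agrees on when this version takes effect.
        worst_case_delay = 20      # assumed bound in ms
        self.validity = ((self.send_time + worst_case_delay) // GRANULE + 1) * GRANULE

def visible_version(versions, now):
    """Return the newest version whose validity instant has passed at tick `now`."""
    valid = [m for m in versions if m.validity <= now]
    return max(valid, key=lambda m: m.validity) if valid else None
```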

    DataWarp: Building Applications which Make Progress in an Inconsistent World

    The usual approach to dealing with imperfections in data is to attempt to eliminate them. However, the nature of modern systems means this is often futile. This paper describes an approach which permits applications to operate notwithstanding inconsistent data. Instead of attempting to extract a single, correct view of the world from its data, a DataWarp application constructs a collection of interpretations. It adopts one of these and continues work. Since it acts on assumptions, the DataWarp application considers its recent work to be provisional, expecting that eventually most of these actions will become definitive. Should the application decide to adopt an alternative data view, it may then need to void provisional actions before resuming work. We describe the DataWarp architecture, discuss its implementation, and describe an experiment in which a DataWarp application in an environment containing inconsistent data achieves better results than its conventional counterpart.
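    The sketch below illustrates the provisional-action workflow described in the abstract, with invented class and method names rather than the actual DataWarp API: the application adopts one interpretation of the data, records its actions as provisional, and voids them if it later switches to an alternative view.

```python
class DataWarpApp:
    """Illustrative sketch; actions are caller-supplied objects with apply/undo."""
    def __init__(self, interpretations):
        self.interpretations = interpretations   # alternative views of the data
        self.current = interpretations[0]        # adopt one and carry on working
        self.provisional_actions = []            # work done on assumptions

    def act(self, action):
        action.apply(self.current)
        self.provisional_actions.append(action)  # provisional until the view is confirmed

    def confirm(self):
        self.provisional_actions.clear()          # provisional actions become definitive

    def switch_view(self, new_view):
        # Void provisional work before resuming on the alternative interpretation.
        for action in reversed(self.provisional_actions):
            action.undo(self.current)
        self.provisional_actions.clear()
        self.current = new_view
```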

    LightChain: A DHT-based Blockchain for Resource Constrained Environments

    As an append-only distributed database, blockchain is utilized in a vast variety of applications, including cryptocurrency and the Internet of Things (IoT). Existing blockchain solutions have downsides in communication and storage efficiency, convergence to centralization, and consistency problems. In this paper, we propose LightChain, the first blockchain architecture that operates over a Distributed Hash Table (DHT) of participating peers. LightChain is a permissionless blockchain that provides addressable blocks and transactions within the network, which makes them efficiently accessible by all the peers. Each block and transaction is replicated within the DHT of peers and is retrieved in an on-demand manner. Hence, peers in LightChain are not required to retrieve or keep the entire blockchain. LightChain is fair, as all of the participating peers have a uniform chance of being involved in the consensus regardless of their influence, such as hashing power or stake. LightChain provides a deterministic fork-resolving strategy as well as a blacklisting mechanism, and it is secure against colluding adversarial peers attacking the availability and integrity of the system. We provide mathematical analysis and experimental results on scenarios involving 10K nodes to demonstrate the security and fairness of LightChain. As we experimentally show in this paper, compared to mainstream blockchains like Bitcoin and Ethereum, LightChain requires around 66 times less per-node storage and is around 380 times faster at bootstrapping a new node, while each LightChain node is equally likely to be rewarded for participating in the protocol.
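    The following sketch illustrates the on-demand block access the abstract describes, using an in-memory dictionary as a stand-in for the DHT routing layer; all names are illustrative and not LightChain's actual interface.

```python
import hashlib
import json

class DHT:
    """Stand-in for a Kademlia-style DHT: addressable put/get by key."""
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value      # in the real system, replicated across peers

    def get(self, key):
        return self._store.get(key)   # on-demand retrieval by address

def block_id(block):
    """Content-derived address of a block (illustrative hashing scheme)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

dht = DHT()
genesis = {"prev": None, "transactions": []}
dht.put(block_id(genesis), genesis)   # the block lives in the DHT, not on every peer

# A peer does not download the whole chain; it resolves blocks as it needs them.
fetched = dht.get(block_id(genesis))
```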

    PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication

    Geo-replicated data platforms are at the backbone of several large-scale online services. Transactional Causal Consistency (TCC) is an attractive consistency level for building such platforms. TCC avoids many anomalies of eventual consistency, eschews the synchronization costs of strong consistency, and supports interactive read-write transactions. Partial replication is another attractive design choice for building geo-replicated platforms, as it increases the storage capacity and reduces update propagation costs. This paper presents PaRiS, the first TCC system that supports partial replication and implements non-blocking parallel read operations, whose latency is paramount for the performance of read-intensive applications. PaRiS relies on a novel protocol to track dependencies, called Universal Stable Time (UST). By means of a lightweight background gossip process, UST identifies a snapshot of the data that has been installed by every DC in the system. Hence, transactions can consistently read from such a snapshot on any server in any replication site without having to block. Moreover, PaRiS requires only one timestamp to track dependencies and define transactional snapshots, thereby achieving resource efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment composed of up to 10 replication sites. We show that PaRiS scales well with the number of DCs and partitions, while being able to handle larger datasets than existing solutions that assume full replication. We also demonstrate a performance gain of non-blocking reads over a blocking alternative (up to 1.47x higher throughput with 5.91x lower latency for read-dominated workloads, and up to 1.46x higher throughput with 20.56x lower latency for write-heavy workloads).
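    A hedged sketch of the Universal Stable Time idea as summarized above: each site gossips the snapshot timestamp it has fully installed, UST is the minimum of those values, and a read returns the newest version no newer than UST, so it never blocks. Names and data structures are assumptions for illustration, not the PaRiS protocol itself.

```python
def universal_stable_time(installed_snapshots):
    """installed_snapshots: {dc_name: highest snapshot timestamp installed at that DC}.
    The UST is the snapshot every DC has installed, i.e. the minimum."""
    return min(installed_snapshots.values())

def nonblocking_read(versions, ust):
    """Return the newest version of an item that is no newer than the UST snapshot."""
    visible = [v for v in versions if v["ts"] <= ust]
    return max(visible, key=lambda v: v["ts"]) if visible else None

# Example: three sites gossip what they have installed; reads never wait.
ust = universal_stable_time({"eu": 17, "us": 15, "ap": 19})   # -> 15
item_versions = [{"ts": 12, "val": "a"}, {"ts": 16, "val": "b"}]
print(nonblocking_read(item_versions, ust))                    # version with ts 12 is visible
```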

    Semiconductor manufacturing simulation design and analysis with limited data

    This paper discusses simulation design and analysis for Silicon Carbide (SiC) manufacturing operations management at the New York Power Electronics Manufacturing Consortium (PEMC) facility. Prior work has addressed the development of a manufacturing system simulation as decision support for solving the strategic equipment portfolio selection problem in the SiC fab design [1]. As we move into the phase of collecting data from the equipment purchased for the PEMC facility, we discuss how to redesign our manufacturing simulations and analyze their outputs to overcome the challenges that naturally arise in the presence of limited fab data. We conclude with insights on how an approach aimed at reflecting learning from data can enable our discrete-event stochastic simulation to accurately estimate the performance measures for SiC manufacturing at the PEMC facility.