    Incremental Consistency Guarantees for Replicated Objects

    Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary---fast, possibly inconsistent---result and a final---consistent---result that arrives later. We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%. Comment: 16 total pages, 12 figures. OSDI'16 (to appear).
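
    The Correctables API is not spelled out in this abstract; the sketch below is only a guess at what an incremental-consistency handle could look like. The class name, the on()/read() signatures and the weak_replica/strong_quorum callables are all assumptions, not the paper's interface.

        # Hypothetical sketch of an incremental-consistency handle, loosely inspired
        # by the Correctables abstraction described above; names and signatures are
        # assumed, not taken from the paper.
        import threading

        class Correctable:
            """Delivers a fast, possibly inconsistent preliminary value, then a final one."""
            def __init__(self):
                self._callbacks = {"preliminary": [], "final": []}

            def on(self, stage, callback):
                # A real implementation would also replay stages delivered before the
                # callback was registered; omitted here for brevity.
                self._callbacks[stage].append(callback)
                return self                      # allow chaining of .on(...) calls

            def _deliver(self, stage, value):
                for cb in self._callbacks[stage]:
                    cb(value)

        def read(key, weak_replica, strong_quorum):
            """Issue a weak read and a quorum read concurrently; report each as it completes."""
            handle = Correctable()
            threading.Thread(target=lambda: handle._deliver("preliminary", weak_replica(key))).start()
            threading.Thread(target=lambda: handle._deliver("final", strong_quorum(key))).start()
            return handle

        # Usage idea: speculate on the preliminary value, confirm or roll back on the final one.
        # read("ad-42", weak, strong).on("preliminary", render_ad).on("final", reconcile)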

    Failure detectors as type boosters

    The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek a failure detector that is sufficient and necessary to "boost" the consensus power of a type T from n to k. It was shown in Neiger (Proceedings of the 14th annual ACM symposium on principles of distributed computing (PODC), pp. 100-109, 1995) that a certain failure detector, denoted Ω_n, is sufficient to boost the power of a type T from n to k, and it was conjectured that Ω_n was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) ≤ n, Ω_n is necessary to boost the power of T from n to n+1. Then we go a step further and show that Ω_n is also the weakest to boost the power of (n+1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems (Chandra et al. in J ACM 43(4):685-722, 1996). As a corollary, we show that Ω_t is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t − 1)-resilient objects of consensus power
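
    For context (these values come from Herlihy's classical consensus hierarchy, not from this abstract), the consensus power cons(T) that the paper boosts takes, for example, the following standard values:

        % Standard consensus numbers (Herlihy, 1991), quoted only as background.
        \begin{align*}
          \mathrm{cons}(\text{read/write register})          &= 1 \\
          \mathrm{cons}(\text{test\&set, queue, stack})      &= 2 \\
          \mathrm{cons}(\text{$n$-process consensus object}) &= n \\
          \mathrm{cons}(\text{compare\&swap})                &= \infty
        \end{align*}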

    Fast and Robust Distributed Learning in High Dimension

    Could a gradient aggregation rule (GAR) for distributed machine learning be both robust and fast? This paper answers in the affirmative through multi-Bulyan. Given n workers, f of which are arbitrarily malicious (Byzantine) and m = n − f are not, we prove that multi-Bulyan can ensure a strong form of Byzantine resilience, as well as an m/n slowdown, compared to averaging, the fastest (but non-Byzantine-resilient) rule for distributed machine learning. When m ≈ n (almost all workers are correct), multi-Bulyan reaches the speed of averaging. We also prove that multi-Bulyan's cost in local computation is O(d) (like averaging), an important feature for ML where d commonly reaches 10^9, while robust alternatives have at least quadratic cost in d. Our theoretical findings are complemented with an experimental evaluation which, in addition to supporting the linear O(d) complexity argument, conveys the fact that multi-Bulyan's parallelisability further adds to its efficiency. Comment: preliminary theoretical draft, complements the SysML 2019 practical paper of which the code is provided at https://github.com/LPD-EPFL/AggregaThor. arXiv admin note: text overlap with arXiv:1703.0275
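
    Multi-Bulyan itself is not specified in this abstract. Purely to illustrate the shape of a Byzantine-resilient GAR in which every pairwise distance costs O(d), here is a simplified multi-Krum-style selection followed by averaging; it is not the paper's rule, and all names are assumptions.

        # Simplified multi-Krum-style aggregation (an illustration, NOT the paper's
        # multi-Bulyan rule): keep the m gradients closest to their neighbours,
        # then average them.  Each pairwise distance costs O(d).
        import numpy as np

        def robust_aggregate(gradients, f, m):
            """gradients: list of n arrays of dimension d; f: assumed number of
            Byzantine workers; m: number of gradients to keep (e.g. n - f)."""
            n = len(gradients)
            G = np.stack(gradients)                                  # shape (n, d)
            d2 = ((G[:, None, :] - G[None, :, :]) ** 2).sum(-1)      # pairwise squared distances
            scores = []
            for i in range(n):
                others = np.delete(d2[i], i)
                scores.append(np.sort(others)[: n - f - 2].sum())    # n - f - 2 nearest neighbours
            keep = np.argsort(scores)[:m]                            # the m most "central" gradients
            return G[keep].mean(axis=0)

        # Example: 7 workers, at most 2 Byzantine, keep m = 5 gradients.
        # agg = robust_aggregate([np.random.randn(10) for _ in range(7)], f=2, m=5)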

    Collision-Free Pattern Formation

    Shoals of small fishes can change their collective shape and form a specific pattern. They do so efficiently (in parallel) and without collision. In this paper, we study the analogous problem of distributed pattern formation. A set of processes needs to move from a set of initial positions to a set of final positions. The processes are oblivious (no internal memory) and must preserve, at any time, a minimal distance between them. A naive solution would be to move the processes one by one, but this would take too long. The difficulty here is to move the processes simultaneously in clearly delimited phases, no matter how unfavorable the initial configuration may be. We solve this by treating the problem "dimension by dimension": the processes first form 1D trails, then gather into a 2D shape (this technique can be generalized to higher dimensions). We present an optimal algorithm whose time complexity depends linearly on the radius of the smallest circle containing both the initial and final positions. The algorithm is self-stabilizing, as the processes are oblivious and the initial positions are arbitrary.
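
    The algorithm itself is not given in this abstract; the toy checker below only encodes the invariant it must preserve (a minimal pairwise distance at every round), with the schedule of synchronous moves, i.e. the hard part solved by the paper, left abstract. All names are hypothetical.

        # Hypothetical checker for the collision-freedom invariant: every pair of
        # processes must stay at least `min_dist` apart after every synchronous round.
        from itertools import combinations
        import math

        def respects_min_distance(positions, min_dist):
            return all(math.dist(p, q) >= min_dist
                       for p, q in combinations(positions.values(), 2))

        def run_schedule(initial, rounds, min_dist=1.0):
            """initial: {pid: (x, y)}; rounds: list of {pid: (x, y)} moves applied per round."""
            positions = dict(initial)
            assert respects_min_distance(positions, min_dist)
            for moves in rounds:
                positions.update(moves)               # all moves of a round take effect together
                assert respects_min_distance(positions, min_dist), "minimal distance violated"
            return positions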

    The Alpha of Indulgent Consensus

    This paper presents a simple framework unifying a family of consensus algorithms that can tolerate process crash failures and asynchronous periods of the network, also called indulgent consensus algorithms. Key to the framework is a new abstraction we introduce here, called Alpha, which precisely captures consensus safety. Implementations of Alpha in shared memory, storage area network, message passing and active disk systems are presented, leading to directly derived consensus algorithms suited to these communication media. The paper also considers the case where the number of processes is unknown and can be arbitrarily large.
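
    As a hedged illustration of how such a safety abstraction is commonly combined with an eventual leader oracle Ω to obtain indulgent consensus (the usual round-based structure, not the paper's own pseudocode; alpha, omega, broadcast and delivered are assumed interfaces):

        # Rough sketch: alpha.propose(r, v) is assumed to return an agreed value or
        # None while never violating safety; omega.leader() is the eventual leader
        # oracle; delivered() returns the decision once it has been reliably broadcast.
        import time

        def indulgent_consensus(pid, proposal, alpha, omega, broadcast, delivered):
            round_nr = pid                            # round numbers partitioned among processes
            while delivered() is None:
                if omega.leader() == pid:             # only the currently trusted leader tries
                    value = alpha.propose(round_nr, proposal)
                    if value is not None:             # Alpha committed a value: announce it
                        broadcast(("DECIDE", value))
                    round_nr += omega.n_processes     # retry with a strictly larger round number
                time.sleep(0.01)                      # liveness needs an eventually stable leader
            return delivered()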

    Anonymous and fault-tolerant shared-memory computing

    The vast majority of papers on distributed computing assume that processes are assigned unique identifiers before computation begins. But is this assumption necessary? What if processes do not have unique identifiers or do not wish to divulge them for reasons of privacy? We consider asynchronous shared-memory systems that are anonymous. The shared memory contains only the most common type of shared objects, read/write registers. We investigate, for the first time, what can be implemented deterministically in this model when processes can fail. We give anonymous algorithms for some fundamental problems: time-stamping, snapshots and consensus. Our solutions to the first two are wait-free and the third is obstruction-free. We also show that a shared object has an obstruction-free implementation if and only if it satisfies a simple property called idempotence. To prove the sufficiency of this condition, we give a universal construction that implements any idempotent object.
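
    The abstract does not define idempotence formally. Under one plausible reading (invoking an operation twice in a row leaves the same state, and the second invocation returns what a single invocation would), a toy checker looks as follows; the reading, the helper names and the examples are assumptions, not the paper's definition.

        # Toy idempotence check under the reading stated above: read/write registers
        # pass it, a fetch-and-increment counter does not.
        import copy

        def is_idempotent(make_object, operations, states):
            for state in states:
                for op, arg in operations:
                    once = make_object(copy.deepcopy(state))
                    twice = make_object(copy.deepcopy(state))
                    r_once = getattr(once, op)(arg)       # single invocation
                    getattr(twice, op)(arg)               # double invocation ...
                    r_twice = getattr(twice, op)(arg)     # ... keep the second response
                    if r_once != r_twice or once.state != twice.state:
                        return False
            return True

        class Register:
            def __init__(self, state): self.state = state
            def read(self, _=None): return self.state
            def write(self, v): self.state = v

        class Counter:
            def __init__(self, state): self.state = state
            def fetch_and_increment(self, _=None):
                old, self.state = self.state, self.state + 1
                return old

        # is_idempotent(Register, [("read", None), ("write", 7)], states=[0, 3])  -> True
        # is_idempotent(Counter, [("fetch_and_increment", None)], states=[0, 3])  -> False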

    Opacity: A Correctness Condition for Transactional Memory

    Transactional memory is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on transactional memory implementations, however, its actual specification has never been precisely defined. This paper presents \emph{opacity}, a new correctness criterion for transactional memory systems. Opacity extends the notion of strict serializability, itself a strong form of the classical serializability property, with the requirement that even \emph{non-committed} transactions are prevented from accessing inconsistent state. Yet opacity does not preclude versioning, invisible reads and lazy updates, often used by modern TM implementations. In fact, most transactional memory systems we know of ensure opacity. We prove a tight bound on the inherent cost of implementing opacity. The bound highlights a trade-off that explains some of the differences between current transactional memory systems, and also draws a sharp complexity line between opacity on one hand, and the combination of strict serializability and strict recoverability on the other hand.
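
    A standard motivating example (not taken from this paper) shows why even transactions that will abort must not observe inconsistent state. Suppose committed transactions maintain the invariant x != y:

        # Illustrative hazard that opacity rules out: a live ("doomed") transaction
        # observes a half-applied update and crashes before it can be aborted.
        def doomed_reader(snapshot):
            a, b = snapshot["x"], snapshot["y"]
            return 1 / (a - b)                  # divides by zero on an inconsistent snapshot

        consistent   = {"x": 1, "y": 2}         # invariant x != y holds
        inconsistent = {"x": 2, "y": 2}         # writer's new x with old y: a state no
                                                # committed transaction ever produced

        doomed_reader(consistent)               # fine
        # doomed_reader(inconsistent)           # ZeroDivisionError: exactly what opacity forbids,
        #                                       # but strict serializability of committed
        #                                       # transactions alone does not.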

    On The Robustness of a Neural Network

    With the development of neural-network-based machine learning and its use in mission-critical applications, voices are rising against the \textit{black box} aspect of neural networks, as it becomes crucial to understand their limits and capabilities. With the rise of neuromorphic hardware, it is even more critical to understand how a neural network, as a distributed system, tolerates the failures of its computing nodes (neurons) and of its communication channels (synapses). Experimentally assessing the robustness of neural networks involves the quixotic venture of testing all the possible failures on all the possible inputs, which runs into a combinatorial explosion for the former and the impossibility of gathering all possible inputs for the latter. In this paper, we prove an upper bound on the expected error of the output when a subset of neurons crashes. This bound involves dependencies on the network parameters that can be seen as too pessimistic in the average case: a polynomial dependency on the Lipschitz coefficient of the neurons' activation function, and an exponential dependency on the depth of the layer where a failure occurs. We back up our theoretical results with experiments illustrating the extent to which our prediction matches the dependencies between the network parameters and robustness. Our results show that the robustness of neural networks to the average crash can be estimated without needing either to test the network on all failure configurations or to access the training set used to train the network, both of which are practically impossible requirements. Comment: 36th IEEE International Symposium on Reliable Distributed Systems, 26-29 September 2017, Hong Kong, China.
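
    The paper's bound is analytical; purely to illustrate the quantity being bounded, a naive Monte-Carlo estimate of the expected output error under random neuron crashes could look like the sketch below (toy network, crashes modelled as neurons outputting 0, all names assumed).

        # Naive Monte-Carlo estimate of the expected output error under random
        # neuron crashes; the paper's contribution is an analytical bound that
        # avoids exactly this kind of exhaustive sampling.
        import numpy as np

        def forward(x, weights, crash_masks=None):
            h = x
            for layer, W in enumerate(weights):
                h = np.tanh(W @ h)                       # Lipschitz activation (tanh)
                if crash_masks is not None:
                    h = h * crash_masks[layer]           # crashed neurons output 0
            return h

        def expected_crash_error(x, weights, p_crash=0.05, trials=1000, seed=0):
            rng = np.random.default_rng(seed)
            reference = forward(x, weights)
            errors = []
            for _ in range(trials):
                masks = [rng.random(W.shape[0]) >= p_crash for W in weights]
                errors.append(np.linalg.norm(forward(x, weights, masks) - reference))
            return float(np.mean(errors))

        # Example: 3 hidden layers of 16 neurons each.
        # weights = [np.random.randn(16, 16) / 4 for _ in range(3)]
        # expected_crash_error(np.random.randn(16), weights)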