744 research outputs found
Incremental Consistency Guarantees for Replicated Objects
Programming with replicated objects is difficult. Developers must face the
fundamental trade-off between consistency and performance head on, while
struggling with the complexity of distributed storage stacks. We introduce
Correctables, a novel abstraction that hides most of this complexity, allowing
developers to focus on the task of balancing consistency and performance. To
aid developers with this task, Correctables provide incremental consistency
guarantees, which capture successive refinements on the result of an ongoing
operation on a replicated object. In short, applications receive both a
preliminary---fast, possibly inconsistent---result, as well as a
final---consistent---result that arrives later.
We show how to leverage incremental consistency guarantees by speculating on
preliminary values, trading throughput and bandwidth for improved latency. We
experiment with two popular storage systems (Cassandra and ZooKeeper) and three
applications: a Twissandra-based microblogging service, an ad serving system,
and a ticket selling system. Our evaluation on the Amazon EC2 platform with
YCSB workloads A, B, and C shows that we can reduce the latency of strongly
consistent operations by up to 40% (from 100ms to 60ms) at little cost (10%
bandwidth increase, 6% throughput drop) in the ad system. Even if the
preliminary result is frequently inconsistent (25% of accesses), incremental
consistency incurs a bandwidth overhead of only 27%.Comment: 16 total pages, 12 figures. OSDI'16 (to appear
Failure detectors as type boosters
The power of an object type T can be measured as the maximum number n of processes that can solve consensus using only objects of T and registers. This number, denoted cons(T), is called the consensus power of T. This paper addresses the question of the weakest failure detector to solve consensus among a number k > n of processes that communicate using shared objects of a type T with consensus power n. In other words, we seek for a failure detector that is sufficient and necessary to "boostâ the consensus power of a type T from n to k. It was shown in Neiger (Proceedings of the 14th annual ACM symposium on principles of distributed computing (PODC), pp. 100-109, 1995) that a certain failure detector, denoted Ω n , is sufficient to boost the power of a type T from n to k, and it was conjectured that Ω n was also necessary. In this paper, we prove this conjecture for one-shot deterministic types. We first show that, for any one-shot deterministic type T with cons(T) †n, Ω n is necessary to boost the power of T from n to n+1. Then we go a step further and show that Ω n is also the weakest to boost the power of (n+1)-ported one-shot deterministic types from n to any k > n. Our result generalizes, in a precise sense, the result of the weakest failure detector to solve consensus in asynchronous message-passing systems (Chandra etal. in J ACM 43(4):685-722, 1996). As a corollary, we show that Ω t is the weakest failure detector to boost the resilience level of a distributed shared memory system, i.e., to solve consensus among n > t processes using (t â 1)-resilient objects of consensus power
Fast and Robust Distributed Learning in High Dimension
Could a gradient aggregation rule (GAR) for distributed machine learning be
both robust and fast? This paper answers by the affirmative through
multi-Bulyan. Given workers, of which are arbitrary malicious
(Byzantine) and are not, we prove that multi-Bulyan can ensure a strong
form of Byzantine resilience, as well as an slowdown, compared
to averaging, the fastest (but non Byzantine resilient) rule for distributed
machine learning. When (almost all workers are correct),
multi-Bulyan reaches the speed of averaging. We also prove that multi-Bulyan's
cost in local computation is (like averaging), an important feature for
ML where commonly reaches , while robust alternatives have at least
quadratic cost in .
Our theoretical findings are complemented with an experimental evaluation
which, in addition to supporting the linear complexity argument, conveys
the fact that multi-Bulyan's parallelisability further adds to its efficiency.Comment: preliminary theoretical draft, complements the SysML 2019 practical
paper of which the code is provided at
https://github.com/LPD-EPFL/AggregaThor. arXiv admin note: text overlap with
arXiv:1703.0275
Collision-Free Pattern Formation
Shoals of small fishes can change their collective shape and form a specific pattern. They do so efficiently (in parallel) and without collision.
In this paper, we study the analog problem of distributed pattern formation. A set of processes needs to move from a set of initial positions to a set of final positions. The processes are oblivious (no internal memory) and must preserve, at any time, a minimal distance between them.
A naive solution would be to move the processes one by one, but this would take too long. The difficulty here is to move the processes simultaneously in clearly delimited phases, no matter how unfavorable the initial configuration may be. We solve this by treating the problem "dimension by dimension": the processes first form 1D trails, then gather into a 2D shape (this technique can be generalized to higher dimensions).
We present an optimal algorithm which time complexity depends linearly on the radius of the smallest circle containing both initial and final positions. The algorithm is self-stabilizing, as the processes are oblivious and the initial positions are arbitrary
The Alpha of Indulgent Consensus
This paper presents a simple framework unifying a family of consensus algorithms that can tolerate process crash failures and asynchronous periods of the network, also called indulgent consensus algorithms. Key to the framework is a new abstraction we introduce here, called Alpha, and which precisely captures consensus safety. Implementations of Alpha in shared memory, storage area network, message passing and active disk systems are presented, leading to directly derived consensus algorithms suited to these communication media. The paper also considers the case where the number of processes is unknown and can be arbitrarily larg
Anonymous and fault-tolerant shared-memory computing
The vast majority of papers on distributed computing assume that processes are assigned unique identifiers before computation begins. But is this assumption necessary? What if processes do not have unique identifiers or do not wish to divulge them for reasons of privacy? We consider asynchronous shared-memory systems that are anonymous. The shared memory contains only the most common type of shared objects, read/write registers. We investigate, for the first time, what can be implemented deterministically in this model when processes can fail. We give anonymous algorithms for some fundamental problems: time-stamping, snapshots and consensus. Our solutions to the first two are wait-free and the third is obstruction-free. We also show that a shared object has an obstruction-free implementation if and only if it satisfies a simple property called idempotence. To prove the sufficiency of this condition, we give a universal construction that implements any idempotent objec
Opacity: A Correctness Condition for Transactional Memory
Transactional memory is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on transactional memory implementations, however, its actual specification has never been precisely defined. This paper presents \emph{opacity}, a new correctness criterion for transactional memory systems. Opacity extends the notion of strict serializability, itself a strong form of the classical serializability property, with the requirement that even \emph{non-committed} transactions are prevented from accessing inconsistent state. Yet opacity does not preclude versioning, invisible reads and lazy updates, often used by modern TM implementations. In fact, most transactional memory systems we know of ensure opacity. We prove a tight bound on the inherent cost of implementing opacity. The bound highlights a trade-off that explains some of the differences between current transactional memory systems, and also draws a sharp complexity line between opacity on one hand, and the combination of strict serializability and strict recoverability on the other hand
On The Robustness of a Neural Network
With the development of neural networks based machine learning and their
usage in mission critical applications, voices are rising against the
\textit{black box} aspect of neural networks as it becomes crucial to
understand their limits and capabilities. With the rise of neuromorphic
hardware, it is even more critical to understand how a neural network, as a
distributed system, tolerates the failures of its computing nodes, neurons, and
its communication channels, synapses. Experimentally assessing the robustness
of neural networks involves the quixotic venture of testing all the possible
failures, on all the possible inputs, which ultimately hits a combinatorial
explosion for the first, and the impossibility to gather all the possible
inputs for the second.
In this paper, we prove an upper bound on the expected error of the output
when a subset of neurons crashes. This bound involves dependencies on the
network parameters that can be seen as being too pessimistic in the average
case. It involves a polynomial dependency on the Lipschitz coefficient of the
neurons activation function, and an exponential dependency on the depth of the
layer where a failure occurs. We back up our theoretical results with
experiments illustrating the extent to which our prediction matches the
dependencies between the network parameters and robustness. Our results show
that the robustness of neural networks to the average crash can be estimated
without the need to neither test the network on all failure configurations, nor
access the training set used to train the network, both of which are
practically impossible requirements.Comment: 36th IEEE International Symposium on Reliable Distributed Systems 26
- 29 September 2017. Hong Kong, Chin
- âŠ