26 research outputs found
Distributed Multi-writer Multi-reader Atomic Register with Optimistically Fast Read and Write
A distributed multi-writer multi-reader (MWMR) atomic register is an
important primitive that enables a wide range of distributed algorithms. Hence,
improving its performance can have large-scale consequences. Since the seminal
work of ABD emulation in the message-passing networks [JACM '95], many
researchers study fast implementations of atomic registers under various
conditions. "Fast" means that a read or a write can be completed with 1
round-trip time (RTT), by contacting a simple majority. In this work, we
explore an atomic register with optimal resilience and "optimistically fast"
read and write operations. That is, both operations can be fast if there is no
concurrent write.
This paper has three contributions: (i) We present Gus, the emulation of an
MWMR atomic register with optimal resilience and optimistically fast reads and
writes when there are up to 5 nodes; (ii) We show that when there are > 5
nodes, it is impossible to emulate an MWMR atomic register with both
properties; and (iii) We implement Gus in the framework of EPaxos and Gryff,
and show that Gus provides lower tail latency than state-of-the-art systems
such as EPaxos, Gryff, Giza, and Tempo under various workloads in the context
of geo-replicated object storage systems
Population stability: regulating size in the presence of an adversary
We introduce a new coordination problem in distributed computing that we call
the population stability problem. A system of agents each with limited memory
and communication, as well as the ability to replicate and self-destruct, is
subjected to attacks by a worst-case adversary that can at a bounded rate (1)
delete agents chosen arbitrarily and (2) insert additional agents with
arbitrary initial state into the system. The goal is perpetually to maintain a
population whose size is within a constant factor of the target size . The
problem is inspired by the ability of complex biological systems composed of a
multitude of memory-limited individual cells to maintain a stable population
size in an adverse environment. Such biological mechanisms allow organisms to
heal after trauma or to recover from excessive cell proliferation caused by
inflammation, disease, or normal development.
We present a population stability protocol in a communication model that is a
synchronous variant of the population model of Angluin et al. In each round,
pairs of agents selected at random meet and exchange messages, where at least a
constant fraction of agents is matched in each round. Our protocol uses
three-bit messages and states per agent. We emphasize that
our protocol can handle an adversary that can both insert and delete agents, a
setting in which existing approximate counting techniques do not seem to apply.
The protocol relies on a novel coloring strategy in which the population size
is encoded in the variance of the distribution of colors. Individual agents can
locally obtain a weak estimate of the population size by sampling from the
distribution, and make individual decisions that robustly maintain a stable
global population size
D1.3 - SUPERCLOUD Architecture Implementation
In this document we describe the implementation of the SUPERCLOUD architecture. The architecture provides an abstraction layer on top of which SUPERCLOUD users can realize SUPERCLOUD services encompassing secure computation workloads, secure and privacy-preserving resilient data storage and secure networking resources spanning across different cloud service providers' computation, data storage and network resources. The components of the SUPERCLOUD architecture implementation are described. Integration between the different layers of the architecture (computing security, data protection, network security) and with the facilities for security self-management is also highlighted. Finally, we provide download and installation instructions for the released software components that can be downloaded from our common SUPERCLOUD code repository
Self-Stabilizing and Private Distributed Shared Atomic Memory in Seldomly Fair Message Passing Networks
We study the problem of privately emulating shared memory in message-passing networks. The system includes clients that store and retrieve replicated information on N servers, out of which e are data-corrupting malicious. When a client accesses a data-corrupting malicious server, the data field of that server response might be different from the value it originally stored. However, all other control variables in the server reply and protocol actions are according to the server algorithm. For the coded atomic storage algorithms by Cadambe et al., we present an enhancement that ensures no information leakage and data-corrupting malicious fault-tolerance. We also consider recovery after the occurrence of transient faults that violate the assumptions according to which the system was designed to operate. After their last occurrence, transient faults leave the system in an arbitrary state (while the program code stays intact). We present a self-stabilizing algorithm, which recovers after the occurrence of transient faults. This addition to Cadambe et al. considers asynchronous settings as long as no transient faults occur. The recovery from transient faults that bring the system counters (close) to their maximal values may include the use of a global reset procedure, which requires the system run to be controlled by a fair scheduler. After the recovery period, the safety properties are provided for asynchronous system runs that are not necessarily controlled by fair schedulers. Since the recovery period is bounded and the occurrence of transient faults is extremely rare, we call this design criteria self-stabilization in the presence of seldom fairness. Our self-stabilizing algorithm uses a bounded amount of storage during asynchronous executions (that are not necessarily controlled by fair schedulers). To the best of our knowledge, we are the first to address privacy, data-corrupting malicious behavior, and self-stabilization in the context of emulating atomic shared memory in message-passing systems
Topological Characterization of Consensus Solvability in Directed Dynamic Networks
Consensus is one of the most fundamental problems in distributed computing.
This paper studies the consensus problem in a synchronous dynamic directed
network, in which communication is controlled by an oblivious message
adversary. The question when consensus is possible in this model has already
been studied thoroughly in the literature from a combinatorial perspective, and
is known to be challenging. This paper presents a topological perspective on
consensus solvability under oblivious message adversaries, which provides
interesting new insights. Our main contribution is a topological
characterization of consensus solvability, which also leads to explicit
decision procedures. Our approach is based on the novel notion of a
communication pseudosphere, which can be seen as the message-passing analog of
the well-known standard chromatic subdivision for wait-free shared memory
systems. We further push the elegance and expressiveness of the "geometric"
reasoning enabled by the topological approach by dealing with uninterpreted
complexes, which considerably reduce the size of the protocol complex, and by
labeling facets with information flow arrows, which give an intuitive meaning
to the implicit epistemic status of the faces in a protocol complex
Fault-Tolerant Distributed Services in Message-Passing Systems
Distributed systems ranging from small local area networks to large wide area networks like the Internet composed of static and/or mobile users have become increasingly popular. A desirable property for any distributed service is fault-tolerance, which means the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models to find either algorithms to solve certain problems or impossibility proofs to show that a problem is unsolvable. These are the main contributions of this dissertation:
• Failure detectors are used as a service to solve consensus (agreement among nodes) which is otherwise impossible in failure-prone asynchronous systems. We find an algorithm for crash-failure detection that uses bounded size messages in an arbitrary, partitionable network composed of badly- behaved channels that can lose and reorder messages.
• Registers are a fundamental building block for shared memory emulations on top of message passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard and there are impossibility proofs that point out scenarios where change in the system composition due to nodes entering and leaving (also called churn) makes the problem unsolvable.
We propose the first emulation of a crash-fault tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound and at most a constant fraction of the number of nodes in the system can fail by crashing. We prove a lower bound that states that fault-tolerance for dynamic systems with churn is inherently lower than in static systems.
• We then extend the results in the crash-fault tolerant case to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size and can tolerate Byzantine nodes. However, the number of Byzantine faulty nodes that can be tolerated is upper bounded by a constant number. Although the algorithm requires that there be a constant known upper bound on the number of Byzantine nodes, this restriction is unavoidable, as we show that it is impossible to emulate an atomic register if the system size and maximum number of servers that can be Byzantine in the system is unknown