1,034 research outputs found
Exploiting the Synergy Between Gossiping and Structured Overlays
In this position paper we argue for exploiting the synergy between gossip-based algorithms and structured overlay networks (SON). These two strands of research have both aimed at building fault-tolerant, dynamic, self-managing, and large-scale distributed systems. Despite the common goals, the two areas have, however, been relatively isolated. We focus on three problem domains where there is an untapped potential of using gossiping combined with SONs. We argue for applying gossip-based membership for ring-based SONs---such as Chord and Bamboo---to make them handle partition mergers and loopy networks. We argue that small world SONs---such as Accordion and Mercury---are specifically well-suited for gossip-based membership management. The benefits would be better graph-theoretic properties. Finally, we argue that gossip-based algorithms could use the overlay constructed by SONs. For example, many unreliable broadcast algorithms for SONs could be augmented with anti-entropy protocols. Similarly, gossip-based aggregation could be used in SONs for network size estimation and load-balancing purposes
Generic Platform for Failure Recovery in Survivable Trees
Failure recovery is a fundamental task of the dependable systems needed to achieve fault-tolerant communications, smooth operation of system components and a comfortable user interface. Tree topologies are fragile, yet they are quite popular structures in computer systems. The term survivable tree denotes the capability of the tree network to deliver messages even in the presence of failures. In this paper, we analyze the characteristics of large-scale overlay survivable trees and identify the requirements for general-purpose failure recovery mechanisms in such an environment. We outline a generic failure recovery platform for preplanned tree restoration which meets those requirements, and we focus primarily on its completeness and correctness properties. The platform is based on bypass rings and it uses a bypass routing algorithm to ensure completeness, and specialized leader election to guarantee correctness. The platform supports multiple, on-line and on-the-fly recovery, provides an optional level of fault-tolerance, protection selectivity and optimization capability. It is independent of the the protected tree type (regarding traffic direction, number of sources, etc.) and forms a basis for application-specific fragment reconnection.
Algorithms for Constructing Overlay Networks For Live Streaming
We present a polynomial time approximation algorithm for constructing an
overlay multicast network for streaming live media events over the Internet.
The class of overlay networks constructed by our algorithm include networks
used by Akamai Technologies to deliver live media events to a global audience
with high fidelity. We construct networks consisting of three stages of nodes.
The nodes in the first stage are the entry points that act as sources for the
live streams. Each source forwards each of its streams to one or more nodes in
the second stage that are called reflectors. A reflector can split an incoming
stream into multiple identical outgoing streams, which are then sent on to
nodes in the third and final stage that act as sinks and are located in edge
networks near end-users. As the packets in a stream travel from one stage to
the next, some of them may be lost. A sink combines the packets from multiple
instances of the same stream (by reordering packets and discarding duplicates)
to form a single instance of the stream with minimal loss. Our primary
contribution is an algorithm that constructs an overlay network that provably
satisfies capacity and reliability constraints to within a constant factor of
optimal, and minimizes cost to within a logarithmic factor of optimal. Further
in the common case where only the transmission costs are minimized, we show
that our algorithm produces a solution that has cost within a factor of 2 of
optimal. We also implement our algorithm and evaluate it on realistic traces
derived from Akamai's live streaming network. Our empirical results show that
our algorithm can be used to efficiently construct large-scale overlay networks
in practice with near-optimal cost
A Bypass-Ring Scheme for a Fault Tolerant Multicast
We present a fault tolerant scheme for recovery from single or multiple node failures in multi-directional multicast trees. The scheme is based on cyclic structures providing alternative paths to eliminate faulty nodes and reroute the traffic. Our scheme is independent of message source and direction in the tree, provides a basis for on-the-fly repair and can be used as a platform for various strategies for reconnecting tree partitions. It only requires an underlying infrastructure to provide a reliable routing service. Although it is described in the context of a message multicast, the scheme can be used universally in all systems using tree-based overlay networks for communication among components
FlexCast: genuine overlay-based atomic multicast
Atomic multicast is a communication abstraction where messages are propagated
to groups of processes with reliability and order guarantees. Atomic multicast
is at the core of strongly consistent storage and transactional systems. This
paper presents FlexCast, the first genuine overlay-based atomic multicast
protocol. Genuineness captures the essence of atomic multicast in that only the
sender of a message and the message's destinations coordinate to order the
message, leading to efficient protocols. Overlay-based protocols restrict how
process groups can communicate. Limiting communication leads to simpler
protocols and reduces the amount of information each process must keep about
the rest of the system. FlexCast implements genuine atomic multicast using a
complete DAG overlay. We experimentally evaluate FlexCast in a geographically
distributed environment using gTPC-C, a variation of the TPC-C benchmark that
takes into account geographical distribution and locality. We show that, by
exploiting genuineness and workload locality, FlexCast outperforms
well-established atomic multicast protocols without the inherent communication
overhead of state-of-the-art non-genuine multicast protocols
A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)
Many distributed systems work on a common shared state; in such systems,
distributed agreement is necessary for consistency. With an increasing number
of servers, these systems become more susceptible to single-server failures,
increasing the relevance of fault-tolerance. Atomic broadcast enables
fault-tolerant distributed agreement, yet it is costly to solve. Most practical
algorithms entail linear work per broadcast message. AllConcur -- a leaderless
approach -- reduces the work, by connecting the servers via a sparse resilient
overlay network; yet, this resiliency entails redundancy, limiting the
reduction of work. In this paper, we propose AllConcur+, an atomic broadcast
algorithm that lifts this limitation: During intervals with no failures, it
achieves minimal work by using a redundancy-free overlay network. When failures
do occur, it automatically recovers by switching to a resilient overlay
network. In our performance evaluation of non-failure scenarios, AllConcur+
achieves comparable throughput to AllGather -- a non-fault-tolerant distributed
agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in
terms of throughput and latency. Furthermore, our evaluation of failure
scenarios shows that AllConcur+'s expected performance is robust with regard to
occasional failures. Thus, for realistic use cases, leveraging redundancy-free
distributed agreement during intervals with no failures improves performance
significantly.Comment: Overview: 24 pages, 6 sections, 3 appendices, 8 figures, 3 tables.
Modifications from previous version: extended the evaluation of AllConcur+
with a simulation of a multiple datacenters deploymen
Epidemic broadcast trees
There is an inherent trade-off between epidemic and deterministic tree-based broadcast primitives. Tree-based approaches have a small message complexity in steady-state but are very fragile in the presence of faults. Gossip, or epidemic, protocols have a higher message complexity but also offer much higher resilience. This paper proposes an integrated broadcast scheme that combines both approaches. We use a low cost scheme to build and maintain broadcast trees embedded on a gossip-based overlay. The protocol sends the message payload preferably via tree branches but uses the remaining links of the gossip overlay for fast recovery and expedite tree healing. Experimental evaluation presented in the paper shows that our new strategy has a low overhead and that is able to support large number of faults while maintaining a high reliability.This work was partially supported by project P-SON: Probabilistically Structured Overlay Networks (POSC/EIA/60941/2004)
- …