7 research outputs found

    Lifeguard: Local Health Awareness for More Accurate Failure Detection

    SWIM is a peer-to-peer group membership protocol with attractive scaling and robustness properties. However, slow message processing can cause SWIM to mark healthy members as failed (so-called false positive failure detection), despite the inclusion of a mechanism intended to avoid this. We identify the properties of SWIM that lead to the problem and propose Lifeguard, a set of extensions to SWIM that account for the possibility that the local failure detector module is itself at fault, via the concept of local health. We evaluate this approach in a precisely controlled environment and validate it in a real-world scenario, showing that it drastically reduces the rate of false positives. Compared with the baseline levels of SWIM, both the false positive rate and the detection time for true failures can be reduced simultaneously.
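The abstract does not spell out how local health is computed. As a rough illustration of the idea, the sketch below assumes a saturating integer score that rises on evidence that the local node itself may be slow (a missed probe acknowledgement, having to refute a suspicion about itself) and falls on successful probes, with the probe timeout scaled by score + 1. The class name `LocalHealth`, these event rules, and the saturation limit of 8 are assumptions for illustration, not Lifeguard's specification.

```python
from dataclasses import dataclass


@dataclass
class LocalHealth:
    """Hypothetical saturating 'local health' score for one SWIM member.

    0 means the local failure detector looks healthy; higher values mean
    recent evidence suggests this member itself is slow (assumed rules).
    """
    max_score: int = 8   # assumed saturation limit
    score: int = 0

    def _adjust(self, delta: int) -> None:
        # Clamp the score into [0, max_score].
        self.score = min(self.max_score, max(0, self.score + delta))

    def on_probe_success(self) -> None:
        # A timely ack is evidence that local processing is keeping up.
        self._adjust(-1)

    def on_probe_timeout(self) -> None:
        # A missed ack may mean the peer failed, or that we are the slow one.
        self._adjust(+1)

    def on_refuted_self_suspicion(self) -> None:
        # Other members suspected us, so we were probably slow to respond.
        self._adjust(+1)

    def probe_timeout(self, base_timeout_ms: float) -> float:
        # Stretch the timeout when local health is poor, so a struggling
        # node waits longer before declaring a peer suspect.
        return base_timeout_ms * (self.score + 1)


if __name__ == "__main__":
    health = LocalHealth()
    health.on_probe_timeout()
    health.on_probe_timeout()
    print(health.score, health.probe_timeout(500))
```

The design choice being illustrated is that a node under local stress becomes more conservative about blaming others, which is one plausible way to trade a little detection latency for fewer false positives.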

    On the Complexity of Asynchronous Gossip

    In this paper, we study the complexity of gossip in an asynchronous, message-passing, fault-prone distributed system. In short, we show that an adaptive adversary can significantly hamper the spreading of a rumor, while an oblivious adversary cannot. For the oblivious case, we present three randomized algorithms for achieving gossip, each offering a different trade-off between time and message complexity. We then show how to use these gossip algorithms to develop message-efficient asynchronous (randomized) consensus protocols.
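To make the time/message trade-off concrete, here is a minimal simulation of plain push gossip with uniformly random targets. It is not one of the paper's three algorithms, and it runs in synchronous rounds rather than under true asynchrony; the function name `push_gossip` and the round model are assumptions. It shows a simple baseline: the informed set can at most double per round, so the number of rounds is at least log2(n), while informed processes keep sending messages even after most peers already know the rumor.

```python
import math
import random


def push_gossip(n: int, seed: int = 0) -> tuple[int, int]:
    """Simulate plain push gossip of one rumor among n processes.

    Each round, every informed process sends the rumor to one other process
    chosen uniformly at random, without regard to who is already informed.
    Returns (rounds, messages) needed until all n processes know the rumor.
    """
    if n < 1:
        raise ValueError("need at least one process")
    rng = random.Random(seed)   # seeded so runs are reproducible
    informed = {0}              # process 0 starts with the rumor
    rounds = messages = 0
    while len(informed) < n:
        rounds += 1
        newly_informed = set()
        for sender in informed:
            # Pick a peer other than the sender, uniformly at random.
            target = rng.randrange(n - 1)
            if target >= sender:
                target += 1
            newly_informed.add(target)
            messages += 1
        informed |= newly_informed
    return rounds, messages


if __name__ == "__main__":
    for n in (16, 256, 4096):
        r, m = push_gossip(n, seed=1)
        print(f"n={n:5d} rounds={r:3d} messages={m:7d} log2(n)={math.log2(n):.1f}")
```

Comparing the printed message counts with n gives a feel for the redundant traffic that message-efficient gossip algorithms try to avoid.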

    Efficient epidemic-style protocols for reliable and scalable multicast

    Epidemic-style (gossip-based) techniques have recently emerged as a scalable class of protocols for peer-to-peer reliable multicast dissemination in large process groups. These protocols provide probabilistic guarantees on reliability and scalability. However, popular implementations of epidemic-style dissemination are reputed to suffer from two major drawbacks: (a) (Network Overhead) when deployed on a WAN-wide or VPN-wide scale, they generate a large number of packets that cross the boundaries of multiple network domains (e.g., LANs, subnets, ASs), overloading core network elements such as bridges, routers, and the associated links; (b) (Lack of Adaptivity) they impose the same load on process group members and the network even when failure rates (viz., packet losses, process failures) are reduced. In this paper, we report on the (first) comprehensive set of solutions to these problems. The solution consists of two protocols: (1) a Hierarchical Gossiping protocol, and (2) an Adaptive multicast Dissemination Framework that allows any gossiping primitive to be used within it. These protocols work within a virtual peer-to-peer hierarchy called the Leaf Box Hierarchy. Processes can be allocated to the leaf boxes of this structure in a topologically aware manner, so that (1) and (2) produce low traffic across domain boundaries in the network. In the interests of space, this paper focuses on a detailed discussion and evaluation (through simulations) of only the Hierarchical Gossiping protocol. We present an overview of the Adaptive Dissemination protocol and its properties.
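The abstract gives no pseudocode for Hierarchical Gossiping. As a loose illustration of why topology-aware placement reduces traffic across domain boundaries, the sketch below compares flat push gossip against a simple locality bias: processes are grouped into equal-sized "boxes" standing in for network domains, and a sender picks a peer in its own box with probability `p_local`. This two-level bias is an assumption chosen for brevity; it is not the Leaf Box Hierarchy protocol, and the names `gossip_cross_box`, `boxes`, `per_box`, and `p_local` are hypothetical.

```python
import random


def gossip_cross_box(boxes: int, per_box: int, p_local: float,
                     seed: int = 0) -> tuple[int, int]:
    """Push gossip over processes grouped into equal-sized boxes.

    Hypothetical model: a box stands in for a network domain (LAN, subnet).
    Each round, every informed process sends the rumor to one peer: with
    probability p_local a peer in its own box, otherwise a peer chosen
    uniformly from the whole group. p_local = 0.0 is flat gossip.
    Returns (total_messages, cross_box_messages).
    """
    n = boxes * per_box
    if n < 1:
        raise ValueError("need at least one process")
    if boxes > 1 and p_local >= 1.0:
        raise ValueError("p_local must be < 1 or the rumor never leaves box 0")
    rng = random.Random(seed)
    box_of = [i // per_box for i in range(n)]  # process i lives in box i // per_box
    informed = {0}
    total = cross = 0
    while len(informed) < n:
        newly_informed = set()
        for sender in informed:
            if rng.random() < p_local and per_box > 1:
                # Local choice: a different member of the sender's own box.
                base = box_of[sender] * per_box
                target = base + rng.randrange(per_box - 1)
                if target >= sender:
                    target += 1
            else:
                # Global choice: any process other than the sender.
                target = rng.randrange(n - 1)
                if target >= sender:
                    target += 1
            total += 1
            if box_of[target] != box_of[sender]:
                cross += 1
            newly_informed.add(target)
        informed |= newly_informed
    return total, cross


if __name__ == "__main__":
    for p in (0.0, 0.9):
        total, cross = gossip_cross_box(boxes=8, per_box=16, p_local=p, seed=3)
        print(f"p_local={p}: {total} messages, {cross} crossed a box boundary")
```

Running both settings on the same group should show far fewer cross-box messages for the biased variant, at the cost of slower spread between boxes. `p_local` must stay below 1 when there is more than one box; otherwise the rumor can never leave its first box.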