137 research outputs found

    Total order broadcast for fault tolerant exascale systems

    Full text link
    In the process of designing a new fault tolerant run-time for future exascale systems, we discovered that a total order broadcast would be necessary. That is, nodes of a supercomputer should be able to broadcast messages to other nodes even in the face of failures. All messages should be seen in the same order at all nodes. While this is a well studied problem in distributed systems, few researchers have looked at how to perform total order broadcasts at large scales for data availability. Our experience implementing a published total order broadcast algorithm showed poor scalability at tens of nodes. In this paper we present a novel algorithm for total order broadcast which scales logarithmically in the number of processes and is not delayed by most process failures. While we are motivated by the needs of our run-time we believe this primitive is of general applicability. Total order broadcasts are used often in datacenter environments and as HPC developers begins to address fault tolerance at the application level we believe they will need similar primitives

    An analytical framework for the performance evaluation of proximity-aware structured overlays

    Get PDF
    In this paper, we present an analytical study of proximity-aware structured peer-to-peer networks under churn. We use a master-equation-based approach, which is used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. In earlier work we have demonstrated that this methodology is in fact also well suited to describing structured overlay networks under churn, by showing how we can accurately predict the average number of hops taken by a lookup, for any value of churn, for the Chord system. In this paper, we extend the analysis so as to also be able to predict lookup latency, given an average latency for the links in the network. Our results show that there exists a region in the parameter space of the model, depending on churn, the number of nodes, the maintenance rates and the delays in the network, when the network cannot function as a small world graph anymore, due to the farthest connections of a node always being wrong or dead. We also demonstrate how it is possible to analyse proximity neighbour selection or proximity route selection within this formalism

    Broadcasting in Noisy Radio Networks

    Full text link
    The widely-studied radio network model [Chlamtac and Kutten, 1985] is a graph-based description that captures the inherent impact of collisions in wireless communication. In this model, the strong assumption is made that node vv receives a message from a neighbor if and only if exactly one of its neighbors broadcasts. We relax this assumption by introducing a new noisy radio network model in which random faults occur at senders or receivers. Specifically, for a constant noise parameter p∈[0,1)p \in [0,1), either every sender has probability pp of transmitting noise or every receiver of a single transmission in its neighborhood has probability pp of receiving noise. We first study single-message broadcast algorithms in noisy radio networks and show that the Decay algorithm [Bar-Yehuda et al., 1992] remains robust in the noisy model while the diameter-linear algorithm of Gasieniec et al., 2007 does not. We give a modified version of the algorithm of Gasieniec et al., 2007 that is robust to sender and receiver faults, and extend both this modified algorithm and the Decay algorithm to robust multi-message broadcast algorithms. We next investigate the extent to which (network) coding improves throughput in noisy radio networks. We address the previously perplexing result of Alon et al. 2014 that worst case coding throughput is no better than worst case routing throughput up to constants: we show that the worst case throughput performance of coding is, in fact, superior to that of routing -- by a Θ(log⁥(n))\Theta(\log(n)) gap -- provided receiver faults are introduced. However, we show that any coding or routing scheme for the noiseless setting can be transformed to be robust to sender faults with only a constant throughput overhead. These transformations imply that the results of Alon et al., 2014 carry over to noisy radio networks with sender faults.Comment: Principles of Distributed Computing 201

    Necessary and sufficient conditions for 1-adaptivity

    Full text link

    Developing Libraries Using Software Transactional Memory

    Get PDF
    Software transactional memory is a promising programming model that adapts many concepts borrowed from the databases world to control concurrent accesses to main memory (RAM) locations. This paper discusses how to support apparently irreversible operations, such as memory allocation and deallocation, within software libraries that will be used in (software memory) transactional contexts, and propose a generic and elegant approach based on a handler system, which provide the means to create and execute compensation actions at key moments during the life-time of a transaction.This work was partially supported by Sun Micro-Systems and Sun Microsystems Portugal under the “Sun Worldwide Marketing Loaner Agreement #11497”, by the CITI–Centro de InformĂĄtica e Tecnologias da Informação and by the FCT/MCTES–Fundação para a CiĂȘncia e Tecnologia in the context of the Byzatium research project PTDC/EIA/74325/2006 and research grant SFRH/BD/41765/2007

    On the Complexity of Local Graph Transformations

    Get PDF
    We consider the problem of transforming a given graph G_s into a desired graph G_t by applying a minimum number of primitives from a particular set of local graph transformation primitives. These primitives are local in the sense that each node can apply them based on local knowledge and by affecting only its 1-neighborhood. Although the specific set of primitives we consider makes it possible to transform any (weakly) connected graph into any other (weakly) connected graph consisting of the same nodes, they cannot disconnect the graph or introduce new nodes into the graph, making them ideal in the context of supervised overlay network transformations. We prove that computing a minimum sequence of primitive applications (even centralized) for arbitrary G_s and G_t is NP-hard, which we conjecture to hold for any set of local graph transformation primitives satisfying the aforementioned properties. On the other hand, we show that this problem admits a polynomial time algorithm with a constant approximation ratio

    Automatic Reconfiguration for Large-Scale Reliable Storage Systems

    Get PDF
    Byzantine-fault-tolerant replication enhances the availability and reliability of Internet services that store critical state and preserve it despite attacks or software errors. However, existing Byzantine-fault-tolerant storage systems either assume a static set of replicas, or have limitations in how they handle reconfigurations (e.g., in terms of the scalability of the solutions or the consistency levels they provide). This can be problematic in long-lived, large-scale systems where system membership is likely to change during the system lifetime. In this paper, we present a complete solution for dynamically changing system membership in a large-scale Byzantine-fault-tolerant system. We present a service that tracks system membership and periodically notifies other system nodes of membership changes. The membership service runs mostly automatically, to avoid human configuration errors; is itself Byzantine-fault-tolerant and reconfigurable; and provides applications with a sequence of consistent views of the system membership. We demonstrate the utility of this membership service by using it in a novel distributed hash table called dBQS that provides atomic semantics even across changes in replica sets. dBQS is interesting in its own right because its storage algorithms extend existing Byzantine quorum protocols to handle changes in the replica set, and because it differs from previous DHTs by providing Byzantine fault tolerance and offering strong semantics. We implemented the membership service and dBQS. Our results show that the approach works well, in practice: the membership service is able to manage a large system and the cost to change the system membership is low

    SoK: A Consensus Taxonomy in the Blockchain Era

    Get PDF
    Consensus (a.k.a. Byzantine agreement) is arguably one of the most fundamental problems in distributed systems, playing also an important role in the area of cryptographic protocols as the enabler of a (secure) broadcast functionality. While the problem has a long and rich history and has been analyzed from many different perspectives, recently, with the advent of blockchain protocols like Bitcoin, it has experienced renewed interest from a much wider community of researchers and has seen its application expand to various novel settings. One of the main issues in consensus research is the many different variants of the problem that exist as well as the various ways the problem behaves when different setup, computational assumptions and network models are considered. In this work we perform a systematization of knowledge in the landscape of consensus research starting with the original formulation in the early 1980s up to the present blockchain-based new class of consensus protocols. Our work is a roadmap for studying the consensus problem under its many guises, classifying the way it operates in many settings and highlighting the exciting new applications that have emerged in the blockchain era

    On the Tree Conjecture for the Network Creation Game

    Get PDF
    Selfish Network Creation focuses on modeling real world networks from a game-theoretic point of view. One of the classic models by Fabrikant et al.[PODC\u2703] is the network creation game, where agents correspond to nodes in a network which buy incident edges for the price of alpha per edge to minimize their total distance to all other nodes. The model is well-studied but still has intriguing open problems. The most famous conjectures state that the price of anarchy is constant for all alpha and that for alpha >= n all equilibrium networks are trees. We introduce a novel technique for analyzing stable networks for high edge-price alpha and employ it to improve on the best known bounds for both conjectures. In particular we show that for alpha > 4n-13 all equilibrium networks must be trees, which implies a constant price of anarchy for this range of alpha. Moreover, we also improve the constant upper bound on the price of anarchy for equilibrium trees
    • 

    corecore