6,858 research outputs found

    A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)

    Full text link
    Many distributed systems work on a common shared state; in such systems, distributed agreement is necessary for consistency. With an increasing number of servers, these systems become more susceptible to single-server failures, increasing the relevance of fault-tolerance. Atomic broadcast enables fault-tolerant distributed agreement, yet it is costly to solve. Most practical algorithms entail linear work per broadcast message. AllConcur -- a leaderless approach -- reduces the work, by connecting the servers via a sparse resilient overlay network; yet, this resiliency entails redundancy, limiting the reduction of work. In this paper, we propose AllConcur+, an atomic broadcast algorithm that lifts this limitation: During intervals with no failures, it achieves minimal work by using a redundancy-free overlay network. When failures do occur, it automatically recovers by switching to a resilient overlay network. In our performance evaluation of non-failure scenarios, AllConcur+ achieves comparable throughput to AllGather -- a non-fault-tolerant distributed agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in terms of throughput and latency. Furthermore, our evaluation of failure scenarios shows that AllConcur+'s expected performance is robust with regard to occasional failures. Thus, for realistic use cases, leveraging redundancy-free distributed agreement during intervals with no failures improves performance significantly.Comment: Overview: 24 pages, 6 sections, 3 appendices, 8 figures, 3 tables. Modifications from previous version: extended the evaluation of AllConcur+ with a simulation of a multiple datacenters deploymen

    Effective Edge-Fault-Tolerant Single-Source Spanners via Best (or Good) Swap Edges

    Full text link
    Computing \emph{all best swap edges} (ABSE) of a spanning tree TT of a given nn-vertex and mm-edge undirected and weighted graph GG means to select, for each edge ee of TT, a corresponding non-tree edge ff, in such a way that the tree obtained by replacing ee with ff enjoys some optimality criterion (which is naturally defined according to some objective function originally addressed by TT). Solving efficiently an ABSE problem is by now a classic algorithmic issue, since it conveys a very successful way of coping with a (transient) \emph{edge failure} in tree-based communication networks: just replace the failing edge with its respective swap edge, so as that the connectivity is promptly reestablished by minimizing the rerouting and set-up costs. In this paper, we solve the ABSE problem for the case in which TT is a \emph{single-source shortest-path tree} of GG, and our two selected swap criteria aim to minimize either the \emph{maximum} or the \emph{average stretch} in the swap tree of all the paths emanating from the source. Having these criteria in mind, the obtained structures can then be reviewed as \emph{edge-fault-tolerant single-source spanners}. For them, we propose two efficient algorithms running in O(mn+n2logn)O(m n +n^2 \log n) and O(mnlogα(m,n))O(m n \log \alpha(m,n)) time, respectively, and we show that the guaranteed (either maximum or average, respectively) stretch factor is equal to 3, and this is tight. Moreover, for the maximum stretch, we also propose an almost linear O(mlogα(m,n))O(m \log \alpha(m,n)) time algorithm computing a set of \emph{good} swap edges, each of which will guarantee a relative approximation factor on the maximum stretch of 3/23/2 (tight) as opposed to that provided by the corresponding BSE. Surprisingly, no previous results were known for these two very natural swap problems.Comment: 15 pages, 4 figures, SIROCCO 201

    Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems

    Full text link
    The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs.National Science Foundation (CCR-9308344, CCR-9596282

    Optimal Gossip with Direct Addressing

    Full text link
    Gossip algorithms spread information by having nodes repeatedly forward information to a few random contacts. By their very nature, gossip algorithms tend to be distributed and fault tolerant. If done right, they can also be fast and message-efficient. A common model for gossip communication is the random phone call model, in which in each synchronous round each node can PUSH or PULL information to or from a random other node. For example, Karp et al. [FOCS 2000] gave algorithms in this model that spread a message to all nodes in Θ(logn)\Theta(\log n) rounds while sending only O(loglogn)O(\log \log n) messages per node on average. Recently, Avin and Els\"asser [DISC 2013], studied the random phone call model with the natural and commonly used assumption of direct addressing. Direct addressing allows nodes to directly contact nodes whose ID (e.g., IP address) was learned before. They show that in this setting, one can "break the logn\log n barrier" and achieve a gossip algorithm running in O(logn)O(\sqrt{\log n}) rounds, albeit while using O(logn)O(\sqrt{\log n}) messages per node. We study the same model and give a simple gossip algorithm which spreads a message in only O(loglogn)O(\log \log n) rounds. We also prove a matching Ω(loglogn)\Omega(\log \log n) lower bound which shows that this running time is best possible. In particular we show that any gossip algorithm takes with high probability at least 0.99loglogn0.99 \log \log n rounds to terminate. Lastly, our algorithm can be tweaked to send only O(1)O(1) messages per node on average with only O(logn)O(\log n) bits per message. Our algorithm therefore simultaneously achieves the optimal round-, message-, and bit-complexity for this setting. As all prior gossip algorithms, our algorithm is also robust against failures. In particular, if in the beginning an oblivious adversary fails any FF nodes our algorithm still, with high probability, informs all but o(F)o(F) surviving nodes
    corecore