17 research outputs found

    Interactive Consistency in practical, mostly-asynchronous systems

    Interactive consistency is the problem in which n nodes, where up to t may be Byzantine, each with its own private value, run an algorithm that allows all non-faulty nodes to infer the values of each other node. This problem is relevant to critical applications that rely on the combination of the opinions of multiple peers to provide a service. Examples include monitoring a content source to prevent equivocation or to track variability in the content provided, and resolving divergent state amongst the nodes of a distributed system. Previous works assume a fully synchronous system, where one can make strong assumptions such as negligible message delivery delays and/or detection of absent messages. However, practical, real-world systems are mostly asynchronous, i.e., they exhibit only some periods of synchrony during which message delivery is timely, thus requiring a different approach. In this paper, we present a thorough study on practical interactive consistency. We leverage the vast prior work on broadcast and Byzantine consensus algorithms to design, implement and evaluate a set of algorithms, with varying timing assumptions and message complexity, that can be used to achieve interactive consistency in real-world distributed systems. We provide a complete, open-source implementation of each proposed interactive consistency algorithm by building a multi-layered stack of protocols that include several broadcast protocols, as well as a binary and a multi-valued consensus protocol. Most of these protocols have never been implemented and evaluated in a real system before. We analyze the performance of our suite of algorithms experimentally by engaging in both single instance and multiple parallel instances of each alternative. Comment: 13 pages, 10 figures

    Asynchronous Byzantine Consensus with 2f+1 Processes (extended version)

    Reviewed by Paulo J. Sousa. Byzantine consensus in asynchronous message-passing systems has been shown to require at least 3f+1 processes to be solvable in several system models (e.g., with failure detectors, partial synchrony or randomization). Recently a couple of solutions to implement Byzantine fault-tolerant state-machine replication using only 2f+1 replicas have appeared. This reduction from 3f+1 to 2f+1 is possible with a hybrid system model, i.e., by extending the system model with trusted/trustworthy components that constrain the power of faulty processes to have certain behaviors. Despite these important results, the problem of solving Byzantine consensus with only 2f+1 processes is still far from being well understood. In this paper we present a methodology to transform crash consensus algorithms into Byzantine consensus algorithms with different characteristics, with the assistance of a reliable broadcast primitive that requires trusted/trustworthy components to be implemented. We exemplify the methodology with two algorithms, one that uses failure detectors and one that is randomized. We also define a new flavor of consensus and use it to solve atomic broadcast with only 2f+1 processes, showing the practical interest of the consensus algorithms presented.
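To make the replica counts in this abstract concrete, here is a minimal Python sketch of the arithmetic only. The function names and the `trusted_component` flag are hypothetical, chosen for illustration; they do not come from the paper, and the sketch says nothing about how the algorithms themselves work.

```python
def min_processes(f: int, trusted_component: bool = False) -> int:
    """Smallest number of processes needed to tolerate f Byzantine faults.

    Assumption for illustration: 3f+1 in the classical models the abstract
    lists, 2f+1 in the hybrid model with trusted/trustworthy components.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if trusted_component else 3 * f + 1


def max_faults(n: int, trusted_component: bool = False) -> int:
    """Largest f that n processes can tolerate under the same bounds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 2 if trusted_component else (n - 1) // 3


if __name__ == "__main__":
    for f in range(1, 4):
        print(f, min_processes(f), min_processes(f, trusted_component=True))
```

For a fixed number of faults, the hybrid model saves f processes; for a fixed number of processes, it tolerates more faults.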

    Every Bit Counts in Consensus

    Consensus enables n processes to agree on a common valid L-bit value, despite t < n/3 processes being faulty and acting arbitrarily. A long line of work has been dedicated to improving the worst-case communication complexity of consensus in partial synchrony. This has recently culminated in the worst-case word complexity of O(n^2). However, the worst-case bit complexity of the best solution is still O(n^2 L + n^2 \kappa) (where \kappa is the security parameter), far from the \Omega(n L + n^2) lower bound. The gap is significant given the practical use of consensus primitives, where values typically consist of batches of large size (L > n). This paper shows how to narrow the aforementioned gap while achieving optimal linear latency. Namely, we present a new algorithm, DARE (Disperse, Agree, REtrieve), that improves upon the O(n^2 L) term via a novel dispersal primitive. DARE achieves O(n^{1.5} L + n^{2.5} \kappa) bit complexity, an effective \sqrt{n}-factor improvement over the state-of-the-art (when L > n \kappa). Moreover, we show that employing heavier cryptographic primitives, namely STARK proofs, allows us to devise DARE-Stark, a version of DARE which achieves the near-optimal bit complexity of O(n L + n^2 poly(\kappa)). Both DARE and DARE-Stark achieve optimal O(n) latency.
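The bounds quoted in this abstract can be compared numerically. The sketch below evaluates the dominant terms with all hidden constants set to 1, which is an assumption made purely for illustration, so the outputs indicate growth rates and not real bit counts. The function names are hypothetical. DARE-Stark is left out because the abstract does not specify the poly(kappa) factor.

```python
def prior_bits(n: int, L: int, kappa: int) -> float:
    """Dominant terms of the prior best bound, O(n^2 L + n^2 kappa)."""
    return n**2 * L + n**2 * kappa


def dare_bits(n: int, L: int, kappa: int) -> float:
    """Dominant terms of the DARE bound, O(n^1.5 L + n^2.5 kappa)."""
    return n**1.5 * L + n**2.5 * kappa


def lower_bound_bits(n: int, L: int) -> float:
    """Dominant terms of the Omega(n L + n^2) lower bound."""
    return n * L + n**2


if __name__ == "__main__":
    # Example parameters (assumed): large batches, so that L > n * kappa.
    n, L, kappa = 100, 1_000_000, 256
    ratio = prior_bits(n, L, kappa) / dare_bits(n, L, kappa)
    print(f"prior / DARE ratio: {ratio:.2f} (sqrt(n) = {n ** 0.5:.0f})")
    print(f"DARE / lower bound: {dare_bits(n, L, kappa) / lower_bound_bits(n, L):.2f}")
```

With these example parameters the ratio comes out a little below sqrt(n), because the n^{2.5} kappa term grows relative to the old n^2 kappa term. That matches the abstract's condition that the improvement applies when L > n kappa.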

    Serviço de consenso genérico tolerante a intrusões para resolver problemas de acordo

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-graduação em Engenharia de Automação e Sistemas, Florianópolis, 2010. This dissertation describes an extension to the Consensus Service proposed by Guerraoui and Schiper. The goal is to provide a standardized way to implement Byzantine fault-tolerant agreement protocols using an intrusion-tolerant service built on virtualization technologies. To this end, we implemented a Generic Consensus Service (Serviço Genérico de Consenso, SGC). SGC cleanly separates the specifics of different agreement problems from consensus, using a client-server interaction, which allows full independence between the consensus protocols used and the problem-specific specializations. We show how SGC works, its properties, and how to use it to solve some agreement problems.

    Fault-tolerant control policies for multi-robot systems

    Throughout the past decade, we have witnessed an active interest in distributed motion coordination algorithms for networked mobile autonomous robots. Often, in multi-robot systems, each robot executing a coordination task is a low-cost, disposable autonomous agent with ad hoc sensing or communication capability and limited mobility. Coordination tasks that a group of multiple mobile robots might perform include formation control, rendezvous, distributed estimation, deployment, flocking, etc. Also, there are challenging tasks that are more suitable for a group of mobile robots than an individual robot, such as surveillance, exploration, or hazardous environmental monitoring. The field has been collectively investigated by many researchers in robotics, control, artificial intelligence, and distributed computing. However, relatively little work has been done on developing algorithms to provide resilience to failures that can occur. The problem is extremely difficult to handle in that any partial failure of a robot is not readily detectable. Some failures in robot resources can have an adverse effect not only on the performance of the robot itself, but also on other robots and on the collective task performance. This study presents the development of fault-tolerant distributed control policies for multi-robot systems. We consider two problems: rendezvous and coverage. For the former, the goal is to bring all robots to a common location, while for the latter the goal is to deploy robots to achieve optimal coverage of an environment. We consider the case in which each robot is an autonomous decision maker that is anonymous (i.e., robots are indistinguishable to one another), memoryless (i.e., each robot makes decisions based upon only its current information), and dimensionless (i.e., collision checking is not considered). Each robot has a limited sensing range and can directly estimate the state of only those robots within that sensing range, which induces a network topology for the multi-robot system. We assume that it is not possible for the fault-free robots to identify the faulty robots (e.g., due to the anonymous property of the robots). For each problem, we provide an efficient computational framework and analysis of algorithms, all of which converge in the face of faulty robots under a few assumptions on the network topology and sensing abilities. A suite of experiments and simulations confirms our theoretical analysis and demonstrates that our proposed algorithms are useful in fault-prone multi-robot systems.

    Round-Based Consensus Algorithms, Predicate Implementations and Quantitative Analysis

    Fault-tolerant computing is the art and science of building computer systems that continue to operate normally in the presence of faults. The fault tolerance field covers a wide spectrum of research areas ranging from computer hardware to computer software. A common approach to obtain a fault-tolerant system is using software replication. However, keeping the state of the replicas consistent is not an easy task, even though the understanding of the problems related to replication has significantly evolved over the past thirty years. Consensus is a fundamental building block to provide consistency in any fault-tolerant distributed system. A large number of algorithms have been proposed to solve the consensus problem in different systems. The efficiency of several consensus algorithms has been studied theoretically and practically. A common metric to evaluate the performance of consensus algorithms is the number of communication steps or the number of rounds (in round-based algorithms) for deciding. A large number of improvements to consensus algorithms have been proposed to reduce this number under different assumptions, e.g., nice runs. However, the efficiency expressed in terms of number of rounds does not predict the time it takes to decide (including the time needed by the system to stabilize or not). Following this idea, the thesis investigates the round model abstraction to represent consensus algorithms, with benign and Byzantine faults, in a concise and modular way. The goal of the thesis is first to decouple the consensus algorithm from irrelevant details of implementations, such as synchronization, then to study different possible implementations for a given consensus algorithm, and finally to propose a more general quantitative analysis for different consensus algorithms. The first part of the thesis considers the round-based consensus algorithms with benign faults. In this context, the round model allowed us to separate the consensus algorithms from the round implementation, to propose different round implementations, to improve existing round implementations by making them swift, and to provide quantitative analysis of different algorithms. The second part of the thesis considers the round-based consensus algorithms with Byzantine faults. In this context, there is a gap between theoretical consensus algorithms and practical Byzantine fault-tolerant protocols. The round model allowed us to fill the gap by better understanding existing protocols, and enabled us to express existing protocols in a simple and modular way, to obtain simplified proofs, to discover new protocols such as decentralized (non leader-based) algorithms, and finally to perform precise timing analysis to compare different algorithms. The last part of the thesis shows, as an example, how a round-based consensus algorithm that tolerates benign faults can be extended to wireless mobile ad hoc networks using an adequate communication layer. We have validated our implementation by running simulations in single-hop and multi-hop wireless networks.

    Enhancing intrusion resilience in publicly accessible distributed systems

    PhD Thesis. The internet is increasingly used as a means of communication by many businesses. Online shopping has become an important commercial activity and many governmental bodies offer services online. Malicious intrusion into these systems can have major negative consequences, both for the providers and users of these services. The need to protect against malicious intrusion, coupled with the difficulty of identifying and removing all possible vulnerabilities in a distributed system, has led to the use of systems that can tolerate intrusions with no loss of integrity. These systems require that services be replicated as deterministic state machines, a relatively hard task in practice, and do not ensure that confidentiality is maintained when one or more replicas are successfully intruded into. This thesis presents FORTRESS, a novel intrusion-resilient system that makes use of proactive obfuscation techniques and cheap off-the-shelf hardware to enhance intrusion resilience. FORTRESS uses proxies to prevent clients accessing servers directly, and regular replacement of proxies and servers with differently obfuscated versions. This maintains both confidentiality and integrity as long as an attacker does not compromise the system as a whole. The expected lifetime until system compromise of the FORTRESS system is compared to those of state machine replicated and primary backup systems when confronted with an attacker capable of launching distributed attacks against known vulnerabilities. Thus, FORTRESS is demonstrated to be a viable alternative to building intrusion-tolerant systems using deterministic state machine replication. The performance overhead of the FORTRESS system is also evaluated, using both a general state transfer framework for distributed systems, and a lightweight framework for large scale web applications. This shows the FORTRESS system has a sufficiently small performance overhead to be of practical use.