116 research outputs found

    Byzantine fault-tolerant agreement protocols for wireless Ad hoc networks

    Get PDF
    Tese de doutoramento, Informática (Ciências da Computação), Universidade de Lisboa, Faculdade de Ciências, 2010.The thesis investigates the problem of fault- and intrusion-tolerant consensus in resource-constrained wireless ad hoc networks. This is a fundamental problem in distributed computing because it abstracts the need to coordinate activities among various nodes. It has been shown to be a building block for several other important distributed computing problems like state-machine replication and atomic broadcast. The thesis begins by making a thorough performance assessment of existing intrusion-tolerant consensus protocols, which shows that the performance bottlenecks of current solutions are in part related to their system modeling assumptions. Based on these results, the communication failure model is identified as a model that simultaneously captures the reality of wireless ad hoc networks and allows the design of efficient protocols. Unfortunately, the model is subject to an impossibility result stating that there is no deterministic algorithm that allows n nodes to reach agreement if more than n2 omission transmission failures can occur in a communication step. This result is valid even under strict timing assumptions (i.e., a synchronous system). The thesis applies randomization techniques in increasingly weaker variants of this model, until an efficient intrusion-tolerant consensus protocol is achieved. The first variant simplifies the problem by restricting the number of nodes that may be at the source of a transmission failure at each communication step. An algorithm is designed that tolerates f dynamic nodes at the source of faulty transmissions in a system with a total of n 3f + 1 nodes. The second variant imposes no restrictions on the pattern of transmission failures. The proposed algorithm effectively circumvents the Santoro- Widmayer impossibility result for the first time. It allows k out of n nodes to decide despite dn 2 e(nk)+k2 omission failures per communication step. This algorithm also has the interesting property of guaranteeing safety during arbitrary periods of unrestricted message loss. The final variant shares the same properties of the previous one, but relaxes the model in the sense that the system is asynchronous and that a static subset of nodes may be malicious. The obtained algorithm, called Turquois, admits f < n 3 malicious nodes, and ensures progress in communication steps where dnf 2 e(n k f) + k 2. The algorithm is subject to a comparative performance evaluation against other intrusiontolerant protocols. The results show that, as the system scales, Turquois outperforms the other protocols by more than an order of magnitude.Esta tese investiga o problema do consenso tolerante a faltas acidentais e maliciosas em redes ad hoc sem fios. Trata-se de um problema fundamental que captura a essência da coordenação em actividades envolvendo vários nós de um sistema, sendo um bloco construtor de outros importantes problemas dos sistemas distribuídos como a replicação de máquina de estados ou a difusão atómica. A tese começa por efectuar uma avaliação de desempenho a protocolos tolerantes a intrusões já existentes na literatura. Os resultados mostram que as limitações de desempenho das soluções existentes estão em parte relacionadas com o seu modelo de sistema. Baseado nestes resultados, é identificado o modelo de falhas de comunicação como um modelo que simultaneamente permite capturar o ambiente das redes ad hoc sem fios e projectar protocolos eficientes. Todavia, o modelo é restrito por um resultado de impossibilidade que afirma não existir algoritmo algum que permita a n nós chegaram a acordo num sistema que admita mais do que n2 transmissões omissas num dado passo de comunicação. Este resultado é válido mesmo sob fortes hipóteses temporais (i.e., em sistemas síncronos) A tese aplica técnicas de aleatoriedade em variantes progressivamente mais fracas do modelo até ser alcançado um protocolo eficiente e tolerante a intrusões. A primeira variante do modelo, de forma a simplificar o problema, restringe o número de nós que estão na origem de transmissões faltosas. É apresentado um algoritmo que tolera f nós dinâmicos na origem de transmissões faltosas em sistemas com um total de n 3f + 1 nós. A segunda variante do modelo não impõe quaisquer restrições no padrão de transmissões faltosas. É apresentado um algoritmo que contorna efectivamente o resultado de impossibilidade Santoro-Widmayer pela primeira vez e que permite a k de n nós efectuarem progresso nos passos de comunicação em que o número de transmissões omissas seja dn 2 e(n k) + k 2. O algoritmo possui ainda a interessante propriedade de tolerar períodos arbitrários em que o número de transmissões omissas seja superior a . A última variante do modelo partilha das mesmas características da variante anterior, mas com pressupostos mais fracos sobre o sistema. Em particular, assume-se que o sistema é assíncrono e que um subconjunto estático dos nós pode ser malicioso. O algoritmo apresentado, denominado Turquois, admite f < n 3 nós maliciosos e assegura progresso nos passos de comunicação em que dnf 2 e(n k f) + k 2. O algoritmo é sujeito a uma análise de desempenho comparativa com outros protocolos na literatura. Os resultados demonstram que, à medida que o número de nós no sistema aumenta, o desempenho do protocolo Turquois ultrapassa os restantes em mais do que uma ordem de magnitude.FC

    Abstractions for Solving Consensus and Related Problems with Byzantine Faults

    Get PDF
    We become increasingly dependent on online services; therefore, their availability and correct behavior become increasingly important. Software replication is a popular technique for ensuring that computer systems continue to provide a correct service even when some of their components fail. By replicating a service on multiple servers, clients are guaranteed that even if some replica fails, the service is still available. At the core of software replication is the consensus problem, where a set of processes has to agree on a single value. A large number of consensus algorithms for different system models have been proposed. The most general system models (for which consensus is solvable) do not make strong assumptions on the synchrony (allow period of asynchrony) and assume that a subset of processes can fail completely arbitrarily (Byzantine faults). However, solving consensus in the presence of arbitrary faults and asynchrony is hard and demands sophisticated algorithms. Most of the existing consensus algorithms that deal with arbitrary faults are monolithic and developed from scratch, or by modifying existing algorithms in a non-modular manner. As a consequence, these algorithms are rather complex and hard to understand. We impute this complexity to the lack of adequate abstractions. The motivation of this thesis is suggesting abstractions that simplify the understanding of existing consensus algorithms with arbitrary faults and allow modular design of novel algorithms. The thesis also aims to clarify relations between consensus and the total-order broadcast problem in the presence of arbitrary faults. In the context of the consensus problem with arbitrary process faults, the literature distinguishes (1) authenticated Byzantine faults, where messages can be signed by the sending process, and (2) Byzantine faults, where there is no mechanism for signatures. Consensus protocols that assume Byzantine faults (without authentication) are harder to develop and prove correct than algorithms that consider authenticated Byzantine faults, even when they are based on the same idea. We propose an abstraction called weak interactive consistency (or WIC), that allows us to design consensus algorithms that can be instantiated into algorithms for authenticated Byzantine faults (signed messages) and algorithms for Byzantine faults. In other words, WIC unifies Byzantine consensus algorithms with and without signatures. This is illustrated on two seminal Byzantine consensus algorithms: the Castro-Liskov PBFT algorithm (no signatures) and the Martin-Alvisi FaB Paxos algorithms (signatures). WIC allows a very concise expression of these two algorithms. Furthermore, WIC turns out to be fundamental abstraction for solving consensus in the transmission fault model. The transmission fault model captures faults without blaming a specific component for the fault, and it is well-adapted to dynamic and transient faults. Using WIC we designed a consensus algorithm that overcomes limitations of all existing solutions to consensus in this model, which assume the synchronous system model, or require strong conditions for termination that exclude the case where all messages of a process can be corrupted. Then we go one step further in unifying consensus algorithms by proposing a generic consensus algorithm that highlights, through well chosen parameters, the core mechanisms of a number of well-known consensus algorithms including Paxos, OneThirdRule, PBFT and FaB Paxos. Interestingly, the generic algorithm allows us to identify a new Byzantine consensus algorithm that requires n > 4b, in-between the requirement n > 5b of FaB Paxos and n > 3b of PBFT (b is the maximum number of Byzantine processes). Afterwards, we study the relation between consensus and total-order broadcast in the presence of Byzantine faults. Total-order broadcast is defined for a set of processes, where each process can broadcast messages, with the guarantee that all processes in this set see the same sequence of messages. Among the several definitions of Byzantine consensus that differ only by their validity property, we identify those equivalent to total-order broadcast. We also give the first deterministic total-order broadcast reduction to consensus with constant time complexity with respect to consensus. Finally, we consider state-machine replication (SMR) with Byzantine faults. State-machine replication is a general approach for replicating services that can be modeled as a state machine. The key idea of this approach is to guarantee that all replicas start in the same state and then apply requests from clients in the same order, thereby guaranteeing that the replica states do not diverge. Recent studies has shown that most BFT-SMR algorithms do not actually perform well under performance attacks by Byzantine processes. We propose a new BFT-SMR algorithm, called BFT-Mencius, that guarantees, assuming a partially synchronous system model, that the latency of updates of correct processes is eventually upper-bounded, even under performance attacks by Byzantine processes. BFT-Mencius is a modular, signature-free algorithm based on a new communication primitive called Abortable Timely Announced Broadcast (ATAB). We evaluate the performance of BFT-Mencius in cluster settings, and show that it performs comparably to the state-of-the-art algorithms such as PBFT and Spinning in fault-free configurations and outperforms these algorithms under performance attacks by Byzantine processes

    Autonomic Vehicular Networks: Safety, Privacy, Cybersecurity and Societal Issues

    Get PDF
    Safety, efficiency, privacy, and cybersecurity can be achieved jointly in self-organizing networks of communicating vehicles of various automated driving levels. The underlying approach, solutions and novel results are briefly exposed. We explain why we are faced with a crucial choice regarding motorized society and cyber surveillance

    The Heard-Of model: computing in distributed systems with benign faults

    Get PDF
    Problems in fault-tolerant distributed computing have been studied in a variety of models. These models are structured around two central ideas: (1) degree of synchrony and failure model are two independent parameters that determine a particular type of system, (2) the notion of faulty component is helpful and even necessary for the analysis of distributed computations when faults occur. In this work, we question these two basic principles of fault-tolerant distributed computing, and show that it is both possible and worthy to renounce them in the context of benign faults: we present a computational model based only on the notion of transmission faults. In this model, computations evolve in rounds, and messages missed in a round are lost. Only information transmission is represented: for each round r and each process p, our model provides the set of processes that p "hears of" at round r (heard-of set), namely the processes from which p receives some message at round r. The features of a specific system are thus captured as a whole, just by a predicate over the collection of heard-of sets. We show that our model handles benign failures, be they static or dynamic, permanent or transient, in a unified framework. We demonstrate how this approach leads to shorter and simpler proofs of important results (non-solvability, lower bounds). In particular, we prove that the Consensus problem cannot be generally solved without an implicit and permanent consensus on heard-of sets. We also examine Consensus algorithms in our model. In light of this specific agreement problem, we show how our approach allows us to devise new interesting solutions

    Autonomic Vehicular Networks: Safety, Privacy, Cybersecurity and Societal Issues

    Get PDF
    International audienceSafety, efficiency, privacy, and cybersecurity can be achieved jointly in self-organizing networks of communicating vehicles of various automated driving levels. The underlying approach, solutions and novel results are briefly exposed. We explain why we are faced with a crucial choice regarding motorized society and cyber surveillance

    A rapid review on community connected microgrids

    Get PDF
    As the population of urban areas continues to grow, and construction of multi-unit developments surges in response, building energy use demand has increased accordingly and solutions are needed to offset electricity used from the grid. Renewable energy systems in the form of microgrids, and grid-connected solar PV-storage are considered primary solutions for powering residential developments. The primary objectives for commissioning such systems include significant electricity cost reductions and carbon emissions abatement. Despite the proliferation of renewables, the uptake of solar and battery storage systems in communities and multi-residential buildings are less researched in the literature, and many uncertainties remain in terms of providing an optimal solution. This literature review uses the rapid review technique, an industry and societal issue-based version of the systematic literature review, to identify the case for microgrids for multi-residential buildings and communities. The study describes the rapid review methodology in detail and discusses and examines the configurations and methodologies for microgrids

    Round-Based Consensus Algorithms, Predicate Implementations and Quantitative Analysis

    Get PDF
    Fault-tolerant computing is the art and science of building computer systems that continue to operate normally in the presence of faults. The fault tolerance field covers a wide spectrum of research area ranging from computer hardware to computer software. A common approach to obtain a fault-tolerant system is using software replication. However, maintaining the state of the replicas consistent is not an easy task, even though the understanding of the problems related to replication has significantly evolved over the past thirty years. Consensus is a fundamental building block to provide consistency in any fault-tolerant distributed system. A large number of algorithms have been proposed to solve the consensus problem in different systems. The efficiency of several consensus algorithms has been studied theoretically and practically. A common metric to evaluate the performance of consensus algorithms is the number of communication steps or the number of rounds (in round-based algorithms) for deciding. A large amount of improvements to consensus algorithms have been proposed to reduce this number under different assumptions, e.g., nice runs. However, the efficiency expressed in terms of number of rounds does not predict the time it takes to decide (including the time needed by the system to stabilize or not). Following this idea, the thesis investigates the round model abstraction to represent consensus algorithms, with benign and Byzantine faults, in a concise and modular way. The goal of the thesis is first to decouple the consensus algorithm from irrelevant details of implementations, such as synchronization, then study different possible implementations for a given consensus algorithm, and finally propose a more general analytical analysis for different consensus algorithms. The first part of the thesis considers the round-based consensus algorithms with benign faults. In this context, the round model allowed us to separate the consensus algorithms from the round implementation, to propose different round implementations, to improve existing round implementations by making them swift, and to provide quantitative analysis of different algorithms. The second part of the thesis considers the round-based consensus algorithms with Byzantine faults. In this context, there is a gap between theoretical consensus algorithms and practical Byzantine fault-tolerant protocols. The round model allowed us to fill the gap by better understanding existing protocols, and enabled us to express existing protocols in a simple and modular way, to obtain simplified proofs, to discover new protocols such as decentralized (non leader-based) algorithms, and finally to perform precise timing analysis to compare different algorithms. The last part of the thesis shows, as an example, how a round-based consensus algorithm that tolerates benign faults can be extended to wireless mobile ad hoc networks using an adequate communication layer. We have validated our implementation by running simulations in single hop and multi-hop wireless networks

    Contributions on agreement in dynamic distributed systems

    Get PDF
    139 p.This Ph.D. thesis studies the agreement problem in dynamic distributed systems by integrating both the classical fault-tolerance perspective and the more recent formalism based on evolving graphs. First, we developed a common framework that allows to analyze and compare models of dynamic distributed systems for eventual leader election. The framework extends a previous proposal by Baldoni et al. by including new dimensions and levels of dynamicity. Also, we extend the Time-Varying Graph (TVG) formalism by introducing the necessary timeliness assumptions and the minimal conditions to solve agreement problems. We provide a hierarchy of time-bounded, TVG-based, connectivity classes with increasingly stronger assumptions and specify an implementation of Terminating Reliable Broadcast for each class. Then we define an Omega failure detector, W, for the eventual leader election in dynamic distributed systems, together with a system model, , which is compatible with the timebounded TVG classes. We implement an algorithm that satisfy the properties of W in M. According to our common framework, M results to be weaker than the previous proposed dynamic distributed system models for eventual leader election. Additionally we use simulations to illustrate this fact and show that our leader election algorithm tolerates more general (i.e., dynamic) behaviors, and hence it is of application in a wider range of practical scenarios at the cost of a moderate overhead on stabilization times
    corecore