    Tolerating permanent and transient value faults

    Transmission faults allow us to reason about permanent and transient value faults in a uniform way. However, all existing solutions to consensus in this model either assume a synchronous system or require strong conditions for termination that exclude the case where all messages of a process can be corrupted. In this paper we introduce eventual consistency in order to overcome this limitation. Eventual consistency denotes the existence of rounds in which processes receive the same set of messages. We show how eventually consistent rounds can be simulated from eventually synchronous rounds, and how eventually consistent rounds can be used to solve consensus. Depending on the nature and number of permanent and transient transmission faults, we obtain different conditions on n, the number of processes, in order to solve consensus in our weak model.
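The notion of a consistent round described above can be illustrated with a minimal Python sketch. The data layout (`received[p][q]` is the value that process `p` received from sender `q` in one round) and the function names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, Hashable, List

# Hypothetical layout: received[p][q] is the value p received from q in one round.
Received = Dict[str, Dict[str, Hashable]]


def is_consistent_round(received: Received) -> bool:
    """True if every process received the same set of messages in this round.

    The received values may still be corrupted; consistency only asks that
    all processes see the same thing.
    """
    views = list(received.values())
    return all(view == views[0] for view in views[1:])


def first_consistent_round(rounds: List[Received]) -> int:
    """Index of the first consistent round in a finite trace, or -1 if none."""
    for r, received in enumerate(rounds):
        if is_consistent_round(received):
            return r
    return -1


# Round 0: p1 and p2 disagree on what p2 sent. Round 1: both see the same values.
trace = [
    {"p1": {"p1": 1, "p2": 0}, "p2": {"p1": 1, "p2": 1}},
    {"p1": {"p1": 1, "p2": 7}, "p2": {"p1": 1, "p2": 7}},
]
first = first_consistent_round(trace)
```

On a finite trace this only checks that a consistent round occurred; the "eventual" guarantee in the abstract is a property of infinite runs.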

    Generic construction of consensus algorithms for benign and Byzantine faults

    The paper proposes a generic consensus algorithm that highlights the basic and common features of known consensus algorithms. The parameters of the generic algorithm encapsulate the core differences between various consensus algorithms, including leader-based and leader-free algorithms, addressing benign faults, authenticated Byzantine faults and Byzantine faults. This leads to the identification of three classes of consensus algorithms. With the proposed classification, Paxos and PBFT belong to the same class, while FaB Paxos belongs to a different class. Interestingly, the classification allowed us to identify a new Byzantine consensus algorithm that requires n > 4b, where b is the maximum number of Byzantine processes.
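As a small worked example of the resilience condition n > 4b, the sketch below computes the smallest admissible number of processes. The function name and the `factor` parameter are hypothetical, introduced only so the same arithmetic can be reused for other bounds of the form n > k·b.

```python
def min_processes(b: int, factor: int = 4) -> int:
    """Smallest n that satisfies n > factor * b.

    `factor` defaults to 4 to match the n > 4b condition; it is an
    illustrative parameter, not part of the paper's notation.
    """
    if b < 0 or factor < 1:
        raise ValueError("b must be >= 0 and factor must be >= 1")
    return factor * b + 1


# Tolerating one Byzantine process under n > 4b needs at least 5 processes.
n_for_one_fault = min_processes(1)
```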

    Time-complexity bounds on agreement problems

    In many distributed systems, designing an application that maintains consistency and availability despite failure of processes involves solving some form of agreement. Not surprisingly, providing efficient agreement algorithms is critical for improving the performance of many distributed applications. This thesis studies how fast we can solve fundamental agreement problems like consensus, uniform consensus, and non-blocking atomic commit. In an agreement problem, the processes are supposed to propose a value and eventually decide on a common value that depends on the proposed values. To study agreement problems, we consider two round-based message-passing models: the well-known synchronous model and the eventually synchronous model. The second model is a partially synchronous model that remains asynchronous for an arbitrary number of rounds but eventually becomes synchronous. We investigate two aspects of the performance of agreement algorithms. We first measure time-complexity using a finer-grained metric than what was considered so far in the literature. Then we optimize algorithms for subsets of executions that are considered to be common in practice. Traditionally, the performance of agreement algorithms was measured in terms of global decision: the number of rounds required for all correct (non-faulty) processes to decide. However, in many settings, upon deciding, any correct process can provide the decision value to the process that is waiting for a decision. In this case, a more suitable performance metric is a local decision: the number of rounds required for at least one correct process to decide. We present tight bounds for local decisions in the synchronous and the eventually synchronous models. We also show that considering the local decision metric allows us to uncover fundamental differences between agreement problems, and between models, that were not apparent with previous metrics. 
In the eventually synchronous model, we observe that, for many cases in practice, executions are frequently synchronous and only occasionally asynchronous. Thus we optimize algorithms for synchronous executions, and give matching lower bounds. We show that, in some sense, synchronous executions of algorithms designed for the eventually synchronous model are slower than executions of algorithms directly designed for the synchronous model, i.e., there is an inherent price associated with tolerating arbitrary periods of asynchrony. Finally, we establish a tight bound on the number of rounds required to reach agreement once an execution becomes synchronous and no new failures occur.
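The difference between the global and local decision metrics can be made concrete with a short Python sketch over a single execution trace. The representation is an assumption for illustration: `decided_at` maps each process to the round in which it decided (or `None` if it never did), and `correct` is the set of non-faulty processes.

```python
from typing import Dict, Optional, Set


def local_decision(decided_at: Dict[str, Optional[int]], correct: Set[str]) -> Optional[int]:
    """Round by which at least one correct process has decided."""
    rounds = [decided_at[p] for p in correct if decided_at.get(p) is not None]
    return min(rounds) if rounds else None


def global_decision(decided_at: Dict[str, Optional[int]], correct: Set[str]) -> Optional[int]:
    """Round by which all correct processes have decided, or None if some never did."""
    rounds = [decided_at.get(p) for p in correct]
    if not rounds or any(r is None for r in rounds):
        return None
    return max(rounds)


# p3 is faulty, so its early decision in round 1 does not count toward either metric.
decided_at = {"p1": 2, "p2": 3, "p3": 1, "p4": None}
correct = {"p1", "p2"}
local_round = local_decision(decided_at, correct)
global_round = global_decision(decided_at, correct)
```

In this hypothetical trace the local decision is reached one round before the global decision, which is the kind of gap the finer-grained metric is meant to expose.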

    Byzantine state machine replication for the masses

    Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2018. The state machine replication technique is a popular approach for building Byzantine fault-tolerant services. However, despite the widespread adoption of this paradigm for crash fault-tolerant systems, there are still few examples of this paradigm for real Byzantine fault-tolerant systems. Our view of this situation is that there is a lack of robust implementations of Byzantine fault-tolerant state machine replication middleware, and that the performance penalty is too high, especially for geo-replication. These hindrances are tightly coupled to the distributed protocols used for enforcing such resilience. This thesis has the objective of finding methodologies for enhancing robustness and performance of state machine replication systems. The first contribution is Mod-SMaRt, a modular protocol that preserves optimal latency in terms of the communication steps exchanged among processes. By being a modular protocol, it becomes simpler to validate and implement, thus resulting in greater robustness; by also preserving optimal message exchanges among processes, the protocol is capable of delivering desirable performance. The second contribution is concerned with implementing Mod-SMaRt in BFT-SMART, a reliable, high-performance codebase that was maintained and improved over the entire course of the PhD and that offers multicore-awareness, reconfiguration support, and a flexible API. The third contribution presents WHEAT, a protocol derived from Mod-SMaRt that uses optimizations shown to be effective in reducing latency via a practical evaluation conducted in a geo-distributed environment. We additionally conducted an evaluation of both BFT-SMART and WHEAT applied to a relational database middleware and an ordering service for a permissioned blockchain platform. 
These evaluations revealed encouraging results for both systems and validated our work conducted in the geo-distributed context.
Funding: IRCoC project (PTDC/EEI-SCR/6970/2014); European Commission, FP7 (Seventh Framework Programme for Research and Technological Development), projects FP7/2007-2013, ICT-25724

    The Heard-Of model: computing in distributed systems with benign faults

    Problems in fault-tolerant distributed computing have been studied in a variety of models. These models are structured around two central ideas: (1) degree of synchrony and failure model are two independent parameters that determine a particular type of system, (2) the notion of faulty component is helpful and even necessary for the analysis of distributed computations when faults occur. In this work, we question these two basic principles of fault-tolerant distributed computing, and show that it is both possible and worthwhile to renounce them in the context of benign faults: we present a computational model based only on the notion of transmission faults. In this model, computations evolve in rounds, and messages missed in a round are lost. Only information transmission is represented: for each round r and each process p, our model provides the set of processes that p "hears of" at round r (heard-of set), namely the processes from which p receives some message at round r. The features of a specific system are thus captured as a whole, just by a predicate over the collection of heard-of sets. We show that our model handles benign failures, be they static or dynamic, permanent or transient, in a unified framework. We demonstrate how this approach leads to shorter and simpler proofs of important results (non-solvability, lower bounds). In particular, we prove that the Consensus problem cannot be generally solved without an implicit and permanent consensus on heard-of sets. We also examine Consensus algorithms in our model. In light of this specific agreement problem, we show how our approach allows us to devise new interesting solutions.
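To make the idea of a predicate over heard-of sets concrete, here is a minimal Python sketch. The representation (`ho[p]` is the set of processes that `p` hears of in one round) and the helper names `kernel`, `is_uniform`, and `some_uniform_round` are illustrative assumptions, not definitions quoted from the paper.

```python
from typing import Dict, List, Set

# Hypothetical layout: ho[p] is the heard-of set of process p in one round.
HeardOf = Dict[str, Set[str]]


def kernel(ho: HeardOf) -> Set[str]:
    """Processes that every process heard of in this round."""
    sets = list(ho.values())
    return set.intersection(*sets) if sets else set()


def is_uniform(ho: HeardOf) -> bool:
    """True if all processes have the same heard-of set in this round."""
    sets = list(ho.values())
    return all(s == sets[0] for s in sets[1:])


def some_uniform_round(run: List[HeardOf]) -> bool:
    """Example predicate over a finite run: some round is uniform and non-empty."""
    return any(is_uniform(ho) and len(kernel(ho)) > 0 for ho in run)


# Round 0: q misses p's message. Round 1: everyone hears of everyone.
run = [
    {"p": {"p", "q"}, "q": {"q"}},
    {"p": {"p", "q"}, "q": {"p", "q"}},
]
holds = some_uniform_round(run)
```

Nothing in the sketch says which component was faulty in round 0; only the lost transmission from p to q is recorded, which mirrors the abstract's point that the model represents information transmission alone.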