1,147 research outputs found

    Byzantine fault-tolerant agreement protocols for wireless Ad hoc networks

    Get PDF
    Tese de doutoramento, Informática (Ciências da Computação), Universidade de Lisboa, Faculdade de Ciências, 2010.The thesis investigates the problem of fault- and intrusion-tolerant consensus in resource-constrained wireless ad hoc networks. This is a fundamental problem in distributed computing because it abstracts the need to coordinate activities among various nodes. It has been shown to be a building block for several other important distributed computing problems like state-machine replication and atomic broadcast. The thesis begins by making a thorough performance assessment of existing intrusion-tolerant consensus protocols, which shows that the performance bottlenecks of current solutions are in part related to their system modeling assumptions. Based on these results, the communication failure model is identified as a model that simultaneously captures the reality of wireless ad hoc networks and allows the design of efficient protocols. Unfortunately, the model is subject to an impossibility result stating that there is no deterministic algorithm that allows n nodes to reach agreement if more than n2 omission transmission failures can occur in a communication step. This result is valid even under strict timing assumptions (i.e., a synchronous system). The thesis applies randomization techniques in increasingly weaker variants of this model, until an efficient intrusion-tolerant consensus protocol is achieved. The first variant simplifies the problem by restricting the number of nodes that may be at the source of a transmission failure at each communication step. An algorithm is designed that tolerates f dynamic nodes at the source of faulty transmissions in a system with a total of n 3f + 1 nodes. The second variant imposes no restrictions on the pattern of transmission failures. The proposed algorithm effectively circumvents the Santoro- Widmayer impossibility result for the first time. It allows k out of n nodes to decide despite dn 2 e(nk)+k2 omission failures per communication step. This algorithm also has the interesting property of guaranteeing safety during arbitrary periods of unrestricted message loss. The final variant shares the same properties of the previous one, but relaxes the model in the sense that the system is asynchronous and that a static subset of nodes may be malicious. The obtained algorithm, called Turquois, admits f < n 3 malicious nodes, and ensures progress in communication steps where dnf 2 e(n k f) + k 2. The algorithm is subject to a comparative performance evaluation against other intrusiontolerant protocols. The results show that, as the system scales, Turquois outperforms the other protocols by more than an order of magnitude.Esta tese investiga o problema do consenso tolerante a faltas acidentais e maliciosas em redes ad hoc sem fios. Trata-se de um problema fundamental que captura a essência da coordenação em actividades envolvendo vários nós de um sistema, sendo um bloco construtor de outros importantes problemas dos sistemas distribuídos como a replicação de máquina de estados ou a difusão atómica. A tese começa por efectuar uma avaliação de desempenho a protocolos tolerantes a intrusões já existentes na literatura. Os resultados mostram que as limitações de desempenho das soluções existentes estão em parte relacionadas com o seu modelo de sistema. Baseado nestes resultados, é identificado o modelo de falhas de comunicação como um modelo que simultaneamente permite capturar o ambiente das redes ad hoc sem fios e projectar protocolos eficientes. Todavia, o modelo é restrito por um resultado de impossibilidade que afirma não existir algoritmo algum que permita a n nós chegaram a acordo num sistema que admita mais do que n2 transmissões omissas num dado passo de comunicação. Este resultado é válido mesmo sob fortes hipóteses temporais (i.e., em sistemas síncronos) A tese aplica técnicas de aleatoriedade em variantes progressivamente mais fracas do modelo até ser alcançado um protocolo eficiente e tolerante a intrusões. A primeira variante do modelo, de forma a simplificar o problema, restringe o número de nós que estão na origem de transmissões faltosas. É apresentado um algoritmo que tolera f nós dinâmicos na origem de transmissões faltosas em sistemas com um total de n 3f + 1 nós. A segunda variante do modelo não impõe quaisquer restrições no padrão de transmissões faltosas. É apresentado um algoritmo que contorna efectivamente o resultado de impossibilidade Santoro-Widmayer pela primeira vez e que permite a k de n nós efectuarem progresso nos passos de comunicação em que o número de transmissões omissas seja dn 2 e(n k) + k 2. O algoritmo possui ainda a interessante propriedade de tolerar períodos arbitrários em que o número de transmissões omissas seja superior a . A última variante do modelo partilha das mesmas características da variante anterior, mas com pressupostos mais fracos sobre o sistema. Em particular, assume-se que o sistema é assíncrono e que um subconjunto estático dos nós pode ser malicioso. O algoritmo apresentado, denominado Turquois, admite f < n 3 nós maliciosos e assegura progresso nos passos de comunicação em que dnf 2 e(n k f) + k 2. O algoritmo é sujeito a uma análise de desempenho comparativa com outros protocolos na literatura. Os resultados demonstram que, à medida que o número de nós no sistema aumenta, o desempenho do protocolo Turquois ultrapassa os restantes em mais do que uma ordem de magnitude.FC

    Counter Attack on Byzantine Generals: Parameterized Model Checking of Fault-tolerant Distributed Algorithms

    Full text link
    We introduce an automated parameterized verification method for fault-tolerant distributed algorithms (FTDA). FTDAs are parameterized by both the number of processes and the assumed maximum number of Byzantine faulty processes. At the center of our technique is a parametric interval abstraction (PIA) where the interval boundaries are arithmetic expressions over parameters. Using PIA for both data abstraction and a new form of counter abstraction, we reduce the parameterized problem to finite-state model checking. We demonstrate the practical feasibility of our method by verifying several variants of the well-known distributed algorithm by Srikanth and Toueg. Our semi-decision procedures are complemented and motivated by an undecidability proof for FTDA verification which holds even in the absence of interprocess communication. To the best of our knowledge, this is the first paper to achieve parameterized automated verification of Byzantine FTDA

    Reliable broadcast protocols

    Get PDF
    A number of broadcast protocols that are reliable subject to a variety of ordering and delivery guarantees are considered. Developing applications that are distributed over a number of sites and/or must tolerate the failures of some of them becomes a considerably simpler task when such protocols are available for communication. Without such protocols the kinds of distributed applications that can reasonably be built will have a very limited scope. As the trend towards distribution and decentralization continues, it will not be surprising if reliable broadcast protocols have the same role in distributed operating systems of the future that message passing mechanisms have in the operating systems of today. On the other hand, the problems of engineering such a system remain large. For example, deciding which protocol is the most appropriate to use in a certain situation or how to balance the latency-communication-storage costs is not an easy question

    Replication and fault-tolerance in real-time systems

    Get PDF
    PhD ThesisThe increased availability of sophisticated computer hardware and the corresponding decrease in its cost has led to a widespread growth in the use of computer systems for realtime plant and process control applications. Such applications typically place very high demands upon computer control systems and the development of appropriate control software for these application areas can present a number of problems not normally encountered in other applications. First of all, real-time applications must be correct in the time domain as well as the value domain: returning results which are not only correct but also delivered on time. Further, since the potential for catastrophic failures can be high in a process or plant control environment, many real-time applications also have to meet high reliability requirements. These requirements will typically be met by means of a combination of fault avoidance and fault tolerance techniques. This thesis is intended to address some of the problems encountered in the provision of fault tolerance in real-time applications programs. Specifically,it considers the use of replication to ensure the availability of services in real-time systems. In a real-time environment, providing support for replicated services can introduce a number of problems. In particular, the scope for non-deterministic behaviour in real-time applications can be quite large and this can lead to difficultiesin maintainingconsistent internal states across the members of a replica group. To tackle this problem, a model is proposed for fault tolerant real-time objects which not only allows such objects to perform application specific recovery operations and real-time processing activities such as event handling, but which also allows objects to be replicated. The architectural support required for such replicated objects is also discussed and, to conclude, the run-time overheads associated with the use of such replicated services are considered.The Science and Engineering Research Council

    Terminology and paradigms for fault tolerance

    Get PDF

    Let It TEE: Asynchronous Byzantine Atomic Broadcast with n ≥ 2f + 1

    Get PDF
    Asynchronous Byzantine Atomic Broadcast (ABAB) promises, in comparison to partially synchronous approaches, simplicity in implementation, increased performance, and increased robustness. For partially synchronous approaches, it is well-known that small Trusted Execution Environments (TEE), e.g., MinBFT\u27s unique sequential identifier generator (USIG), are capable of reducing the communication effort while increasing the fault tolerance. For ABAB, the research community assumes that the use of TEEs increases performance and robustness. However, despite the existence of a fault-model compiler, a concrete TEE-based approach is not directly available yet. In this brief announcement, we show that the recently proposed DAG-Rider approach can be transformed to provide ABAB with n≥2f+1n\geq 2f+1 processes, of which ff are faulty. We leverage MinBFT\u27s USIG to implement Reliable Broadcast with n>fn>f processes and show that the quorum-critical proofs of DAG-Rider still hold when adapting the quorum size to ⌊n2⌋+1\lfloor \frac{n}{2} \rfloor + 1
    • …
    corecore