527 research outputs found

    Proactive Service Migration for Long-Running Byzantine Fault-Tolerant Systems

    Get PDF
    A proactive recovery scheme based on service migration for long-running Byzantine fault-tolerant systems is described. Proactive recovery is an essential method for ensuring the long-term reliability of fault-tolerant systems that are under continuous threats from malicious adversaries. The primary benefit of our proactive recovery scheme is a reduced vulnerability window under normal operation. This is achieved in two ways. First, the time-consuming reboot step is removed from the critical path of proactive recovery. Second, the response time and the service migration latency are continuously profiled and an optimal service migration interval is dynamically determined during runtime based on the observed system load and the user-specified availability requirement

    Byzantine Fault Tolerance for Distributed Systems

    Get PDF
    The growing reliance on online services imposes a high dependability requirement on the computer systems that provide these services. Byzantine fault tolerance (BFT) is a promising technology to solidify such systems for the much needed high dependability. BFT employs redundant copies of the servers and ensures that a replicated system continues providing correct services despite the attacks on a small portion of the system. In this dissertation research, I developed novel algorithms and mechanisms to control various types of application nondeterminism and to ensure the long-term reliability of BFT systems via a migration-based proactive recovery scheme. I also investigated a new approach to significantly improve the overall system throughput by enabling concurrent processing using Software Transactional Memory (STM). Controlling application nondeterminism is essential to achieve strong replica consistency because the BFT technology is based on state-machine replication, which requires deterministic operation of each replica. Proactive recovery is necessary to ensure that the fundamental assumption of using the BFT technology is not violated over long term, i.e., less than one-third of replicas remain correct. Without proactive recovery, more and more replicas will be compromised under continuously attacks, which would render BFT ineffective. STM based concurrent processing maximized the system throughput by utilizing the power of multi-core CPUs while preserving strong replication consistenc

    Byzantine Fault Tolerance for Distributed Systems

    Get PDF
    The growing reliance on online services imposes a high dependability requirement on the computer systems that provide these services. Byzantine fault tolerance (BFT) is a promising technology to solidify such systems for the much needed high dependability. BFT employs redundant copies of the servers and ensures that a replicated system continues providing correct services despite the attacks on a small portion of the system. In this dissertation research, I developed novel algorithms and mechanisms to control various types of application nondeterminism and to ensure the long-term reliability of BFT systems via a migration-based proactive recovery scheme. I also investigated a new approach to significantly improve the overall system throughput by enabling concurrent processing using Software Transactional Memory (STM). Controlling application nondeterminism is essential to achieve strong replica consistency because the BFT technology is based on state-machine replication, which requires deterministic operation of each replica. Proactive recovery is necessary to ensure that the fundamental assumption of using the BFT technology is not violated over long term, i.e., less than one-third of replicas remain correct. Without proactive recovery, more and more replicas will be compromised under continuously attacks, which would render BFT ineffective. STM based concurrent processing maximized the system throughput by utilizing the power of multi-core CPUs while preserving strong replication consistenc

    Diverse Intrusion-tolerant Systems

    Get PDF
    Over the past 20 years, there have been indisputable advances on the development of Byzantine Fault-Tolerant (BFT) replicated systems. These systems keep operational safety as long as at most f out of n replicas fail simultaneously. Therefore, in order to maintain correctness it is assumed that replicas do not suffer from common mode failures, or in other words that replicas fail independently. In an adversarial setting, this requires that replicas do not include similar vulnerabilities, or otherwise a single exploit could be employed to compromise a significant part of the system. The thesis investigates how this assumption can be substantiated in practice by exploring diversity when managing the configurations of replicas. The thesis begins with an analysis of a large dataset of vulnerability information to get evidence that diversity can contribute to failure independence. In particular, we used the data from a vulnerability database to devise strategies for building groups of n replicas with different Operating Systems (OS). Our results demonstrate that it is possible to create dependable configurations of OSes, which do not share vulnerabilities over reasonable periods of time (i.e., a few years). Then, the thesis proposes a new design for a firewall-like service that protects and regulates the access to critical systems, and that could benefit from our diversity management approach. The solution provides fault and intrusion tolerance by implementing an architecture based on two filtering layers, enabling efficient removal of invalid messages at early stages in order to decrease the costs associated with BFT replication in the later stages. The thesis also presents a novel solution for managing diverse replicas. It collects and processes data from several data sources to continuously compute a risk metric. Once the risk increases, the solution replaces a potentially vulnerable replica by another one, trying to maximize the failure independence of the replicated service. Then, the replaced replica is put on quarantine and updated with the available patches, to be prepared for later re-use. We devised various experiments that show the dependability gains and performance impact of our prototype, including key benchmarks and three BFT applications (a key-value store, our firewall-like service, and a blockchain).Unidade de investigação LASIGE (UID/CEC/00408/2019) e o projeto PTDC/EEI-SCR/1741/2041 (Abyss

    Enhancing intrusion resilience in publicly accessible distributed systems

    Get PDF
    PhD ThesisThe internet is increasingly used as a means of communication by many businesses. Online shopping has become an important commercial activity and many governmental bodies offer services online. Malicious intrusion into these systems can have major negative consequences, both for the providers and users of these services. The need to protect against malicious intrusion, coupled with the difficulty of identifying and removing all possible vulnerabilities in a distributed system, have led to the use of systems that can tolerate intrusions with no loss of integrity. These systems require that services be replicated as deterministic state machines, a relatively hard task in practice, and do not ensure that confidentiality is maintained when one or more replicas are successfully intruded into. This thesis presents FORTRESS, a novel intrusion-resilient system that makes use of proactive obfuscation techniques and cheap off-the-shelf hardware to enhance intrusionresilience. FORTRESS uses proxies to prevent clients accessing servers directly, and regular replacement of proxies and servers with differently obfuscated versions. This maintains both confidentiality and integrity as long as an attacker does not compromise the system as a whole. The expected lifetime until system compromise of the FORTRESS system is compared to those of state machine replicated and primary backup systems when confronted with an attacker capable of launching distributed attacks against known vulnerabilities. Thus, FORTRESS is demonstrated to be a viable alternative to building intrusion-tolerant systems using deterministic state machine replication. The performance overhead of the FORTRESS system is also evaluated, using both a general state transfer framework for distributed systems, and a lightweight framework for large scale web applications. This shows the FORTRESS system has a sufficiently small performance overhead to be of practical use

    Stimulating proactive fault detection in distributed systems

    Get PDF
    Fault Tolerance is an important issue considered when developing a reliable Distributed System. Reactive fault systems are designed to redistribute the current process on to other machines when failure occurs. In contrast to the conventional method of reactive recovery, an emerging concept in the field of fault tolerance is a proactive approach. This approach exploits pre fault symptoms and initiates fault recovery henceforth. This project is to implement a proactive fault prediction simulator for a distributed system. This will include developing a language for simulation, which allows the user to define a distributed system. The language is further used to develop an environment that integrates two fault prediction algorithms, Wilcoxon s Rank-Sum and DFT (Dispersion Frame Technique). Both these algorithms are presented as alternatives for SMART (Self Monitoring Analysis and Reporting Technology) in this project. The project also includes a comparison metrics for these prediction algorithms in terms of prediction precision and prediction accuracy
    corecore