19 research outputs found

    Perpetual: Byzantine Fault Tolerance for Federated Distributed Applications

    Get PDF
    Modern distributed applications rely upon the functionality of services from multiple providers. Mission-critical services, possibly shared by multiple applications, must be replicated to guarantee correct execution and availability in spite of arbitrary (Byzantine) faults. Furthermore, shared services must enforce strict fault isolation policies to prevent cascading failures across organizational and application boundaries. Most existing protocols for Byzantine fault-tolerant execution do not support interoperability between replicated services while others provide poor fault isolation. Moreover, existing protocols place impractical limitations on application development by disallowing long-running threads of computation, asynchronous operation invocation, and asynchronous request processing. We present Perpetual, a protocol that facilitates unrestricted interoperability between replicated services while enforcing strict fault isolation criteria. Perpetual supports both asynchronous operation invocation and asynchronous request processing. Perpetual also supports long-running threads of computation, enabling Byzantine fault-tolerant execution of services that carry out active computations. We present performance evaluations demonstrating a moderate overhead due to replication

    Byzantine Fault-Tolerant Web Services for n-Tier and Service Oriented Architectures

    Get PDF
    Web Services that provide mission-critical functionality must be replicated to guarantee correct execution and high availability in spite of arbitrary (Byzantine) faults. Existing approaches for Byzantine fault-tolerant execution of Web Services are inadequate to guarantee correct execution due to several major limitations. Some approaches do not support interoperability between replicated Web Services. Other approaches do not provide fault isolation guarantees that are strong enough to prevent cascading failures across organizational and application boundaries. Moreover, existing approaches place impractical limitations on application development by not supporting long-running active threads of computation, fully asynchronous communication, and access to host specific information. We present Perpetual-WS, middleware that supports interaction between replicated Web Services while providing strict fault isolation guarantees. Perpetual-WS supports both synchronous and asynchronous message passing and enables an application model that supports long-running active threads of computation. We present an implementation based on Axis2 and performance evaluations demonstrating only a moderate decrease in throughput due to replication

    A holistic approach for measuring the survivability of SCADA systems

    Get PDF
    Supervisory Control and Data Acquisition (SCADA) systems are responsible for controlling and monitoring Industrial Control Systems (ICS) and Critical Infrastructure Systems (CIS) among others. Such systems are responsible to provide services our society relies on such as gas, electricity, and water distribution. They process our waste; manage our railways and our traffic. Nevertheless to say, they are vital for our society and any disruptions on such systems may produce from financial disasters to ultimately loss of lives. SCADA systems have evolved over the years, from standalone, proprietary solutions and closed networks into large-scale, highly distributed software systems operating over open networks such as the internet. In addition, the hardware and software utilised by SCADA systems is now, in most cases, based on COTS (Commercial Off-The-Shelf) solutions. As they evolved they became vulnerable to malicious attacks. Over the last few years there is a push from the computer security industry on adapting their security tools and techniques to address the security issues of SCADA systems. Such move is welcome however is not sufficient, otherwise successful malicious attacks on computer systems would be non-existent. We strongly believe that rather than trying to stop and detect every attack on SCADA systems it is imperative to focus on providing critical services in the presence of malicious attacks. Such motivation is similar with the concepts of survivability, a discipline integrates areas of computer science such as performance, security, fault-tolerance and reliability. In this thesis we present a new concept of survivability; Holistic survivability is an analysis framework suitable for a new era of data-driven networked systems. It extends the current view of survivability by incorporating service interdependencies as a key property and aspects of machine learning. The framework uses the formalism of probabilistic graphical models to quantify survivability and introduces new metrics and heuristics to learn and identify essential services automatically. Current definitions of survivability are often limited since they either apply performance as measurement metric or use security metrics without any survivability context. Holistic survivability addresses such issues by providing a flexible framework where performance and security metrics can be tailored to the context of survivability. In other words, by applying performance and security our work aims to support key survivability properties such as recognition and resistance. The models and metrics here introduced are applied to SCADA systems as such systems insecurity is one of the motivations of this work. We believe that the proposed work goes beyond the current status of survivability models. Holistic survivability is flexible enough to support the addition of other metrics and can be easily used with different models. Because it is based on a well-known formalism its definition and implementation are easy to grasp and to apply. Perhaps more importantly, this proposed work is aimed to a new era where data is being produced and consumed on a large-scale. Holistic survivability aims to be the catalyst to new models based on data that will provide better and more accurate insights on the survivability of systems

    LEGOS: Object-based software components for mission-critical systems. Final report, June 1, 1995--December 31, 1997

    Full text link

    How Practical Are Intrusion-Tolerant Distributed Systems?

    Get PDF
    Building secure, inviolable systems using traditional mechanisms is becoming increasingly an unattainable goal. The recognition of this fact has fostered the interest in alternative approaches to security such as intrusion tolerance, which applies fault tolerance concepts and techniques to security problems. Albeit this area is quite promising, intrusion-tolerant distributed systems typically rely on the assumption that the system components fail or are compromised independently. This is a strong assumption that has been repeatedly questioned. In this paper we discuss how this assumption can be implemented in practice using diversity of system components. We present a taxonomy of axes of diversity and discuss how they provide failure independence. Furthermore, we provide a practical example of an intrusion-tolerant system built using diversity

    Extending Byzantine Fault Tolerance to Replicated Clients

    Get PDF
    Byzantine agreement protocols for replicated deterministic state machines guarantee that externally requested operations continue to execute correctly even if a bounded number of replicas fail in arbitrary ways. The state machines are passive, with clients responsible for any active ongoing application behavior. However, the clients are unreplicated and outside the fault-tolerance boundary. Consequently, agreement protocols for replicated state machines do not guarantee continued correct execution of long-running client applications. Building on the Castro and Liskov Byzantine Fault Tolerance protocol for unreplicated clients (CLBFT), we present a practical algorithm for Byzantine fault-tolerant execution of long-running distributed applications in which replicated deterministic clients invoke operations on replicated deterministic servers. The algorithm scales well to large replica groups, with roughly double the latency and message count when compared to CLBFT, which supports only unreplicated clients. The algorithm supports both synchronous and asynchronous clients, provides fault isolation between client and server groups with respect to both correctness and performance, and uses a novel architecture that accommodates externally requested software upgrades for long-running evolvable client applications

    Resilience-Building Technologies: State of Knowledge -- ReSIST NoE Deliverable D12

    Get PDF
    This document is the first product of work package WP2, "Resilience-building and -scaling technologies", in the programme of jointly executed research (JER) of the ReSIST Network of Excellenc
    corecore