1,407 research outputs found

    Perpetual: Byzantine Fault Tolerance for Federated Distributed Applications

    Modern distributed applications rely upon the functionality of services from multiple providers. Mission-critical services, possibly shared by multiple applications, must be replicated to guarantee correct execution and availability in spite of arbitrary (Byzantine) faults. Furthermore, shared services must enforce strict fault isolation policies to prevent cascading failures across organizational and application boundaries. Most existing protocols for Byzantine fault-tolerant execution do not support interoperability between replicated services, while others provide poor fault isolation. Moreover, existing protocols place impractical limitations on application development by disallowing long-running threads of computation, asynchronous operation invocation, and asynchronous request processing. We present Perpetual, a protocol that facilitates unrestricted interoperability between replicated services while enforcing strict fault isolation criteria. Perpetual supports both asynchronous operation invocation and asynchronous request processing. Perpetual also supports long-running threads of computation, enabling Byzantine fault-tolerant execution of services that carry out active computations. We present performance evaluations demonstrating a moderate overhead due to replication.

    Fault tolerant software technology for distributed computing system

    Issued as Monthly reports [nos. 1-23], Interim technical report, Technical guide books [nos. 1-2], and Final report, Project no. G-36-64

    Object replication in a distributed system

    PhD Thesis. A number of techniques have been proposed for the construction of fault-tolerant applications. One of these techniques is to replicate vital system resources so that if one copy fails, sufficient copies may still remain operational to allow the application to continue to function. Interactions with replicated resources are inherently more complex than non-replicated interactions, and hence some form of replication transparency is necessary. This may be achieved by employing replica consistency protocols to mask replica failures and maintain consistency of state between functioning replicas. To achieve consistency between replicas it is necessary to ensure that all replicas receive the same set of messages in the same order, despite failures at the senders and receivers. This can be accomplished by making use of order-preserving reliable communication protocols. However, we shall show how it can be more efficient to use unordered reliable communication and to impose ordering at the application level, by making use of syntactic knowledge of the application. This thesis develops techniques for replicating objects: in general this is harder than replicating data, as objects (which can contain data) can contain calls on other objects. Handling replicated objects is essentially the same as handling replicated computations, and presents more problems than simply replicating data. We shall use the concept of the object to provide transparent replication to users: a user will interact with only a single object interface which hides the fact that the object is actually replicated. The main aspects of the replication scheme presented in this thesis have been fully implemented and tested. This includes the design and implementation of a replicated object invocation protocol and the algorithms which ensure that (replicated) atomic actions can manipulate replicated objects.
    Research Studentship, Science and Engineering Research Council. Esprit Project 2267 (Integrated Systems Architecture).
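    The requirement stated in this abstract, that every replica processes the same messages in the same order even when the transport delivers them unordered, can be illustrated with a small sketch. This is not the thesis's protocol (which derives ordering from syntactic knowledge of the application); it is a plain sequence-number hold-back buffer, and the `Replica` class, its fields, and the sample operations are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that applies operations strictly in sequence order."""
    next_seq: int = 0                            # next sequence number to apply
    pending: dict = field(default_factory=dict)  # hold-back buffer: seq -> op
    log: list = field(default_factory=list)      # operations applied so far

    def receive(self, seq, op):
        # Ignore duplicates: already applied, or already buffered.
        if seq < self.next_seq or seq in self.pending:
            return
        self.pending[seq] = op
        # Apply every operation that is now contiguous with the log.
        while self.next_seq in self.pending:
            self.log.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


# Assumed sample workload; the sender has already assigned sequence numbers.
messages = [(0, "create"), (1, "deposit 10"), (2, "withdraw 3")]

a, b = Replica(), Replica()
for seq, op in messages:            # replica a sees messages in order
    a.receive(seq, op)
for seq, op in reversed(messages):  # replica b sees them in reverse order
    b.receive(seq, op)
b.receive(1, "deposit 10")          # a retransmitted duplicate is ignored
```

    Both replicas end with identical logs regardless of arrival order, which is the consistency property the abstract describes. The sketch assumes a single sender assigning sequence numbers and reliable (if unordered) delivery; it does not address failures at the sender.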

    Reaching High Availability in Connected Car Backend Applications

    The connected car segment places high demands on the exchange of data between the car on the road and a variety of services in the backend. By the end of 2020, connected services will be mainstream automotive offerings; according to the Telefónica Connected Car Industry Report 2014, the overall number of vehicles with built-in internet connectivity will increase from 10% of the overall market today to 90% by the end of the decade [1]. Connected car solutions will soon become one of the major business drivers for the industry; they already have a significant impact on existing solutions development and the aftersales market. It has been more than three decades since the introduction of the first software component in cars, and since then a vast number of different services have been introduced, creating an ecosystem of complex applications, architectures, and platforms. The complexity of the connected car ecosystem results in a range of new challenges. The backend applications must be scalable and flexible enough to accommodate loads created by random user and device behavior. To deliver superior uptime, backend systems must be highly integrated and automated to guarantee the lowest possible failure rate, high availability, and the fastest time-to-market. Connected car services increasingly rely on cloud-based service delivery models for improving user experiences and enhancing features for millions of vehicles and their users on a daily basis. Software applications are becoming more complex, and the number of components that are involved and interact with each other is extremely large. In such systems, a fault can easily propagate and affect other components, resulting in a complex problem that is difficult to detect and debug. A robust and resilient architecture is therefore needed to ensure the continuous availability of the system in the face of component failures, making the overall system highly available.
The goal of the thesis is to gain insight into the development of highly available applications and to explore the area of fault tolerance. The thesis outlines different design patterns, describes the capabilities of fault tolerance libraries for the Java platform, designs the most appropriate solution for developing a highly available application, and evaluates its behavior with stress and load testing using Chaos Monkey methodologies.

    Replicated Computations in a Distributed Switching Environment

    Replication of computations in a distributed switching environment is studied. The first topics discussed are the requirements and the other design goals that have to be met by replicated computations in a distributed switching system. The requirements on the grade of service and availability performance objectives are largely set out in the international standards. A structured, probability-oriented software approach to building a kernel supporting replicated computations is suggested, and the functional as well as the probability properties of the replication scheme are investigated. To aid the definition and investigation of the functional properties of the replication scheme, a model of computation based on the actor model of Hewitt and Agha is defined and used. The overall replication scheme consists of a loose basic scheme, the real-time computation migration tools, here designated as warm-up algorithms, and the corrective replication tools augmenting the basic scheme. Language methods which enhance the transparency of the replication scheme are also discussed. The work has been done in connection with a redesign project of a distributed digital switching system, and the results have largely been implemented in that environment.