3 research outputs found

    Efficient Reliable Group Communication for Distributed Systems

    Get PDF
    Many applications can profit from broadcast communication, but few operating systems provide primitives that make broadcast communication available to user applications. In this paper we introduce primitives for broadcast communication that have been integrated with the Amoeba distributed operating system. The semantics of the broadcast primitives are simple, powerful, and easy to understand. Our primitives, for example, guarantee total ordering of broadcast messages. The proposed primitives are also efficient: if a network supports physical multicast, a reliable broadcast can be done in just slightly more than two messages on the average, so, the performance of a reliable broadcast is roughly comparable to that of a remote procedure call. In addition, the primitives are flexible: user applications can, for example, trade performance against fault tolerance. 1

    Object replication in a distributed system

    Get PDF
    PhD ThesisA number of techniques have been proposed for the construction of fault—tolerant applications. One of these techniques is to replicate vital system resources so that if one copy fails sufficient copies may still remain operational to allow the application to continue to function. Interactions with replicated resources are inherently more complex than non—replicated interactions, and hence some form of replication transparency is necessary. This may be achieved by employing replica consistency protocols to mask replica failures and maintain consistency of state between functioning replicas. To achieve consistency between replicas it is necessary to ensure that all replicas receive the same set of messages in the same order, despite failures at the senders and receivers. This can be accomplished by making use of order preserving reliable communication protocols. However, we shall show how it can be more efficient to use unordered reliable communication and to impose ordering at the application level, by making use of syntactic knowledge of the application. This thesis develops techniques for replicating objects: in general this is harder than replicating data, as objects (which can contain data) can contain calls on other objects. Handling replicated objects is essentially the same as handling replicated computations, and presents more problems than simply replicating data. We shall use the concept of the object to provide transparent replication to users: a user will interact with only a single object interface which hides the fact that the object is actually replicated. The main aspects of the replication scheme presented in this thesis have been fully implemented and tested. This includes the design and implementation of a replicated object invocation protocol and the algorithms which ensure that (replicated) atomic actions can manipulate replicated objects.Research Studentship, Science and Engineering Research Council. Esprit Project 2267 (Integrated Systems Architecture)

    System support for object replication in distributed systems

    Get PDF
    Distributed systems are composed of a collection of cooperating but failure prone system components. The number of components in such systems is often large and, despite low probabilities of any particular component failing, the likelihood that there will be at least a small number of failures within the system at a given time is high. Therefore, distributed systems must be able to withstand partial failures. By being resilient to partial failures, a distributed system becomes more able to offer a dependable service and therefore more useful. Replication is a well known technique used to mask partial failures and increase reliability in distributed computer systems. However, replication management requires sophisticated distributed control algorithms, and is therefore a labour intensive and error prone task. Furthermore, replication is in most cases employed due to applications' non-functional requirements for reliability, as dependability is generally an orthogonal issue to the problem domain of the application. If system level support for replication is provided, the application developer can devote more effort to application specific issues. Distributed systems are inherently more complex than centralised systems. Encapsulation and abstraction of components and services can be of paramount importance in managing their complexity. The use of object oriented techniques and languages, providing support for encapsulation and abstraction, has made development of distributed systems more manageable. In systems where applications are being developed using object-oriented techniques, system support mechanisms must recognise this, and provide support for the object-oriented approach. The architecture presented exploits object-oriented techniques to improve transparency and to reduce the application programmer involvement required to use the replication mechanisms. This dissertation describes an approach to implementing system support for object replication, which is distinct from other approaches such as replicated objects in that objects are not specially designed for replication. Additionally, object replication, in contrast to data replication, is a function-shipping approach and deals with the replication of both operations and data. Object replication is complicated by objects' encapsulation of local state and the arbitrary interaction patterns that may exist among objects. Although fully transparent object replication has not been achieved, my thesis is that partial system support for replication of program-level objects is practicable and assists the development of certain classes of reliable distributed applications. I demonstrate the usefulness of this approach by describing a prototype implementation and showing how it supports the development of an example toy application. To increase their flexibility, the system support mechanisms described are tailorable. The approach adopted in this work is to provide partial support for object replication, relying on some assistance from the application developer to supply application dependent functionality within particular collators for dealing with processing of results from object replicas. Care is taken to make the programming model as simple and concise as possible
    corecore