43 research outputs found
Efficient Reliable Group Communication for Distributed Systems
Many applications can profit from broadcast communication, but few operating systems provide primitives that make broadcast communication available to user applications. In this paper we introduce primitives for broadcast communication that have been integrated with the Amoeba distributed operating system. The semantics of the broadcast primitives are simple, powerful, and easy to understand. Our primitives, for example, guarantee total ordering of broadcast messages. The proposed primitives are also efficient: if a network supports physical multicast, a reliable broadcast can be done in just slightly more than two messages on average, so the performance of a reliable broadcast is roughly comparable to that of a remote procedure call. In addition, the primitives are flexible: user applications can, for example, trade performance against fault tolerance.
Group Communication in Amoeba and its Applications
Unlike many other operating systems, Amoeba is a distributed operating system that provides group communication (i.e., one-to-many communication). We wil
An Evaluation of the Amoeba Group Communication System
The Amoeba group communication system has two unique aspects: (1) it uses a sequencer-based protocol with negative acknowledgements for achieving a total order on all group messages; and (2) users choose the degree of fault tolerance they desire. This paper reports on our design decisions in retrospect, the performance of the Amoeba group system, and our experiences using the system. We conclude that sequencer-based group protocols achieve high performance (comparable to Amoeba's fast remote procedure call implementation), that the scalability of our sequencer-based protocols is limited by message processing time, and that the flexibility and modularity of user-level implementations of protocols are likely to outweigh the potential performance loss.
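As a rough illustration of the sequencer idea named in this abstract, the sketch below shows how a single sequencer can impose a total order and how a member can recover a lost message with a negative acknowledgement. It is a minimal in-memory model, not the Amoeba protocol itself: the names `Sequencer`, `Member`, `retransmit`, and `demo` are hypothetical, and networking, sequencer failure, and history truncation are all left out.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Assigns a global sequence number to every broadcast and keeps a history."""
    history: list = field(default_factory=list)

    def broadcast(self, payload):
        seq = len(self.history)  # next global sequence number
        self.history.append(payload)
        return seq, payload

    def retransmit(self, seq):
        # Answers a negative acknowledgement for a missed message.
        return seq, self.history[seq]


@dataclass
class Member:
    """Delivers messages strictly in sequence-number order (hypothetical model)."""
    sequencer: Sequencer
    next_seq: int = 0
    delivered: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)

    def receive(self, seq, payload):
        if seq < self.next_seq:
            return  # duplicate of a message that was already delivered
        self.pending[seq] = payload
        # A gap means an earlier message was lost: request each missing one.
        for missing in range(self.next_seq, seq):
            if missing not in self.pending:
                s, p = self.sequencer.retransmit(missing)
                self.pending[s] = p
        # Deliver everything that is now contiguous.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def demo():
    sequencer = Sequencer()
    a, b = Member(sequencer), Member(sequencer)
    msgs = [sequencer.broadcast(p) for p in ("x", "y", "z")]
    for m in msgs:
        a.receive(*m)
    # Member b never receives message 1 and sees the others out of order.
    b.receive(*msgs[2])
    b.receive(*msgs[0])
    return a.delivered, b.delivered
```

In this model both members end up with the same delivery order even though one of them missed a message, because the member only asks for retransmission when it notices a hole in the sequence numbers.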
R2PC: fault-tolerance made easy
Fault-tolerance is a concept that is becoming more and more important as computers are increasingly being used in application areas such as process control, air-traffic control and communication systems. However, the construction of fault-tolerant software remains a very difficult task, as it requires extensive knowledge and experience on the part of the designers of the system.
The Remote Procedure Call (RPC) protocol and its many variants are a fundamental mechanism that provides an adequate level of abstraction for the construction of distributed applications and releases programmers from the burden of dealing with low-level networking protocols. However, the standard definition of the protocol does not provide semantics that are sufficiently transparent to deal with unexpected hardware and software faults, i.e. the programmer has to handle the problems that may occur. To address this, different reliable variations of the RPC protocol have been defined.
This dissertation introduces a new reliable protocol - R2PC - with the following characteristics.
• Symmetric treatment of client and server processes.
• Use of concurrently processed nested calls in stateful servers.
• The achievement of failure transparency at the application level.
Group-oriented coordination models for distributed client-server computing
This paper describes group-oriented control models for distributed client-server interactions. These models transparently coordinate requests for services that involve multiple servers, such as queries across distributed databases. Specific capabilities include: decomposing and replicating client requests; dispatching request subtasks or copies to independent, networked servers; and combining server results into a single response for the client. The control models were implemented by combining request broker and process group technologies with an object-oriented communication middleware tool. The models are illustrated in the context of a distributed operations support application for space-based systems
Naming issues in the design of transparently distributed operating systems
PhD Thesis
Naming is of fundamental importance in the design of transparently
distributed operating systems. A transparently distributed operating system
should be functionally equivalent to the systems of which it is composed. In
particular, the names of remote objects should be indistinguishable from the
names of local objects.
In this thesis we explore the implication that this recursive notion of
transparency has for the naming mechanisms provided by an operating system.
In particular, we show that a recursive naming system is more readily extensible
than a flat naming system by demonstrating that it is in precisely those areas in
which a system is not recursive that transparency is hardest to achieve. However,
this is not so much a problem of distribution as a problem of scale. A
system which does not scale well internally will not extend well to a distributed
system.
Building a distributed system out of existing systems involves joining the
name spaces of the individual systems together. When combining name spaces it
is important to preserve the identity of individual objects. Although unique
identifiers may be used to distinguish objects within a single name space, we
argue that it is difficult if not impossible in practice to guarantee the uniqueness
of such identifiers between name spaces. Instead, we explore the possibility of
using hierarchical identifiers, unique only within a localised context. However,
we show that such identifiers cannot be used in an arbitrary naming graph
without compromising the notion of identity and hence violating the semantics of
the underlying system. The only alternative is to sacrifice a deterministic notion
of identity by using random identifiers to approximate global uniqueness with a
known probability of failure (which can be made arbitrarily small if the overall size
of the system is known in advance).
UK Science and Engineering Research Council
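The closing remark of this abstract, that random identifiers approximate global uniqueness with a known failure probability, can be made concrete with the standard birthday-bound approximation. The sketch below is an illustration under stated assumptions rather than the thesis's own analysis: it assumes identifiers are drawn uniformly and independently, and the function names `collision_probability` and `bits_needed` are hypothetical.

```python
import math


def collision_probability(n_objects: int, id_bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2.0 ** id_bits
    return -math.expm1(-n_objects * (n_objects - 1) / (2.0 * space))


def bits_needed(n_objects: int, max_failure: float) -> int:
    """Smallest identifier width whose collision probability fits the budget."""
    bits = 1
    while collision_probability(n_objects, bits) > max_failure:
        bits += 1
    return bits
```

Given an upper bound on the number of objects, `bits_needed` shows why knowing the overall system size in advance matters: the identifier width can then be chosen to push the failure probability below any target.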
Management of object-oriented action-based distributed programs
PhD Thesis
This thesis addresses the problem of managing the runtime behaviour of distributed
programs. The thesis of this work is that management is fundamentally
an information processing activity and that the object model, as applied to action-based
distributed systems and database systems, is an appropriate representation
of the management information. In this approach, the basic concepts of classes,
objects, relationships, and atomic transition systems are used to form object
models of distributed programs. Distributed programs are collections of objects
whose methods are structured using atomic actions, i.e., atomic transactions.
Object models are formed of two submodels, each representing a fundamental
aspect of a distributed program. The structural submodel represents a static
perspective of the distributed program, and the control submodel represents a
dynamic perspective of it. Structural models represent the program's objects,
classes and their relationships. Control models represent the program's object
states, events, guards and actions, which together form a transition system. Resolution of queries on
the distributed program's object model enables the management system to control
certain activities of distributed programs.
At a different level of abstraction, the distributed program can be seen as a
reactive system where two subprograms interact: an application program and a
management program; they interact only through sensors and actuators. Sensors
are methods used to probe an object's state and actuators are methods used
to change an object's state. The management program is able to prod the
application program into action by activating sensors and actuators available at
the interface of the application program. Actions are determined by management
policies that are encoded in the management program. This way of structuring
the management system encourages a clear modularization of application and
management distributed programs, allowing better separation of concerns. Management
concerns can be dealt with by the management program, while functional
concerns can be assigned to the application program.
The object-oriented action-based computational model adopted by the management
system provides a natural framework for the implementation of fault-tolerant
distributed programs. Object orientation provides modularity and extensibility
through object encapsulation. Atomic actions guarantee the consistency of
the objects of the distributed program despite concurrency and failures. Replication
of the distributed program provides increased fault-tolerance by guaranteeing
the consistent progress of the computation, even though some of the replicated
objects can fail.
A prototype management system based on the management theory proposed
above has been implemented atop Arjuna, an object-oriented programming system
which provides a set of tools for constructing fault-tolerant distributed programs. The management system is composed of two subsystems: Stabilis, a
management system for structural information, and Vigil, a management system
for control information. Example applications have been implemented to illustrate
the use of the management system and gather experimental evidence to give
support to the thesis.
CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil):
BROADCAST (Basic Research On Advanced Distributed Computing: from Algorithms to SysTems)
Management of concurrency in a reliable object-oriented computing system
PhD Thesis
Modern computing systems support concurrency as a means of increasing
the performance of the system. However, the potential for increased performance
is not without its problems. For example, lost updates and inconsistent retrieval
are but two of the possible consequences of unconstrained concurrency. Many
concurrency control techniques have been designed to combat these problems;
this thesis considers the applicability of some of these techniques in the context of
a reliable object-oriented system supporting atomic actions.
The object-oriented programming paradigm is one approach to handling the
inherent complexity of modern computer programs. By modeling entities from
the real world as objects which have well-defined interfaces, the interactions in
the system can be carefully controlled. By structuring sequences of such
interactions as atomic actions, the consistency of the system is assured.
Objects are encapsulated entities such that their internal representation is not
externally visible. This thesis postulates that this encapsulation should also
include the capability for an object to be responsible for its own concurrency
control.
Given this latter assumption, this thesis explores the means by which the
property of type-inheritance possessed by object-oriented languages can be
exploited to allow programmers to explicitly control the level of concurrency an
object supports. In particular, an object-oriented concurrency controller based
upon the technique of two-phase locking is described and implemented using
type-inheritance. The thesis also shows how this inheritance-based approach is
highly flexible such that the basic concurrency control capabilities can be adopted
unchanged or overridden with more type-specific concurrency control if required.
UK Science and Engineering Research Council,
Serc/Alve
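The inheritance-based approach described in the last abstract above can be pictured roughly as follows. This is a minimal single-threaded sketch, not the thesis's implementation: the class names `LockManaged`, `Action`, and `Counter`, the lock mode strings, and the `conflicts` hook are all hypothetical, and waiting, deadlock handling, and recovery are omitted.

```python
class LockManaged:
    """Base class: each object carries its own two-phase locking policy."""

    def __init__(self):
        self._locks = []  # (action, mode) pairs currently held on this object

    def conflicts(self, held_mode, requested_mode):
        # Default read/write rule: only read-read is compatible.
        return not (held_mode == "read" and requested_mode == "read")

    def set_lock(self, action, mode):
        for holder, held_mode in self._locks:
            if holder is not action and self.conflicts(held_mode, mode):
                return False  # a real system would make the caller wait or abort
        self._locks.append((action, mode))
        action.touched.add(self)
        return True

    def release(self, action):
        self._locks = [(h, m) for h, m in self._locks if h is not action]


class Action:
    """Atomic action: locks only accumulate until end(), then all are released."""

    def __init__(self):
        self.touched = set()

    def end(self):
        for obj in self.touched:
            obj.release(self)
        self.touched.clear()


class Counter(LockManaged):
    """Type-specific override: increments commute, so they may share the object."""

    def __init__(self):
        super().__init__()
        self.value = 0

    def conflicts(self, held_mode, requested_mode):
        if held_mode == requested_mode == "increment":
            return False
        return super().conflicts(held_mode, requested_mode)
```

A plain `LockManaged` object inherits the default rule unchanged, while `Counter` overrides only the conflict test to admit more concurrency; the growing and shrinking phases of two-phase locking stay in the shared base class and in `Action.end()`.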