Search CORE

348 research outputs found

Replicated Invocation

Author: Kupsys Arnas
Pleisch Stefan
Schiper André
Publication venue
Publication date: 13/07/2005
Field of study

In today's systems, application are composed from various components that may be located on different machines. The components may have to collaborate in order to service a client request. More specifically, a client request to one component may trigger a request to another component. Moreover, to ensure fault-tolerance, components are generally replicated. This poses the problem of a replicated server invoking another replicated server. We call it the problem of replicated invocation. Replicated invocation has been considered in the context of deterministic servers. However, the problem is more difficult to address when servers are non-deterministic. In this context, work has been done to enforce deterministic execution. In the paper we consider a different approach. Instead of preventing non-deterministic execution of servers, we discuss how to handle it. The paper first discusses the problem of non-deterministic replicated invocation. Then the paper proposes a different solution to solve these problems

Infoscience - École polytechnique fédérale de Lausanne

JMSGroups:JMS compliant group communication

Author: Kupšys Arnas
Publication venue: Lausanne, EPFL
Publication date: 31/08/2005
Field of study

Nowadays, computers are the indispensable part of our life. They evolve rapidly and are more and more versatile. Computer networks made the remote corners of the world just a click away. But unavoidably, any software and hardware component is subject to failure. Distributed systems spread on tens or hundreds of machines are particularly vulnerable to failures. Consequently, high availability and fault tolerance became a "must have" feature for such systems. Software fault tolerance is achieved through the technique called replication. In replication several software replicas are executed at the same time. If one or several of them fail, other still provide the service. Software replication is often implemented using group communication, which provides communication primitives with various semantics and greatly simplifies the development of highly available and fault tolerant services. However, despite tremendous advances in research and numerous prototypes, group communication stays confined to small niches and academic prototypes. In contrast, other technology, called messageoriented middleware such as the Java Message Service (JMS) is widely used in distributed systems, and has become a de-facto standard. We believe that the lack of a well-defined and easily understandable standard is the reason that hinders the deployment of group communication systems. Since JMS is a well-established technology, we propose to extend JMS adding group communication primitives to it. Foremost, this requires to extend the traditional semantics of group communication in order to take into account various features of JMS, e.g., durable/non-durable subscriptions and persistent/non-persistent messages. The resulting new group communication specification, together with the corresponding API, defines group communication primitives compatible with JMS, that we call JMSGroups. To validate the specification and API we provide a prototype implementation of JMSGroups. As such, we believe it facilitates the acceptance of group communication by a larger community and provides a powerful environment for building fault-tolerant applications

Infoscience - École polytechnique fédérale de Lausanne

CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

Author: Arad Cosmin
Haridi Seif
Shafaat Tallat M.
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2012
Field of study

Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Model-driven Fault-Tolerance Provisioning for Component-based Distributed Real-time Embedded Systems

Author: Tambe Sumant
Publication venue: VANDERBILT
Publication date
Field of study

Maintaining consistency in distributed systems

Author: Birman Kenneth P.
Publication venue
Publication date: 01/01/1991
Field of study

In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs, and is controlled using mutual exclusion constructs, such as semaphores and monitors. Where data is persistent and/or sets of operation are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems - often, within the same application. This leads us to propose an integrated approach that permits applications that use virtual synchrony with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability

CiteSeerX

NASA Technical Reports Server

eCommons@Cornell

Reliability issues in the design of distributed object-based architectures

Author: Mancini Luigi Vincenzo
Publication venue: Newcastle University
Publication date: 01/01/1989
Field of study

PhD ThesisThis thesis is aimed at enhancing the existing set of techniques for building distributed systems, specifically from the point of view of fault-tolerant com- puting. Reliability is of fundamental importance in the design and operation of dis- tributed systems, as an increasing number of computers are employed in the automation of various essential services. In the past decade, much research effort has been concerned with the object-based methodology for the design and implementation of reliable distributed systems. This thesis describes three contributions to this effort. First, it is shown that object-based programming features can in fact be introduced into pro- cedural languages provided that these languages are endowed with certain facilities. Then, work is discussed which illustrates the relationship between distributed object-based architectures and an apparently different form of distributed architectures based on processes. This work puts the notion of object-based architectures into a new perspective, which shows that the object-based philosophy and the process-based philosophy are the dual of each other. Finally, an important aspect of the design of an object-based distributed architecture is investigated, that of automatic garbage collection. A distri- buted garbage collection scheme is described that handles fault tolerance by an extension of the technique commonly employed to detect unwanted com- putations in distributed architectures. The scheme proposed can also be seen as yet a further illustration of the link between object-based and process-based architectures.Royal Signals and Radar Establishment of the U.K. Ministry of Defence. Italian Consiglio Nazionale delle Ricerch

Newcastle University eTheses

System support for object replication in distributed systems

Author: Faegri Tor Erlend
Publication venue
Publication date: 01/01/1998
Field of study

Distributed systems are composed of a collection of cooperating but failure prone system components. The number of components in such systems is often large and, despite low probabilities of any particular component failing, the likelihood that there will be at least a small number of failures within the system at a given time is high. Therefore, distributed systems must be able to withstand partial failures. By being resilient to partial failures, a distributed system becomes more able to offer a dependable service and therefore more useful. Replication is a well known technique used to mask partial failures and increase reliability in distributed computer systems. However, replication management requires sophisticated distributed control algorithms, and is therefore a labour intensive and error prone task. Furthermore, replication is in most cases employed due to applications' non-functional requirements for reliability, as dependability is generally an orthogonal issue to the problem domain of the application. If system level support for replication is provided, the application developer can devote more effort to application specific issues. Distributed systems are inherently more complex than centralised systems. Encapsulation and abstraction of components and services can be of paramount importance in managing their complexity. The use of object oriented techniques and languages, providing support for encapsulation and abstraction, has made development of distributed systems more manageable. In systems where applications are being developed using object-oriented techniques, system support mechanisms must recognise this, and provide support for the object-oriented approach. The architecture presented exploits object-oriented techniques to improve transparency and to reduce the application programmer involvement required to use the replication mechanisms. This dissertation describes an approach to implementing system support for object replication, which is distinct from other approaches such as replicated objects in that objects are not specially designed for replication. Additionally, object replication, in contrast to data replication, is a function-shipping approach and deals with the replication of both operations and data. Object replication is complicated by objects' encapsulation of local state and the arbitrary interaction patterns that may exist among objects. Although fully transparent object replication has not been achieved, my thesis is that partial system support for replication of program-level objects is practicable and assists the development of certain classes of reliable distributed applications. I demonstrate the usefulness of this approach by describing a prototype implementation and showing how it supports the development of an example toy application. To increase their flexibility, the system support mechanisms described are tailorable. The approach adopted in this work is to provide partial support for object replication, relying on some assistance from the application developer to supply application dependent functionality within particular collators for dealing with processing of results from object replicas. Care is taken to make the programming model as simple and concise as possible

Glasgow Theses Service

ATOMAS : a transaction-oriented open multi agent system; final report

Author: Baumann Joachim
Hohl Fritz
Rothermel Kurt
Schwehm Markus
Straßer Markus
Publication venue
Publication date: 19/06/2013
Field of study

The electronic marketplace of the future will consist of a large number of services located on an open, distributed and heterogeneous platform, which will be used by an even larger number of clients. Mobile Agent Systems are considered to be a precondition for the evolution of such an electronic market. They can provide a flexible infrastructure for this market, i.e. for the installation of new services by service agents as well as for the utilization of these services by client agents. Mobile Agent Systems basically consist of a number of locations and agents. Locations are (logical) abstractions for (physical) hosts in a computer network. The network of locations serves as a unique and homogeneous platform, while the underlying network of hosts may be heterogeneous and widely distributed. Locations therefore have to guarantee independence from the underlying hard- and software. To make the Mobile Agent System an open platform, the system furthermore has to guarantee security of hosts against malicious attacks