28 research outputs found

A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

Author: Treaster Michael
Publication date: 31/12/2004

Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques. Comment: 11 pages

arXiv.org e-Print Archive

CiteSeerX

Automated Course Advising System

Author: Gunadhi
Hagler
Hashemi
Laghari
Laghari
Laghari
Nambiar
Parrington
Pumpuang
Publication venue: 'IACSIT Press'

Crossref

Concurrency Control for Transactional Drago

Author: Arévalo Sergio
Jiménez-Peris Ricardo
Kienzle Jörg
Patiño-Martinez Marta
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/09/2005

The granularity of concurrency control has a big impact on the performance of transactional systems. Concurrency control granularity and data granularity (data size) are usually the same. The effect of this coupling is that if a coarse granularity is used, the overhead of data access (number of disk accesses) is reduced, but so is the degree of concurrency. On the other hand, if a fine granularity is chosen to achieve a higher degree of concurrency (there are fewer conflicts), the cost of data access is increased (each data item is accessed independently, which increases the number of disk accesses). There have been some proposals where data can be dynamically clustered/unclustered to increase either concurrency or data access depending on the application usage of data. However, concurrency control and data granularity remain tightly coupled. In Transactional Drago, a programming language for building distributed transactional applications, concurrency control has been uncoupled from data granularity, thus allowing the degree of concurrency to be increased without degrading data access. This paper describes this approach and its implementation in Ada 95.

Infoscience - École polytechnique fédérale de Lausanne

Exception Handling in Open Multithreaded Transactions

Author: Kienzle Jörg
Publication date: 20/09/2005

This paper describes a model for providing transaction support for object-oriented concurrent programming languages. In order to achieve seamless integration, the use of the concurrency features provided by the programming language should not be restricted inside a transaction. A transaction model that meets this requirement is presented. Threads inside such a transaction may spawn new threads, and external threads are also allowed to join an ongoing transaction. A blocking commit protocol ensures that no thread leaves the transaction before its outcome has been determined. Exceptions are used to inform all participants in case a transaction aborts.

Infoscience - École polytechnique fédérale de Lausanne

Combining Tasking and Transactions, Part II: Open Multithreaded Transactions

Author: Kienzle Jörg
Romanovsky Alexander
Publication venue: ACM Press
Publication date: 20/09/2005

This position paper is a follow-up to the one presented at the last IRTAW workshop. The paper describes a model for providing transaction support for concurrent programming languages such as Ada 95. In order to achieve smooth integration, the use of the concurrency features provided by the Ada language should not be restricted inside a transaction. A transaction model that meets this requirement is presented. Tasks inside such a transaction may spawn new tasks, and external tasks are also allowed to join an ongoing transaction. A blocking commit protocol ensures that no task leaves the transaction before its outcome has been determined. Exceptions are used to inform all participants in case a transaction aborts. The design of a library that provides support for the transaction model is presented, and possible interfaces for the Ada programmer are discussed.

Infoscience - École polytechnique fédérale de Lausanne

Open Multithreaded Transactions: Keeping Threads and Exceptions under Control

Author: Kienzle Jörg
Romanovsky Alexander
Strohmeier Alfred
Publication venue: IEEE Computer Society Press
Publication date: 20/09/2005

Although transactional models have proved to be very useful for numerous applications, the development of new models to reflect the ever-increasing complexity and diversity of modern applications is a very active area of research. Analysis of the existing models of multithreaded transactions shows that they either give too much freedom to threads and do not control their participation in transactions, or unnecessarily restrict the computational model by assuming that only one thread can enter a transaction. Another important issue, which many models do not address properly, is providing adequate exception handling features. In this paper a new model of multithreaded transactions is proposed. Its detailed description is given, including rules of thread behaviour when transactions start, commit and abort, and rules of exception raising, propagation and handling. This model is supported by enhanced error detection techniques to allow for earlier error detection and for localised recovery. General approaches to implementing transaction support are discussed and a detailed description of an Ada implementation is given. Special attention is paid to outlining typical applications for which this model is suitable and to comparing it with several known approaches (Coordinated Atomic actions, CORBA, and Argus).

Infoscience - École polytechnique fédérale de Lausanne

Architectural Issues of JMS Compliant Group Communication

Author: Ekwall Richard
Kupsys Arnas
Publication date: 01/01/2005

Group communication provides one-to-many communication primitives that simplify the development of highly available services. Despite advances in research and numerous prototypes, group communication stays confined to small niches. To facilitate the acceptance of group communication by a larger community, a new specification and API, called JMSGroups, based on the popular Java Message Service (JMS) has previously been presented. As a follow-up, this paper focuses on the architectural issues of the JMSGroups implementation. We consider an implementation based on a JMS server, i.e., a JMS server that is modified internally to provide a group communication service. Usually a JMS server is implemented as a single entity providing its service to numerous clients. However, a single-server architecture is exposed to failures and is not suitable for group communication. To address this problem, we discuss the issues related to JMS server replication (first without providing group communication). Different replicated architecture options are presented and compared. Finally, we show how to construct a fault-tolerant JMSGroups system by extending the replicated JMS server with a group communication service.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Architectural Issues of JMS Compliant Group Communication

Author: Kupsys Arnas
Ekwall Richard
Publication date: 19/10/2005

Infoscience - École polytechnique fédérale de Lausanne

FreiMusic - HfM Freiburg

Towards JMS-Compliant Group Communication

Author: Kupsys A.
Pleisch S.
Schiper A.
Wiesmann M.
Publication date: 13/07/2005

Group communication provides communication primitives with various semantics, and their use greatly simplifies the development of highly available services. However, despite tremendous advances in research and numerous prototypes, group communication stays confined to small niches and academic prototypes. In contrast, message-oriented middleware such as the Java Message Service (JMS) is widely used, and has become a de facto standard. We believe that the lack of standard interfaces is what hinders the deployment of group communication systems. Since JMS is well established, an interesting solution is to map group communication primitives onto the JMS API. This requires adapting the traditional specifications of group communication in order to take into account the features of JMS. The resulting group communication API, together with corresponding specifications, defines group communication primitives compatible with the JMS syntax and semantics.

Infoscience - École polytechnique fédérale de Lausanne