28 research outputs found

    A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

    Full text link
    Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques.Comment: 11 page

    Concurrency Control for Transactional Drago

    Get PDF
    The granularity of concurrency control has a big impact on the performance of transactional systems. Concurrency control granu- larity and data granularity (data size) are usually the same. The e ect of this coupling is that if a coarse granularity is used, the overhead of data access (number of disk accesses) is reduced, but also the degree of concurrency. On the other hand, if a ne granularity is chosen to achieve a higher degree of concurrency (there are less con icts), the cost of data access is increased (each data item is accessed independently, which increases the number of disk accesses). There have been some pro- posals where data can be dynamically clustered/unclustered to increase either concurrency or data access depending on the application usage of data. However, concurrency control and data granularity remain tightly coupled. In Transactional Drago, a programming language for building distributed transactional applications, concurrency control has been un- coupled from data granularity, thus allowing to increase the degree of concurrency without degrading data access. This paper describes this approach and its implementation in Ada 95

    Exception Handling in Open Multithreaded Transactions

    Get PDF
    This paper describes a model for providing transaction support for object-oriented concurrent programming languages. In order to achieve seamless integration, the use of the concurrency features provided by the programming language should not be restricted inside a transaction. A transaction model that meets this requirement is presented. Threads inside such a transaction may spawn new threads, but also external threads are allowed to join an ongoing transaction. A blocking commit protocol ensures that no thread leaves the transaction before its outcome has been determined. Exceptions are used to inform all participants in case a transaction aborts

    Combining Tasking and Transactions, Part II: Open Multithreaded Transactions

    Get PDF
    This position paper is a follow-up paper of the one presented at the last IRTAW workshop. The paper describes a model for providing transaction support for concurrent programming languages such as Ada 95. In order to achieve smooth integration, the use of the concurrency features provided by the Ada language should not be restricted inside a transaction. A transaction model that meets this requirement is presented. Tasks inside such a transaction may spawn new tasks, but also external tasks are allowed to join an ongoing transaction. A blocking commit protocol ensures that no task leaves the transaction before its outcome has been determined. Exceptions are used to inform all participants in case a transaction aborts. The design of a library that provides support for the transaction model is presented, and possible interfaces for the Ada programmer are discussed

    Open Multithreaded Transactions: Keeping Threads and Exceptions under Control

    Get PDF
    Although transactional models have proved to be very useful for numerous applications, the development of new models to reflect the ever-increasing complexity and diversity of modern applications is a very active area of research. Analysis of the existing models of multithreaded transactions shows that they either give too much freedom to threads and do not control their participation in transactions, or unnecessarily restrict the computational model by assuming that only one thread can enter a transaction. Another important issue, which many models do not address properly, is providing adequate exception handling features. In this paper a new model of multithreaded transactions is proposed. Its detailed description is given, including rules of thread behaviour when transactions start, commit and abort, and rules of exception raising, propagation and handling. This model is supported by enhanced error detection techniques to allow for earlier error detection and for localised recovery. General approaches to implementing transaction support are discussed and a detailed description of an Ada implementation is given. Special attention is paid to outlining typical applications for which this model is suitable and to comparing it with several known approaches (Coordinated Atomic actions, CORBA, and Argus)

    Architectural Issues of JMS Compliant Group Communication

    Get PDF
    Group communication provides one-to-many communication primitives that simplify the development of highly available services. Despite advances in research and numerous prototypes, group communication stays confined to small niches. To facilitate the acceptance of group communication by a larger community, a new specification and API, called JMSGroups, based on the popular Java Message Service (JMS) has previously been presented. As a follow-up, this paper focuses on the architectural issues of the JMSGroups implementation. We consider an implementation based on a JMS server, i.e., a JMS server that is modified internally to provide a group communication service. Usually JMS server is implemented as a single entity providing its service to numerous clients. However, single server architecture is exposed to failures and is not suitable for group communication. To address this problem, we discuss the issues related to the JMS server replication (first without providing group communication). Different replicated architecture options are presented and compared. Finally, we show how to construct a fault-tolerant JMSGroups system, by extending the replicated JMS server with a group communication service

    Towards JMS-Compliant Group Communication

    Get PDF
    Group communication provides communication primitives with various semantics and their use greatly simplifies the development of highly available services. However, despite tremendous advances in research and numerous prototypes, group communication stays confined to small niches and academic prototypes. In contrast, message-oriented middleware such as the Java Messaging Service (JMS) is widely used, and has become a de-facto standard. We believe that the lack of standard interfaces is the reason that hinders the deployment of group communication systems. Since JMS is well-established, an interesting solution is to map group communication primitives onto the JMS API. This requires to adapt the traditional specifications of group communication in order to take into account the features of JMS. The resulting group communication API, together with corresponding specifications, defines group communication primitives compatible with the JMS syntax and semantics
    corecore