33 research outputs found
Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing
Abstract—Large applications executing on Grid or cluster architectures consisting of hundreds or thousands of computational nodes create problems with respect to reliability. The source of the problems are node failures and the need for dynamic configuration over extensive runtime. This paper presents two fault-tolerance mechanisms called Theft-Induced Checkpointing and Systematic Event Logging. These are transparent protocols capable of overcoming problems associated with both benign faults, i.e., crash faults, and node or subnet volatility. Specifically, the protocols base the state of the execution on a dataflow graph, allowing for efficient recovery in dynamic heterogeneous systems as well as multithreaded applications. By allowing recovery even under different numbers of processors, the approaches are especially suitable for applications with a need for adaptive or reactionary configuration control. The low-cost protocols offer the capability of controlling or bounding the overhead. A formal cost model is presented, followed by an experimental evaluation. It is shown that the overhead of the protocol is very small, and the maximum work lost by a crashed process is small and bounded. Index Terms—Grid computing, rollback recovery, checkpointing, event logging. Ç
Design and Implementation of a Byzantine Fault Tolerance Framework for Web Services
Many Web services are expected to run with high degree of security and dependability. To achieve this goal, it is essential to use a Web services compatible framework that tolerates not only crash faults, but Byzantine faults as well, due to the untrusted communication environment in which the Web services operate. In this paper, we describe the design and implementation of such a framework, called BFT-WS. BFT-WS is designed to operate on top of the standard SOAP messaging framework for maximum interoperability. It is implemented as a pluggable module within the Axis2 architecture, as such, it requires minimum changes to the Web applications. The core fault tolerance mechanisms used in BFT-WS are based on the well-known Castro and Liskov’s BFT algorithm for optimal efficiency. Our performance measurements confirm that BFT-WS incurs only moderate runtime overhead considering the complexity of the mechanisms
The Bedrock of Byzantine Fault Tolerance: A Unified Platform for BFT Protocol Design and Implementation
Byzantine Fault-Tolerant (BFT) protocols have recently been extensively used
by decentralized data management systems with non-trustworthy infrastructures,
e.g., permissioned blockchains. BFT protocols cover a broad spectrum of design
dimensions from infrastructure settings such as the communication topology, to
more technical features such as commitment strategy and even fundamental social
choice properties like order-fairness. The proliferation of different BFT
protocols has rendered it difficult to navigate the BFT landscape, let alone
determine the protocol that best meets application needs. This paper presents
Bedrock, a unified platform for BFT protocols design, analysis, implementation,
and experiments. Bedrock proposes a design space consisting of a set of design
choices capturing the trade-offs between different design space dimensions and
providing fundamentally new insights into the strengths and weaknesses of BFT
protocols. Bedrock enables users to analyze and experiment with BFT protocols
within the space of plausible choices, evolve current protocols to design new
ones, and even uncover previously unknown protocols. Our experimental results
demonstrate the capability of Bedrock to uniformly evaluate BFT protocols in
new ways that were not possible before due to the diverse assumptions made by
these protocols. The results validate Bedrock's ability to analyze and derive
BFT protocols
Byzantine Fault Tolerance for Nondeterministic Applications
The growing reliance on online services accessible on the Internet demands highly reliable system that would not be interrupted when encountering faults. A number of Byzantine fault tolerance (BFT) algorithms have been developed to mask the most complicated type of faults - Byzantine faults such as software bugs,operator mistakes, and malicious attacks, which are usually the major cause of service interruptions. However, it is often difficult to apply these algorithms to practical applications because such applications often exhibit sophisticated non-deterministic behaviors that the existing BFT algorithms could not cope with. In this thesis, we propose a classification of common types of replica nondeterminism with respect to the requirement of achieving Byzantine fault tolerance, and describe the design and implementation of the core mechanisms necessary to handle such replica nondeterminism within a Byzantine fault tolerance framework. In addition, we evaluated the performance of our BFT library, referred to as ND-BFT using both a micro-benchmark application and a more realistic online porker game application. The performance results show that the replicated online poker game performs approximately 13 slower than its nonreplicated counterpart in the presence of small number of player
Byzantine Fault Tolerance for Nondeterministic Applications
The growing reliance on online services accessible on the Internet demands highly reliable system that would not be interrupted when encountering faults. A number of Byzantine fault tolerance (BFT) algorithms have been developed to mask the most complicated type of faults - Byzantine faults such as software bugs,operator mistakes, and malicious attacks, which are usually the major cause of service interruptions. However, it is often difficult to apply these algorithms to practical applications because such applications often exhibit sophisticated non-deterministic behaviors that the existing BFT algorithms could not cope with. In this thesis, we propose a classification of common types of replica nondeterminism with respect to the requirement of achieving Byzantine fault tolerance, and describe the design and implementation of the core mechanisms necessary to handle such replica nondeterminism within a Byzantine fault tolerance framework. In addition, we evaluated the performance of our BFT library, referred to as ND-BFT using both a micro-benchmark application and a more realistic online porker game application. The performance results show that the replicated online poker game performs approximately 13 slower than its nonreplicated counterpart in the presence of small number of player
Byzantine Fault Tolerance for Nondeterministic Applications
The growing reliance on online services accessible on the Internet demands highly reliable system that would not be interrupted when encountering faults. A number of Byzantine fault tolerance (BFT) algorithms have been developed to mask the most complicated type of faults - Byzantine faults such as software bugs,operator mistakes, and malicious attacks, which are usually the major cause of service interruptions. However, it is often difficult to apply these algorithms to practical applications because such applications often exhibit sophisticated non-deterministic behaviors that the existing BFT algorithms could not cope with. In this thesis, we propose a classification of common types of replica nondeterminism with respect to the requirement of achieving Byzantine fault tolerance, and describe the design and implementation of the core mechanisms necessary to handle such replica nondeterminism within a Byzantine fault tolerance framework. In addition, we evaluated the performance of our BFT library, referred to as ND-BFT using both a micro-benchmark application and a more realistic online porker game application. The performance results show that the replicated online poker game performs approximately 13 slower than its nonreplicated counterpart in the presence of small number of player
Virtual network security: threats, countermeasures, and challenges
Network virtualization has become increasingly prominent in recent years. It enables the creation of network infrastructures that are specifically tailored to the needs of distinct network applications and supports the instantiation of favorable environments for the development and evaluation of new architectures and protocols. Despite the wide applicability of network virtualization, the shared use of routing devices and communication channels leads to a series of security-related concerns. It is necessary to provide protection to virtual network infrastructures in order to enable their use in real, large scale environments. In this paper, we present an overview of the state of the art concerning virtual network security. We discuss the main challenges related to this kind of environment, some of the major threats, as well as solutions proposed in the literature that aim to deal with different security aspects.Network virtualization has become increasingly prominent in recent years. It enables the creation of network infrastructures that are specifically tailored to the needs of distinct network applications and supports the instantiation of favorable environme61CNPQ - CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICORNP - REDE NACIONAL DE ENSINO E PESQUISAFAPERGS - FUNDAÇÃO DE AMPARO À PESQUISA DO ESTADO DO RIO GRANDE DO SULsem informaçãosem informaçãosem informaçã