324 research outputs found

    A Byzantine Fault Tolerant Distributed Commit Protocol

    Full text link
    In this paper, we present a Byzantine fault tolerant distributed commit protocol for transactions running over untrusted networks. The traditional two-phase commit protocol is enhanced by replicating the coordinator and by running a Byzantine agreement algorithm among the coordinator replicas. Our protocol can tolerate Byzantine faults at the coordinator replicas and a subset of malicious faults at the participants. A decision certificate, which includes a set of registration records and a set of votes from participants, is used to facilitate the coordinator replicas to reach a Byzantine agreement on the outcome of each transaction. The certificate also limits the ways a faulty replica can use towards non-atomic termination of transactions, or semantically incorrect transaction outcomes.Comment: To appear in the proceedings of the 3rd IEEE International Symposium on Dependable, Autonomic and Secure Computing, 200

    Byzantine Fault Tolerant Coordination for Web Services Atomic Transactions

    Get PDF
    This thesis describes a Byzantine fault tolerant coordination framework for Web services atomic transactions. In the framework, all core services, including transaction activation, registration, completion, and distributed commit, are replicated and protected by Byzantine fault tolerance mechanisms. The traditional two-phase commit protocol is extended by a Byzantine fault tolerant version that can tolerate arbitrary faults on the coordinator and the initiator sides, and some types of malicious faults on the participant side. To achieve Byzantine fault tolerance in an efficient manner, and to limit the types of malicious behaviors of the coordinator, a novel decision certificate is introduced. The decision certificate includes a signed copy of the participants\u27 vote records, and it is piggybacked with all decision notifications to the participants for each participant to verify the legitimacy of the decision. The Byzantine fault tolerance mechanisms, together with the extended two-phase commit protocol, have been incorporated into an open-source framework supporting the standard Web services atomic transactions specification. Performance characterizations of the framework show that the implementation is fairly efficient. Such a Byzantine fault tolerant coordination framework can be useful for many transactional Web services that require a high degree of security and dependabilit

    Byzantine Fault Tolerant Coordination for Web Services Atomic Transactions

    Get PDF
    This thesis describes a Byzantine fault tolerant coordination framework for Web services atomic transactions. In the framework, all core services, including transaction activation, registration, completion, and distributed commit, are replicated and protected by Byzantine fault tolerance mechanisms. The traditional two-phase commit protocol is extended by a Byzantine fault tolerant version that can tolerate arbitrary faults on the coordinator and the initiator sides, and some types of malicious faults on the participant side. To achieve Byzantine fault tolerance in an efficient manner, and to limit the types of malicious behaviors of the coordinator, a novel decision certificate is introduced. The decision certificate includes a signed copy of the participants\u27 vote records, and it is piggybacked with all decision notifications to the participants for each participant to verify the legitimacy of the decision. The Byzantine fault tolerance mechanisms, together with the extended two-phase commit protocol, have been incorporated into an open-source framework supporting the standard Web services atomic transactions specification. Performance characterizations of the framework show that the implementation is fairly efficient. Such a Byzantine fault tolerant coordination framework can be useful for many transactional Web services that require a high degree of security and dependabilit

    Application Aware for Byzantine Fault Tolerance

    Get PDF
    Driven by the need for higher reliability of many distributed systems, various replication-based fault tolerance technologies have been widely studied. A prominent technology is Byzantine fault tolerance (BFT). BFT can help achieve high availability and trustworthiness by ensuring replica consistency despite the presence of hardware failures and malicious faults on a small portion of the replicas. However, most state-of-the-art BFT algorithms are designed for generic stateful applications that require the total ordering of all incoming requests and the sequential execution of such requests. In this dissertation research, we recognize that a straightforward application of existing BFT algorithms is often inappropriate for many practical systems: (1) not all incoming requests must be executed sequentially according to some total order and doing so would incur unnecessary (and often prohibitively high) runtime overhead and (2) a sequential execution of all incoming requests might violate the application semantics and might result in deadlocks for some applications. In the past four and half years of my dissertation research, I have focused on designing lightweight BFT solutions for a number of Web services applications (including a shopping cart application, an event stream processing application, Web service business activities (WS-BA), and Web service atomic transactions (WS-AT)) by exploiting application semantics. The main research challenge is to identify how to minimize the use of Byzantine agreement steps and enable concurrent execution of requests that are commutable or unrelated. We have shown that the runtime overhead can be significantly reduced by adopting our lightweight solutions. One limitation for our solutions is that it requires intimate knowledge on the application design and implementation, which may be expensive and error-prone to design such BFT solutions on complex applications. Recognizing this limitation, we investigated the use of Conflict-free Replicated Data Types (CRDTs) to

    Application Aware for Byzantine Fault Tolerance

    Get PDF
    Driven by the need for higher reliability of many distributed systems, various replication-based fault tolerance technologies have been widely studied. A prominent technology is Byzantine fault tolerance (BFT). BFT can help achieve high availability and trustworthiness by ensuring replica consistency despite the presence of hardware failures and malicious faults on a small portion of the replicas. However, most state-of-the-art BFT algorithms are designed for generic stateful applications that require the total ordering of all incoming requests and the sequential execution of such requests. In this dissertation research, we recognize that a straightforward application of existing BFT algorithms is often inappropriate for many practical systems: (1) not all incoming requests must be executed sequentially according to some total order and doing so would incur unnecessary (and often prohibitively high) runtime overhead and (2) a sequential execution of all incoming requests might violate the application semantics and might result in deadlocks for some applications. In the past four and half years of my dissertation research, I have focused on designing lightweight BFT solutions for a number of Web services applications (including a shopping cart application, an event stream processing application, Web service business activities (WS-BA), and Web service atomic transactions (WS-AT)) by exploiting application semantics. The main research challenge is to identify how to minimize the use of Byzantine agreement steps and enable concurrent execution of requests that are commutable or unrelated. We have shown that the runtime overhead can be significantly reduced by adopting our lightweight solutions. One limitation for our solutions is that it requires intimate knowledge on the application design and implementation, which may be expensive and error-prone to design such BFT solutions on complex applications. Recognizing this limitation, we investigated the use of Conflict-free Replicated Data Types (CRDTs) to

    Application Aware for Byzantine Fault Tolerance

    Get PDF
    Driven by the need for higher reliability of many distributed systems, various replication-based fault tolerance technologies have been widely studied. A prominent technology is Byzantine fault tolerance (BFT). BFT can help achieve high availability and trustworthiness by ensuring replica consistency despite the presence of hardware failures and malicious faults on a small portion of the replicas. However, most state-of-the-art BFT algorithms are designed for generic stateful applications that require the total ordering of all incoming requests and the sequential execution of such requests. In this dissertation research, we recognize that a straightforward application of existing BFT algorithms is often inappropriate for many practical systems: (1) not all incoming requests must be executed sequentially according to some total order and doing so would incur unnecessary (and often prohibitively high) runtime overhead and (2) a sequential execution of all incoming requests might violate the application semantics and might result in deadlocks for some applications. In the past four and half years of my dissertation research, I have focused on designing lightweight BFT solutions for a number of Web services applications (including a shopping cart application, an event stream processing application, Web service business activities (WS-BA), and Web service atomic transactions (WS-AT)) by exploiting application semantics. The main research challenge is to identify how to minimize the use of Byzantine agreement steps and enable concurrent execution of requests that are commutable or unrelated. We have shown that the runtime overhead can be significantly reduced by adopting our lightweight solutions. One limitation for our solutions is that it requires intimate knowledge on the application design and implementation, which may be expensive and error-prone to design such BFT solutions on complex applications. Recognizing this limitation, we investigated the use of Conflict-free Replicated Data Types (CRDTs) to

    Replication and fault-tolerance in real-time systems

    Get PDF
    PhD ThesisThe increased availability of sophisticated computer hardware and the corresponding decrease in its cost has led to a widespread growth in the use of computer systems for realtime plant and process control applications. Such applications typically place very high demands upon computer control systems and the development of appropriate control software for these application areas can present a number of problems not normally encountered in other applications. First of all, real-time applications must be correct in the time domain as well as the value domain: returning results which are not only correct but also delivered on time. Further, since the potential for catastrophic failures can be high in a process or plant control environment, many real-time applications also have to meet high reliability requirements. These requirements will typically be met by means of a combination of fault avoidance and fault tolerance techniques. This thesis is intended to address some of the problems encountered in the provision of fault tolerance in real-time applications programs. Specifically,it considers the use of replication to ensure the availability of services in real-time systems. In a real-time environment, providing support for replicated services can introduce a number of problems. In particular, the scope for non-deterministic behaviour in real-time applications can be quite large and this can lead to difficultiesin maintainingconsistent internal states across the members of a replica group. To tackle this problem, a model is proposed for fault tolerant real-time objects which not only allows such objects to perform application specific recovery operations and real-time processing activities such as event handling, but which also allows objects to be replicated. The architectural support required for such replicated objects is also discussed and, to conclude, the run-time overheads associated with the use of such replicated services are considered.The Science and Engineering Research Council

    Unification of Transactions and Replication in Three-Tier Architectures Based on CORBA

    Get PDF
    In this paper, we describe a software infrastructure that unifies transactions and replication in three-tier architectures and provides data consistency and high availability for enterprise applications. The infrastructure uses transactions based on the CORBA object transaction service to protect the application data in databases on stable storage, using a roll-backward recovery strategy, and replication based on the fault tolerant CORBA standard to protect the middle-tier servers, using a roll-forward recovery strategy. The infrastructure replicates the middle-tier servers to protect the application business logic processing. In addition, it replicates the transaction coordinator, which renders the two-phase commit protocol nonblocking and, thus, avoids potentially long service disruptions caused by failure of the coordinator. The infrastructure handles the interactions between the replicated middle-tier servers and the database servers through replicated gateways that prevent duplicate requests from reaching the database servers. It implements automatic client-side failover mechanisms, which guarantee that clients know the outcome of the requests that they have made, and retries aborted transactions automatically on behalf of the clients

    Consensus on Transaction Commit

    Full text link
    The distributed transaction commit problem requires reaching agreement on whether a transaction is committed or aborted. The classic Two-Phase Commit protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms also reach agreement, but do not block whenever any majority of the processes are working. Running a Paxos consensus algorithm on the commit/abort decision of each participant yields a transaction commit protocol that uses 2F +1 coordinators and makes progress if at least F +1 of them are working. In the fault-free case, this algorithm requires one extra message delay but has the same stable-storage write delay as Two-Phase Commit. The classic Two-Phase Commit algorithm is obtained as the special F = 0 case of the general Paxos Commit algorithm.Comment: Original at http://research.microsoft.com/research/pubs/view.aspx?tr_id=70
    corecore