Fault-tolerant TCP mechanisms

Satapati, Suresh Kumar

Publication date: 1 January 2000
Publisher: Texas A&M University

Abstract

Includes bibliographical references (leaves 45-48). Issued also on microfiche from Lange Micrographics.

Many critical services accessible over the Internet support fault tolerance, but they are not robust: they are oblivious to the impact of their fault-tolerance mechanisms on the service they deliver. Throughput and fail-over latency are the most suitable metrics for a fault-tolerant service. We propose several fault-tolerant TCP mechanisms that improve overall throughput and provide efficient failure detection and recovery in the existing HYDRANET-FT infrastructure. Synchronizing the receive-side TCP state of the server replicas (primary and backup) through the TCP reassembly queue improves overall throughput. TCP retransmissions and replica-management daemons are used for failure detection and fail-over. Recovering a host after a failure involves dynamically rebuilding its TCP state to match that of the existing hosts in the server group. Measurements on an experimental testbed show that these mechanisms yield a robust fault-tolerant implementation, with a substantial improvement in overall throughput for a single primary-backup system.
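To make the idea of synchronizing replica receive state through a reassembly queue more concrete, here is a minimal sketch under stated assumptions. It is not the HYDRANET-FT implementation. It models each replica's receive side as a `rcv_nxt` counter plus a reassembly queue that holds out-of-order segments rather than dropping them. It assumes a synchronization rule in which the group acknowledges only bytes that both primary and backup hold in order, computed as the minimum of their `rcv_nxt` values. The names `ReceiveState` and `group_ack`, and that rule, are hypothetical and used for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiveState:
    """Hypothetical, simplified receive side of one TCP replica.

    rcv_nxt is the next in-order byte expected. Out-of-order segments
    wait in a reassembly queue keyed by their starting sequence number
    instead of being discarded.
    """
    rcv_nxt: int
    reassembly: dict[int, bytes] = field(default_factory=dict)
    delivered: bytearray = field(default_factory=bytearray)

    def receive(self, seq: int, data: bytes) -> None:
        end = seq + len(data)
        if end <= self.rcv_nxt:
            return  # duplicate of data already delivered in order
        if seq > self.rcv_nxt:
            # A hole precedes this segment: park it (keep the longer copy).
            if len(data) > len(self.reassembly.get(seq, b"")):
                self.reassembly[seq] = data
            return
        # Segment covers rcv_nxt: deliver only the new bytes, then drain.
        self._deliver(data[self.rcv_nxt - seq:])
        self._drain()

    def _deliver(self, data: bytes) -> None:
        self.delivered += data
        self.rcv_nxt += len(data)

    def _drain(self) -> None:
        # Pull queued segments that are now contiguous into the byte stream.
        for seq in sorted(self.reassembly):
            if seq > self.rcv_nxt:
                break  # a hole remains, so later segments must keep waiting
            data = self.reassembly.pop(seq)
            if seq + len(data) > self.rcv_nxt:
                self._deliver(data[self.rcv_nxt - seq:])


def group_ack(primary: ReceiveState, backup: ReceiveState) -> int:
    """Assumed sync rule: acknowledge only bytes both replicas hold in order."""
    return min(primary.rcv_nxt, backup.rcv_nxt)


if __name__ == "__main__":
    primary = ReceiveState(rcv_nxt=1000)
    backup = ReceiveState(rcv_nxt=1000)
    primary.receive(1000, b"hello")
    primary.receive(1005, b"world")
    backup.receive(1005, b"world")      # out of order at the backup: queued
    print(group_ack(primary, backup))   # 1000
    backup.receive(1000, b"hello")      # hole filled: queue drains
    print(group_ack(primary, backup))   # 1010
```

In this sketch the backup keeps the early-arriving segment instead of discarding it. Once the missing bytes arrive, the backup catches up without forcing the client to retransmit data it already sent. This is one plausible way a reassembly queue could help throughput in a primary-backup setting.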