Location of Repository

Maximizing the Availability of Distributed Software Services

By Peter Clutterbuck

Abstract

In a commercial Internet environment, the quality of service experienced by a user is critical to competitive advantage and business survivability. The availability and response time of a distributed software service are central components of the overall quality of service provided to users. Traditionally availability is a measure of service down time. Traditionally availability measures the probability that the service will be live and is expressed in terms of failure occurrence and repair or recovery time. Response time is a measure of the time taken from when the service request is made, to when service provision occurs for the user. Deteriorating response time is also a valuable indicator to denial of service attacks which continue to pose a significant threat to service availability. The concept of the service cluster is increasingly being deployed to improve service availability and response time. Cluster processor replication increases service availability. Cluster dispatching of service requests across the replicated cluster processors increases service scalability and therefore response time. This thesis commences with a review of the research and current technology in the area of distributed software service availability. The review aims to identify any deficiencies within that area and propose critical features that mitigate those deficiencies. The three critical features proposed are in relation to user wait time, cluster dispatching, and the trust-based filtering of service requests. The user wait time proposal is that the availability of a distributed service should reflect both liveness probability level and probabalistic user access time of the service. The cluster dispatching proposal is that dispatching processing overhead is a function of the number of Internet Protocol (IP) datagrams/Transport Control Protocol (TCP) segments that are received by the dispatcher in respect of each service request. Consequently the number of IP datagrams/TCP segments should be minimised ideally so that for each incoming service request there is one IP datagram/TCP segment. The trust-based filtering proposal is that the level of trust in respect of each service request should be identified by the service as this is critical in mitigating distributed denial of service attacks - and therefore maximising the availability of the service A conceptual availability model which supports the three critical features within an Internet clustered service environment is then described. The conceptual model proposes an expanded availability definition and then describes the realization of this definition via additional capabilities positioned within the Transport layer of the Internet communication environment. The additional capabilities of this model also facilitate the minimization of cluster dispatcher processing load and the identification by the cluster dispatcher of request trust level. The model is then implemented within the Linux kernel. The implementation involves the addition of several options to the existing TCP specification and also the addition of several functions to the existing Socket API. The implementation is subsequently evaluated in a dispatcher-based clustered service environment

Topics: Availability, denial of service, wait time, replication, cluster, distributed service, trust, authentication, redirection, dispatching, scheduling, filtering, option
Publisher: Queensland University of Technology
Year: 2005
OAI identifier: oai:eprints.qut.edu.au:16134

Suggested articles


To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.