
Improving quality of service in application clusters

By Sophia Corsava and Vladimir Getov


Quality of service (QoS) requirements, which include availability, integrity, performance and responsiveness, are increasingly needed by science and engineering applications. Rising computational demands and data mining present a new challenge in the IT world. As our needs for more processing, research and analysis increase, performance and reliability degrade exponentially. In this paper we present a software system that manages quality of service for Unix-based distributed application clusters. Our approach is synthetic and involves intelligent agents that make use of static and dynamic ontologies to monitor, diagnose and correct faults at run time, over a private network. Finally, we provide experimental results from our pilot implementation in a production environment.

Topics: UOW3
Publisher: IEEE Computer Society
Year: 2003
Provided by: WestminsterResearch

