Solaris Service Management Facility: Modern System Startup and Administration

Abstract

Application uptime is critical to every administrator. The factors that cause system downtime are often handled by the operating system, but the causes of application faults (e.g., software bugs, hardware faults, or human errors) are not addressed by standard system software. Recovery is left to humans, who often compound the problem through misdiagnosis or simple error. While availability issues have traditionally been addressed by expensive high-availability clustering solutions, the increasing complexity of software stacks requires a solution for all systems. In addition to the challenges of managing the availability of higher-level software, the modern operating system itself is composed of many interdependent software entities. A failure in any one of these components often cascades, causing failures in other components. A complex software model with many interdependent elements makes diagnosing failures very challenging for system administrators. The traditional init.d script mechanisms for UNIX are only a weak reflection of the intricate dependency relationships that exist on every system. We introduce the Service Management Facility (SMF) as a comprehensive way to describe, execute, and manage software services. SMF promotes the service to a first-class operating system entity, without requiring modification of application binaries or changes to the UNIX process model. It relieves the administrator of the duties of application failure detection and restart, and provides sophisticated diagnosis tools when automatic repair is impossible.
