Performance Debugging in Data Centers: Doing More with Less

Abstract

With the increasing scale and complexity of data centers, detecting and localizing performance faults in real time has become both a pressing need and a challenge. While several approaches for performance debugging in data centers have been proposed, these techniques do not assume any constraints on the availability of the operational data needed to detect and localize faults. We argue that collecting such operational data often requires significant instrumentation or intrusiveness, which is difficult to realize in production data centers. Such constraints complicate the deployment of existing techniques or limit their effectiveness in practice. In this paper, we argue that for performance debugging to become practical and effective in real-world systems, one needs to develop techniques that are "more effective" with "less instrumentation and intrusiveness". We raise several issues and challenges in realizing this vision and present some initial ideas on addressing these challenges.

Index Terms: data centers, performance debugging, fault detection and localization, operating and distributed systems.

    Last time updated on 05/06/2019