14 research outputs found

    Transparent system call based performance debugging for cloud computing

    Get PDF
    Abstract Problem diagnosis and debugging in distributed environments such as the cloud and popular distributed systems frameworks has been a hard problem. We explore an evaluation of a novel way of debugging distributed systems, such as the MapReduce framework, by using system calls. Performance problems in such systems can be hard to diagnose and to localize to a specific node or a set of nodes. Additionally, most debugging systems often rely on forms of instrumentation and signatures that sometimes cannot truthfully represent the state of the system (logs or application traces for example). We focus on evaluating the performance debugging of these frameworks using a low level of abstraction -system calls. By focusing on a small set of system calls, we try to extrapolate meaningful information on the control flow and state of the framework, providing accurate and meaningful automated debugging

    Theia: Visual Signatures for Problem Diagnosis in Large Hadoop Clusters

    No full text
    Diagnosing performance problems in large distributed systems can be daunting as the copious volume of monitoring information available can obscure the root-cause of the problem. Automated diagnosis tools help narrow down the possible root-causes—however, these tools are not perfect thereby motivating the need for visualization tools that allow users to explore their data and gain insight on the root-cause. In this paper we describe Theia, a visualization tool that analyzes application-level logs in a Hadoop cluster, and generates visual signatures of each job’s performance. These visual signatures provide compact representations of task durations, task status, and data consumption by jobs. We demonstrate the utility of Theia on real incidents experienced by users on a production Hadoop cluster.
    corecore