
14 research outputs found

Transparent system call based performance debugging for cloud computing

Author: Jiaqi Tan
Michael P Kasick
Nikhil Khadke
Priya Narasimhan
Soila P Kavulya
Publication date: 01/01/2012

Abstract: Problem diagnosis and debugging in distributed environments, such as the cloud and popular distributed systems frameworks, has been a hard problem. We evaluate a novel way of debugging distributed systems, such as the MapReduce framework, by using system calls. Performance problems in such systems can be hard to diagnose and to localize to a specific node or set of nodes. Additionally, most debugging systems rely on forms of instrumentation and signatures (logs or application traces, for example) that sometimes cannot truthfully represent the state of the system. We focus on evaluating the performance debugging of these frameworks at a low level of abstraction: system calls. By focusing on a small set of system calls, we try to extrapolate meaningful information on the control flow and state of the framework, providing accurate and meaningful automated debugging.

CiteSeerX

Theia: Visual Signatures for Problem Diagnosis in Large Hadoop Clusters

Author: Elmer Garduno
Jiaqi Tan
Priya Narasimhan
Rajeev G
Soila P. Kavulya
Publication date: 01/01/2012

Diagnosing performance problems in large distributed systems can be daunting, as the copious volume of monitoring information available can obscure the root cause of the problem. Automated diagnosis tools help narrow down the possible root causes; however, these tools are not perfect, which motivates the need for visualization tools that allow users to explore their data and gain insight into the root cause. In this paper we describe Theia, a visualization tool that analyzes application-level logs in a Hadoop cluster and generates visual signatures of each job's performance. These visual signatures provide compact representations of task durations, task status, and data consumption by jobs. We demonstrate the utility of Theia on real incidents experienced by users on a production Hadoop cluster.

CiteSeerX

Performance Troubleshooting in Data Centers: An Annotated Bibliography

Author: Chengwei Wang
Jiaqi Tan
Karsten Schwan
Liting Hu
Mahendra Kutare
Mike Kasick
Priya Narasimhan
Rajeev Gandhi
Soila P Kavulya
Publication date: 06/03/2020

CiteSeerX