Utilizing Runtime Information for Accurate Root Cause Identification in Performance Diagnosis

Abstract

This dissertation highlights that existing performance diagnostic tools often become less effective due to their inherent inaccuracies in modern software. To overcome these inaccuracies and effectively identify the root causes of performance issues, it is necessary to incorporate supplementary runtime information into these tools. Within this context, the dissertation integrates specific runtime information into two typical performance diagnostic tools: profilers and causal tracing tools. The integration yields a substantial enhancement in the effectiveness of performance diagnosis. Among these tools, gprof stands out as a representative profiler for performance diagnosis. Nonetheless, its effectiveness diminishes as the time cost calculated based on CPU sampling fails to accurately and adequately pinpoint the root causes of performance issues in complex software. To tackle this challenge, the dissertation introduces an innovative methodology called value-assisted cost profiling (vProf). This approach incorporates variable values observed during runtime into the profiling process. By continuously sampling variable values from both normal and problematic executions, vProf refines function cost estimates, identifies anomalies in value distributions, and highlights potentially problematic code areas that could be the actual sources of performance is- sues. The effectiveness of vProf is validated through the diagnosis of 18 real-world performance is- sues in four widely-used applications. Remarkably, vProf outperforms other state-of-the-art tools, successfully diagnosing all issues, including three that had remained unresolved for over four years. Causal tracing tools reveal the root causes of performance issues in complex software by generating tracing graphs. However, these graphs often suffer from inherent inaccuracies, characterized by superfluous (over-connected) and missed (under-connected) edges. These inaccuracies arise from the diversity of programming paradigms. To mitigate the inaccuracies, the dissertation proposes an approach to derive strong and weak edges in tracing graphs based on the vertices’ semantics collected during runtime. By leveraging these edge types, a beam-search-based diagnostic algorithm is employed to identify the most probable causal paths. Causal paths from normal and buggy executions are differentiated to provide key insights into the root causes of performance issues. To validate this approach, a causal tracing tool named Argus is developed and tested across multiple versions of macOS. It is evaluated on 12 well-known spinning pinwheel issues in popular macOS applications. Notably, Argus successfully diagnoses the root causes of all identified issues, including 10 issues that had remained unresolved for several years. The results from both tools exemplify a substantial enhancement of performance diagnostic tools achieved by harnessing runtime information. The integration can effectively mitigate inherent inaccuracies, lend support to inaccuracy-tolerant diagnostic algorithms, and provide key insights to pinpoint the root causes

    Similar works