Identifying performance bottlenecks and their associated calling contexts is critical for tuning high-performance applications. This thesis presents a new approach to measuring resource utilization and attributing it to its calling context. Previous instrumentation-based approaches for reporting calling context introduce overhead proportional to the number of function calls performed. We describe a new design for a call path profiler based on stack sampling. Our design enables profiling of unmodified binaries, provides low and controllable overhead, and accurately attributes context-dependent costs of calls. We use a special trampoline function that improves the efficiency of stack sampling and enables the association of unique invocation counts with sampled call sites. We evaluate a Tru64/Alpha implementation of our design and show that on call-intensive codes the overhead of our approach is over two orders of magnitude lower than that of an instrumentation-based approach, with comparable overhead on other codes.
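As a rough illustration of how sampled call stacks can be attributed to calling contexts, the following minimal sketch merges a handful of hand-written stack samples into a calling context tree. It is not the thesis implementation: the names `CCTNode`, `record_sample`, `inclusive`, `build_cct`, and the sample data are all hypothetical, and the trampoline mechanism and invocation counts are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class CCTNode:
    """One node of a calling context tree (hypothetical structure)."""
    name: str
    samples: int = 0  # samples whose innermost frame is this node
    children: Dict[str, "CCTNode"] = field(default_factory=dict)


def record_sample(root: CCTNode, stack: Tuple[str, ...]) -> None:
    """Insert one sampled stack (outermost frame first) into the tree."""
    node = root
    for frame in stack:
        node = node.children.setdefault(frame, CCTNode(frame))
    node.samples += 1  # charge the sample to the full calling context


def inclusive(node: CCTNode) -> int:
    """Samples attributed to this context plus everything it called."""
    return node.samples + sum(inclusive(c) for c in node.children.values())


def build_cct(stacks: Iterable[Tuple[str, ...]]) -> CCTNode:
    """Merge a sequence of sampled stacks into one tree."""
    root = CCTNode("<root>")
    for stack in stacks:
        record_sample(root, stack)
    return root


# Made-up samples: `dot` is reached through two different callers.
SAMPLES = [
    ("main", "solve", "dot"),
    ("main", "solve", "dot"),
    ("main", "report", "dot"),
    ("main", "solve"),
]

cct = build_cct(SAMPLES)
```

In this sketch the cost of `dot` stays separated by caller (two samples under `solve`, one under `report`), which is the kind of context-dependent attribution the abstract refers to. The work done is proportional to the number of samples taken rather than to the number of calls executed.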