To efficiently exploit the resources of new many-core architectures, which
integrate dozens or even hundreds of cores per chip, parallel programming
models have evolved to expose massive amounts of parallelism, often in the form
of fine-grained tasks. Task-parallel languages, such as OpenStream, X10,
Habanero Java and C, or StarSs, simplify the development of applications for
these architectures, but tuning task-parallel applications remains a major
challenge.
Performance bottlenecks can occur at any level of the implementation, from the
algorithmic level (e.g., lack of parallelism or over-synchronization), to
interactions with the operating and runtime systems (e.g., data placement on
NUMA architectures), to inefficient use of the hardware (e.g., frequent cache
misses or misaligned memory accesses); detecting such issues and determining
their exact cause is difficult.
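As a minimal illustration of an algorithmic-level bottleneck, the sketch below
uses C with OpenMP tasks (standing in for the task-parallel languages named
above; the process_block function is a hypothetical placeholder for real
work). A misplaced synchronization point serializes an otherwise parallel
loop, and a trace of such a program would show workers sitting idle between
tasks.

    #include <stdio.h>

    #define N 1024
    #define BLOCK 64

    /* Hypothetical per-block computation; stands in for real work. */
    static void process_block(double *block, int len)
    {
        for (int i = 0; i < len; i++)
            block[i] *= 2.0;
    }

    int main(void)
    {
        static double data[N];

        #pragma omp parallel
        #pragma omp single
        for (int i = 0; i < N; i += BLOCK) {
            #pragma omp task firstprivate(i)
            process_block(&data[i], BLOCK);

            /* Over-synchronization: waiting after every task serializes
             * the loop. Moving the taskwait after the loop restores
             * parallelism without changing the result. */
            #pragma omp taskwait
        }

        printf("%f\n", data[0]);
        return 0;
    }

The fix is purely algorithmic, yet from timing data alone the symptom (idle
workers) is hard to distinguish from, say, a runtime-level scheduling problem.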
In previous work, we developed Aftermath, an interactive tool for trace-based
performance analysis and debugging of task-parallel programs and runtime
systems. In contrast to other trace-based analysis tools, such as Paraver or
Vampir, Aftermath offers native support for tasks, i.e., visualization,
statistics and analysis tools adapted for performance debugging at task
granularity. However, the tool currently does not support the automatic
detection of performance bottlenecks; it is up to the user to investigate the
relevant aspects of program execution by focusing the inspection on specific
slices of a trace file. In this paper, we present
ongoing work on two extensions that guide the user through this process.
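To make "task granularity" concrete, the sketch below shows the kind of
per-task statistic a task-aware trace tool can derive. It is plain C under
assumed names: the task_event record and busy_time_per_worker function are
illustrative inventions, not Aftermath's actual trace format or API.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical, simplified trace record: one entry per task
     * execution, with start/end timestamps and the executing worker. */
    struct task_event {
        uint64_t task_id;
        uint64_t start_ns;
        uint64_t end_ns;
        int      worker;
    };

    /* A task-granularity statistic: total time each worker spent
     * executing tasks. A generic trace tool sees only anonymous
     * intervals; a task-aware tool can attribute this time to
     * individual tasks. */
    static void busy_time_per_worker(const struct task_event *ev,
                                     size_t n, uint64_t *busy,
                                     int num_workers)
    {
        for (int w = 0; w < num_workers; w++)
            busy[w] = 0;
        for (size_t i = 0; i < n; i++)
            busy[ev[i].worker] += ev[i].end_ns - ev[i].start_ns;
    }

    int main(void)
    {
        struct task_event trace[] = {
            { 1,   0, 100, 0 },
            { 2,  50, 180, 1 },
            { 3, 120, 200, 0 },
        };
        uint64_t busy[2];
        busy_time_per_worker(trace, 3, busy, 2);
        printf("worker 0: %llu ns, worker 1: %llu ns\n",
               (unsigned long long)busy[0], (unsigned long long)busy[1]);
        return 0;
    }

Aggregations of this kind, grouped by task type or time slice rather than by
worker, are the raw material on which guided bottleneck detection can build.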