3,391 research outputs found
Validation of hardware events for successful performance pattern identification in High Performance Computing
Hardware performance monitoring (HPM) is a crucial ingredient of performance
analysis tools. While there are interfaces like LIKWID, PAPI or the kernel
interface perf\_event which provide HPM access with some additional features,
many higher level tools combine event counts with results retrieved from other
sources like function call traces to derive (semi-)automatic performance
advice. However, although HPM is available for x86 systems since the early 90s,
only a small subset of the HPM features is used in practice. Performance
patterns provide a more comprehensive approach, enabling the identification of
various performance-limiting effects. Patterns address issues like bandwidth
saturation, load imbalance, non-local data access in ccNUMA systems, or false
sharing of cache lines. This work defines HPM event sets that are best suited
to identify a selection of performance patterns on the Intel Haswell processor.
We validate the chosen event sets for accuracy in order to arrive at a reliable
pattern detection mechanism and point out shortcomings that cannot be easily
circumvented due to bugs or limitations in the hardware
LIKWID: Lightweight Performance Tools
Exploiting the performance of today's microprocessors requires intimate
knowledge of the microarchitecture as well as an awareness of the ever-growing
complexity in thread and cache topology. LIKWID is a set of command line
utilities that addresses four key problems: Probing the thread and cache
topology of a shared-memory node, enforcing thread-core affinity on a program,
measuring performance counter metrics, and microbenchmarking for reliable upper
performance bounds. Moreover, it includes a mpirun wrapper allowing for
portable thread-core affinity in MPI and hybrid MPI/threaded applications. To
demonstrate the capabilities of the tool set we show the influence of thread
affinity on performance using the well-known OpenMP STREAM triad benchmark, use
hardware counter tools to study the performance of a stencil code, and finally
show how to detect bandwidth problems on ccNUMA-based compute nodes.Comment: 12 page
Bio-inspired call-stack reconstruction for performance analysis
The correlation of performance bottlenecks and their associated source code has become a cornerstone of performance analysis. It allows understanding why the efficiency of an application falls behind the computer's peak performance and enabling optimizations on the code ultimately. To this end, performance analysis tools collect the processor call-stack and then combine this information with measurements to allow the analyst comprehend the application behavior. Some tools modify the call-stack during run-time to diminish the collection expense but at the cost of resulting in non-portable solutions. In this paper, we present a novel portable approach to associate performance issues with their source code counterpart. To address it, we capture a reduced segment of the call-stack (up to three levels) and then process the segments using an algorithm inspired by multi-sequence alignment techniques. The results of our approach are easily mapped to detailed performance views, enabling the analyst to unveil the application behavior and its corresponding region of code. To demonstrate the usefulness of our approach, we have applied the algorithm to several first-time seen in-production applications to describe them finely, and optimize them by using tiny modifications based on the analyses.We thankfully acknowledge Mathis Bode for giving us access to the Arts CF binaries, and Miguel Castrillo and Kim Serradell for their valuable insight regarding Nemo. We would like to thank Forschungszentrum Jülich for the computation time on their Blue Gene/Q system. This research has been partially funded by the CICYT under contracts No. TIN2012-34557 and TIN2015-65316-P.Peer ReviewedPostprint (author's final draft
MERIC and RADAR generator: tools for energy evaluation and runtime tuning of HPC applications
This paper introduces two tools for manual energy evaluation and runtime tuning developed at IT4Innovations in the READEX project. The MERIC library can be used for manual instrumentation and analysis of any application from the energy and time consumption point of view. Besides tracing, MERIC can also change environment and hardware parameters during the application runtime, which leads to energy savings.
MERIC stores large amounts of data, which are difficult to read by a human. The RADAR generator analyses the MERIC output files to find the best settings of evaluated parameters for each instrumented region. It generates a Open image in new window report and a MERIC configuration file for application production runs
Applying constraint solving to the management of distributed applications
Submitted to DOA08We present our approach for deploying and managing distributed component-based applications. A Desired State Description (DSD), written in a high-level declarative language, specifies requirements for a distributed application. Our infrastructure accepts a DSD as input, and from it automatically configures and deploys the distributed application. Subsequent violations of the original requirements are detected and, where possible, automatically rectified by reconfiguration and redeployment of the necessary application components. A constraint solving tool is used to plan deployments that meet the application requirements.Postprin
- …