19 research outputs found

    Best practices for HPM-assisted performance engineering on modern multicore processors

    Many tools and libraries employ hardware performance monitoring (HPM) on modern processors, and using this data for performance assessment and as a starting point for code optimizations is very popular. However, such data is only useful if it is interpreted with care, and if the right metrics are chosen for the right purpose. We demonstrate the sensible use of hardware performance counters in the context of a structured performance engineering approach for applications in computational science. Typical performance patterns and their respective metric signatures are defined, and some of them are illustrated using case studies. Although these generic concepts do not depend on specific tools or environments, we restrict ourselves to modern x86-based multicore processors and use the likwid-perfctr tool under the Linux OS. Comment: 10 pages, 2 figures

    Performance Engineering: From Numbers to Insight


    Comparison of finite volume and lattice Boltzmann methods for multicomponent flow simulations

    In pseudopotential lattice Boltzmann (LB) models for simulating multicomponent flows, interaction forces between the components of a mixture lead to phase separation and interfacial tension. At the macroscopic scale, such LB models solve an advection‐diffusion equation for each component and the Navier‐Stokes equations for the fluid mixture. In this paper, the computational efficiency of the LB method is compared with a finite volume (FV) solver for the same macroscopic‐scale equations for a binary system in a two‐dimensional domain. The FV implementation replicates the phase separation of the LB model. Differences in the interfacial tension are due to truncation of the Taylor series expansion of the LB interaction force in the FV version. While the computations required to update the domain for each timestep can be completed faster with the FV approach, a smaller timestep is required to achieve stability, which negates the improvement in processing speed. The FV implementation, however, allows independent variation of model parameters, which is not possible in LB. For example, the viscosity can be changed without affecting interfacial tension or the extent of phase separation. Furthermore, with the FV formulation it is possible to obtain low interfacial tensions without suppressing phase separation. The significance of changing the diffusion rate of components on the deformation of a droplet in shear is also demonstrated. For three‐dimensional simulations, the finite volume approach is expected to be faster than LB and would benefit from the demonstrated flexibility in specifying model parameters.

    Application instrumentation for performance analysis and tuning with focus on energy efficiency

    Profiling and tuning of parallel applications is an essential part of HPC. Analysis and elimination of application hot spots can be performed using many available tools, which also provide resource consumption measurements for instrumented parts of the code. Since complex applications show different behavior in each part of the code, it is essential to be able to insert instrumentation to analyse these parts. Because each performance analysis or autotuning tool can bring different insights into an application's behavior, it is valuable to analyze and optimize an application using a variety of them. We present our shared C/C++ API, inserted on request, for the most common open-source HPC performance analysis tools, which simplifies the process of manual instrumentation. Besides manual instrumentation, profiling libraries provide different methods for instrumentation. Of these, binary patching is the most universal mechanism, and it greatly improves the user-friendliness and robustness of the tool. We provide an overview of the most commonly used binary patching tools, and describe a workflow for using them to implement a binary instrumentation tool for any profiler or autotuner. We have also evaluated the minimum overhead of the manual and binary instrumentation. Web of Science

    Exploiting SIMD and Thread-Level Parallelism in Multiblock CFD


    A Predictive Performance Model for Stencil Codes on Multicore CPUs
