Best practices for HPM-assisted performance engineering on modern multicore processors
Many tools and libraries employ hardware performance monitoring (HPM) on
modern processors, and using this data for performance assessment and as a
starting point for code optimizations is very popular. However, such data is
only useful if it is interpreted with care, and if the right metrics are chosen
for the right purpose. We demonstrate the sensible use of hardware performance
counters in the context of a structured performance engineering approach for
applications in computational science. Typical performance patterns and their
respective metric signatures are defined, and some of them are illustrated
using case studies. Although these generic concepts do not depend on specific
tools or environments, we restrict ourselves to modern x86-based multicore
processors and use the likwid-perfctr tool under the Linux OS.
Comment: 10 pages, 2 figures
Eigen-AD: Algorithmic Differentiation of the Eigen Library
In this work we present useful techniques and possible enhancements when
applying an Algorithmic Differentiation (AD) tool to the linear algebra library
Eigen using our in-house AD by overloading (AD-O) tool dco/c++ as a case study.
After outlining performance and feasibility issues when calculating derivatives
for the official Eigen release, we propose Eigen-AD, which enables different
optimization options for an AD-O tool by providing add-on modules for Eigen.
The range of features includes a better handling of expression templates for
general performance improvements, as well as implementations of symbolically
derived expressions for calculating derivatives of certain core operations. The
software design allows an AD-O tool to provide specializations to automatically
include symbolic operations and thereby keep the look and feel of plain AD by
overloading. As a showcase, dco/c++ is provided with such a module and its
significant performance improvements are validated by benchmarks.
Comment: Updated with accepted version for ICCS 2020 conference proceedings.
The final authenticated publication is available online at
https://doi.org/10.1007/978-3-030-50371-0_51. See v1 for the original,
extended preprint. 14 pages, 7 figures
Scalable Simulation of Realistic Volume Fraction Red Blood Cell Flows through Vascular Networks
High-resolution blood flow simulations have potential for developing a better
understanding of biophysical phenomena at the microscale, such as vasodilation,
vasoconstriction and overall vascular resistance. To this end, we present a
scalable platform for the simulation of red blood cell (RBC) flows through
complex capillaries by modeling the physical system as a viscous fluid with
immersed deformable particles. We describe a parallel boundary integral
equation solver for general elliptic partial differential equations, which we
apply to Stokes flow through blood vessels. We also detail a parallel collision
avoiding algorithm to ensure RBCs and the blood vessel remain contact-free. We
have scaled our code on Stampede2 at the Texas Advanced Computing Center up to
34,816 cores. Our largest simulation enforces a contact-free state between four
billion surface elements and solves for three billion degrees of freedom on one
million RBCs and a blood vessel composed of two million patches.