516 research outputs found
Best practices for HPM-assisted performance engineering on modern multicore processors
Many tools and libraries employ hardware performance monitoring (HPM) on
modern processors, and using this data for performance assessment and as a
starting point for code optimizations is very popular. However, such data is
only useful if it is interpreted with care, and if the right metrics are chosen
for the right purpose. We demonstrate the sensible use of hardware performance
counters in the context of a structured performance engineering approach for
applications in computational science. Typical performance patterns and their
respective metric signatures are defined, and some of them are illustrated
using case studies. Although these generic concepts do not depend on specific
tools or environments, we restrict ourselves to modern x86-based multicore
processors and use the likwid-perfctr tool under the Linux OS.Comment: 10 pages, 2 figure
Efficient multicore-aware parallelization strategies for iterative stencil computations
Stencil computations consume a major part of runtime in many scientific
simulation codes. As prototypes for this class of algorithms we consider the
iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient
parallel implementations for cache-based multicore architectures. Temporal
cache blocking is a known advanced optimization technique, which can reduce the
pressure on the memory bus significantly. We apply and refine this optimization
for a recently presented temporal blocking strategy designed to explicitly
utilize multicore characteristics. Especially for the case of Gauss-Seidel
smoothers we show that simultaneous multi-threading (SMT) can yield substantial
performance improvements for our optimized algorithm.Comment: 15 pages, 10 figure
Building real-time embedded applications on QduinoMC: a web-connected 3D printer case study
Single Board Computers (SBCs) are now emerging
with multiple cores, ADCs, GPIOs, PWM channels, integrated
graphics, and several serial bus interfaces. The low power
consumption, small form factor and I/O interface capabilities of
SBCs with sensors and actuators makes them ideal in embedded
and real-time applications. However, most SBCs run non-realtime
operating systems based on Linux and Windows, and do
not provide a user-friendly API for application development. This
paper presents QduinoMC, a multicore extension to the popular
Arduino programming environment, which runs on the Quest
real-time operating system. QduinoMC is an extension of our earlier
single-core, real-time, multithreaded Qduino API. We show
the utility of QduinoMC by applying it to a specific application: a
web-connected 3D printer. This differs from existing 3D printers,
which run relatively simple firmware and lack operating system
support to spool multiple jobs, or interoperate with other devices
(e.g., in a print farm). We show how QduinoMC empowers devices with the capabilities to run new services without impacting their timing guarantees. While it is possible to modify existing operating systems to provide suitable timing guarantees, the effort to do so is cumbersome and does not provide the ease of programming afforded by QduinoMC.http://www.cs.bu.edu/fac/richwest/papers/rtas_2017.pdfAccepted manuscrip
- …