Exploring performance and power properties of modern multicore chips via simple machine models
Modern multicore chips show complex behavior with respect to performance and
power. Starting with the Intel Sandy Bridge processor, it has become possible
to directly measure the power dissipation of a CPU chip and correlate this data
with the performance properties of the running code. Going beyond a simple
bottleneck analysis, we employ the recently published Execution-Cache-Memory
(ECM) model to describe the single- and multi-core performance of streaming
kernels. The model refines the well-known Roofline model, since it can predict
the scaling and the saturation behavior of bandwidth-limited loop kernels on a
multicore chip. The saturation point is especially relevant for considerations
of energy consumption. From power dissipation measurements of benchmark
programs with vastly different demands on the hardware, we derive a
simple, phenomenological power model for the Sandy Bridge processor. Together
with the ECM model, we are able to explain many peculiarities in the
performance and power behavior of multicore processors, and derive guidelines
for energy-efficient execution of parallel programs. Finally, we show that the
ECM and power models can be successfully used to describe the scaling and power
behavior of a lattice-Boltzmann flow solver code.
Comment: 23 pages, 10 figures. Typos corrected, DOI added.
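The power model sketched in this abstract can be summarized compactly. As a hedged illustration of the model family (the symbols and the quadratic frequency term follow common usage in this line of work; the coefficients are placeholders, not values from the paper), chip power combines a static baseline with a dynamic per-core term, and energy to solution follows by multiplying with runtime:

    W(f, n) = W_0 + (W_1 f + W_2 f^2) \, n
    E(f, n) = W(f, n) \cdot T(f, n)

Here n is the number of active cores, f the core clock frequency, and W_0, W_1, W_2 are fit parameters. Because T(f, n) stops improving once a bandwidth-limited loop saturates, cores activated beyond the saturation point only inflate the dynamic term, which is why that point matters for energy consumption.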
Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
Analytic performance models are essential for understanding the performance
characteristics of loop kernels, which consume a major part of CPU cycles in
computational science. Starting from a validated performance model one can
infer the relevant hardware bottlenecks and promising optimization
opportunities. Unfortunately, analytic performance modeling is often tedious
even for experienced developers since it requires in-depth knowledge about the
hardware and how it interacts with the software. We present the "Kerncraft"
tool, which eases the construction of analytic performance models for streaming
kernels and stencil loop nests. Starting from the loop source code, the problem
size, and a description of the underlying hardware, Kerncraft can ideally
predict the single-core performance and scaling behavior of loops on multicore
processors using the Roofline or the Execution-Cache-Memory (ECM) model. We
describe the operating principles of Kerncraft with its capabilities and
limitations, and we show how it may be used to quickly gain insights by
accelerated analytic modeling.
Comment: 11 pages, 4 figures, 8 listings.
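As a rough illustration of the arithmetic that a tool like Kerncraft automates, the following Python sketch evaluates a Roofline prediction for a STREAM-triad-like kernel. The machine numbers, traffic accounting, and function name are invented for illustration, not taken from the paper or from any Kerncraft output:

    # Minimal sketch of the Roofline arithmetic a tool like Kerncraft
    # automates; all machine numbers below are illustrative.
    def roofline_gflops(flops_per_it, bytes_per_it, peak_gflops, mem_bw_gbs):
        """Roofline: minimum of in-core peak and the bandwidth ceiling."""
        intensity = flops_per_it / bytes_per_it          # FLOP/byte
        return min(peak_gflops, intensity * mem_bw_gbs)

    # Triad a[i] = b[i] + s * c[i]: 2 FLOPs per iteration; three 8-byte
    # streams plus a write-allocate for a[] -> 32 bytes of memory traffic.
    pred = roofline_gflops(flops_per_it=2, bytes_per_it=32,
                           peak_gflops=48.0, mem_bw_gbs=40.0)
    print(f"Roofline prediction: {pred:.1f} GFLOP/s")    # bandwidth-bound: 2.5

Kerncraft's contribution is to extract the FLOP and data-traffic counts from the loop source and the cache parameters from a machine description, so that this kind of prediction does not have to be assembled by hand.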
On the accuracy and usefulness of analytic energy models for contemporary multicore processors
This paper presents refinements to the Execution-Cache-Memory (ECM) performance
model and a previously published power model for multicore processors. The
combination of both enables a very accurate prediction of performance and
energy consumption of contemporary multicore processors as a function of
relevant parameters such as the number of active cores and the core and Uncore
clock frequencies. Model validation is performed on the Sandy Bridge-EP and
Broadwell-EP microarchitectures. Production-related variations in chip quality
are demonstrated through a statistical analysis of the fit parameters obtained
on one hundred Broadwell-EP CPUs of the same model. Insights from the models
are used to explain the performance- and energy-related behavior of the
processors for scalable as well as saturating (i.e., memory-bound) codes. In
the process we demonstrate the models' capability to identify optimal
operating points with respect to highest performance, lowest energy-to-solution,
and lowest energy-delay product, and we identify a set of best practices for
energy-efficient execution.
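To make the notion of optimal operating points concrete, here is a toy scan in Python over active core counts, combining an ECM-style saturation curve with a simple power model of the kind discussed above; every coefficient is invented for illustration:

    # Toy operating-point scan; all coefficients are invented.
    W0, W1, W2 = 25.0, 1.5, 0.6   # power fit parameters [W], [W/GHz], [W/GHz^2]
    f = 2.5                       # core clock frequency [GHz]
    P1, Psat = 3.0, 12.0          # per-core and saturated performance [GFLOP/s]
    work = 1000.0                 # total work [GFLOP]

    results = []
    for n in range(1, 17):        # number of active cores
        perf = min(n * P1, Psat)  # ECM-style performance saturation
        t = work / perf           # time to solution [s]
        power = W0 + (W1 * f + W2 * f ** 2) * n
        results.append((n, power * t, power * t * t))   # (cores, E, EDP)

    e_opt = min(results, key=lambda r: r[1])
    edp_opt = min(results, key=lambda r: r[2])
    print(f"lowest energy at n={e_opt[0]}, lowest EDP at n={edp_opt[0]}")

With these particular numbers both optima fall at the saturation point (n = 4), which reproduces the qualitative guideline from this line of work: for memory-bound code, use just enough cores to saturate the memory bandwidth.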
A review of High Performance Computing foundations for scientists
The increase of existing computational capabilities has made simulation
emerge as a third discipline of Science, lying midway between experimental and
purely theoretical branches [1, 2]. Simulation enables the evaluation of
quantities which otherwise would not be accessible, helps to improve
experiments, and provides new insights into the systems being analysed [3-6].
Knowing the fundamentals of computation can be very useful for scientists, for
it can help them to improve the performance of their theoretical models and
simulations. This review includes some technical essentials that can be useful
to this end, and it is devised as a complement for researchers whose education
is focused on scientific issues rather than on technological aspects. In this
document we attempt to discuss the fundamentals of High Performance Computing
(HPC) [7] in a way which is easy to understand without much previous
background. We sketch how standard computers and supercomputers work, introduce
distributed computing, and discuss essential aspects to take into account when
running scientific calculations on computers.
Comment: 33 pages.
Analysis of Intel's Haswell Microarchitecture Using The ECM Model and Microbenchmarks
This paper presents an in-depth analysis of Intel's Haswell microarchitecture
for streaming loop kernels. Among the new features examined are the dual-ring
Uncore design, Cluster-on-Die mode, Uncore Frequency Scaling, core improvements
such as new and improved execution units, and enhancements throughout the
memory hierarchy. The Execution-Cache-Memory diagnostic performance model is
used together with a generic set of microbenchmarks to quantify the efficiency
of the microarchitecture. The set of microbenchmarks is chosen such that it can
serve as a blueprint for other streaming loop kernels.
Comment: arXiv admin note: substantial text overlap with arXiv:1509.0311
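For reference, the diagnostic ECM model used here composes the single-core execution time from an overlapping in-core part and a serial chain of data transfers. In the shorthand commonly used in the ECM literature (quoted from that general usage, not from this paper),

    T_{ECM} = \max( T_{OL}, \; T_{nOL} + T_{L1L2} + T_{L2L3} + T_{L3Mem} )

where T_{OL} covers instructions that can overlap with data transfers, T_{nOL} is the non-overlapping load/store time in L1, and the remaining terms are the cache-line transfer times between adjacent levels of the memory hierarchy. The microbenchmarks then probe how far the measured hardware falls short of these composed predictions.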