Exploring the Performance Benefit of Hybrid Memory System on HPC Environments
Hardware accelerators have become a de facto standard for achieving high
performance on current supercomputers, and there are indications that this
trend will continue. Modern accelerators feature high-bandwidth memory
next to the computing cores. For example, the Intel Knights Landing (KNL)
processor is equipped with 16 GB of high-bandwidth memory (HBM) that works
together with conventional DRAM memory. Theoretically, HBM can provide 5x
higher bandwidth than conventional DRAM. However, many factors impact the
effective performance achieved by applications, including the application
memory access pattern, the problem size, the threading level and the actual
memory configuration. In this paper, we analyze the Intel KNL system and
quantify the impact of the most important factors on the application
performance by using a set of applications that are representative of
scientific and data-analytics workloads. Our results show that applications
with regular memory access patterns benefit from MCDRAM, achieving up to 3x
the performance obtained using only DRAM. In contrast, applications with
random memory access patterns are latency-bound and may suffer performance
degradation when using only MCDRAM. For those applications, additional
hardware threads can help hide latency and achieve higher aggregated
bandwidth when using HBM.
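The regular (streaming) access pattern discussed above is typically probed with a STREAM-style triad microbenchmark. The sketch below is an illustrative Python/NumPy version, not the benchmark used in the paper; on a KNL in flat mode, MCDRAM is exposed as a separate NUMA node, so the same script could be bound to DRAM or MCDRAM with `numactl --membind=<node>` to compare effective bandwidths.

```python
import time
import numpy as np

def triad_bandwidth_gbs(n=10_000_000, scalar=2.0, reps=5):
    """Effective bandwidth (GB/s) of a STREAM-style triad
    a[i] = b[i] + scalar * c[i] -- a regular, streaming access pattern."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty_like(b)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        np.multiply(c, scalar, out=a)  # a = scalar * c (no temporaries)
        a += b                         # a = b + scalar * c
        best = min(best, time.perf_counter() - t0)
    # STREAM convention: count only compulsory traffic
    # (read b, read c, write a; 8-byte doubles).
    bytes_moved = 3 * n * 8
    return bytes_moved / best / 1e9

print(f"triad bandwidth: {triad_bandwidth_gbs():.1f} GB/s")
```

Running the same code under different memory bindings isolates the memory-configuration factor the abstract identifies, while keeping problem size and threading fixed.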
Potential of I/O aware workflows in climate and weather
The efficient, convenient, and robust execution of data-driven workflows and enhanced data
management are essential for productivity in scientific computing. In HPC, the concerns of storage
and computing are traditionally separated and optimised independently from each other and the
needs of the end-to-end user. However, in complex workflows, this is becoming problematic. These
problems are particularly acute in climate and weather workflows, which, as well as becoming
increasingly complex and exploiting deep storage hierarchies, can involve multiple data centres.
The key contributions of this paper are: 1) A sketch of a vision for an integrated data-driven
approach, with a discussion of the associated challenges and implications, and 2) An architecture
and roadmap consistent with this vision that would allow a seamless integration into current
climate and weather workflows as it utilises versions of existing tools (ESDM, Cylc, XIOS, and
DDN’s IME).
The vision proposed here is built on the belief that workflows composed of data-, computing-, and communication-intensive tasks should drive interfaces and hardware configurations to
better support the programming models. When delivered, this work will increase the opportunity for smarter scheduling of computation that takes storage into account in heterogeneous storage systems.
We illustrate the performance impact on an example workload using a model built on performance
data measured with ESDM at DKRZ.
HPC benchmarking: scaling right and looking beyond the average
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-96983-1_10
Designing a balanced HPC system requires an understanding of the dominant performance bottlenecks. There is as yet no well-established methodology for a unified evaluation of HPC systems and workloads that quantifies the main performance bottlenecks. In this paper, we execute seven production HPC applications on a production HPC platform and analyse the key performance bottlenecks: FLOPS performance and memory bandwidth congestion, and their implications for scaling out. We show that the results depend significantly on the number of execution processes and on the granularity of measurements. We therefore advocate for guidance in application suites on selecting a representative scale for the experiments. We also propose that FLOPS performance and memory bandwidth be reported as the proportions of time spent at low, moderate, and severe utilization. We show that this gives much more precise and actionable evidence than the average.
This work was supported by the Spanish Ministry of Science and Technology (project TIN2015-65316-P), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the Severo Ochoa Programme (SEV-2015-0493) of the Spanish Government, and the European Union's Horizon 2020 research and innovation programme under the ExaNoDe project (grant agreement No 671578).
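The proposal to report proportions of time at low, moderate, and severe utilization, rather than a single average, can be sketched as follows. The thresholds (0.4 and 0.8 of peak) and the two traces are hypothetical illustrations, not values from the paper; they show how two workloads with identical average utilization can present very different bottleneck profiles.

```python
import numpy as np

def utilization_profile(samples, capacity, low=0.4, severe=0.8):
    """Classify per-interval bandwidth samples (GB/s) into low /
    moderate / severe utilization and return the time proportions."""
    u = np.asarray(samples) / capacity
    return {
        "low": float(np.mean(u < low)),
        "moderate": float(np.mean((u >= low) & (u < severe))),
        "severe": float(np.mean(u >= severe)),
        "average": float(np.mean(u)),
    }

# Two hypothetical traces against a 100 GB/s peak, both averaging 0.5:
steady = [50.0] * 10              # every interval is moderately utilized
bursty = [5.0] * 5 + [95.0] * 5   # half idle, half severely congested
print(utilization_profile(steady, 100.0))
print(utilization_profile(bursty, 100.0))
```

The average alone would rate both traces identically, whereas the proportions reveal that the second workload spends half its time severely bandwidth-congested.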
Design trade-offs for emerging HPC processors based on mobile market technology
This is a post-peer-review, pre-copyedit version of an article published in The Journal of Supercomputing. The final authenticated version is available online at: http://dx.doi.org/10.1007/s11227-019-02819-4
High-performance computing (HPC) is at the crossroads of a potential transition toward mobile market processor technology. Unlike in prior transitions, numerous hardware vendors and integrators will have access to state-of-the-art processor designs due to Arm’s licensing business model. This fact gives them greater flexibility to implement custom HPC-specific designs. In this paper, we undertake a study to quantify the different energy-performance trade-offs when architecting a processor based on mobile market technology. Through detailed simulations over a representative set of benchmarks, our results show that: (i) a modest amount of last-level cache per core is sufficient, leading to significant power and area savings; (ii) in-order cores offer favorable trade-offs when compared to out-of-order cores for a wide range of benchmarks; and (iii) heterogeneous configurations help to improve processor performance and energy efficiency.
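Energy-performance trade-offs like the in-order versus out-of-order comparison above are commonly summarized with metrics such as the energy-delay product (EDP). The sketch below uses purely hypothetical design points, not figures from the paper, to show how a slower but lower-power core can still win on EDP.

```python
def energy_delay_product(runtime_s, avg_power_w):
    """EDP = energy * delay = (power * time) * time; lower is better."""
    return avg_power_w * runtime_s ** 2

# Hypothetical design points (illustrative only):
in_order = energy_delay_product(runtime_s=120.0, avg_power_w=40.0)
out_of_order = energy_delay_product(runtime_s=90.0, avg_power_w=75.0)
print(f"in-order EDP:     {in_order:.0f} W*s^2")
print(f"out-of-order EDP: {out_of_order:.0f} W*s^2")
```

With these numbers the in-order point has the lower EDP despite its longer runtime, which is the kind of favorable trade-off the abstract reports for a wide range of benchmarks.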