Search CORE

2,707 research outputs found

On the Effects of Memory Latency and Bandwidth on Supercomputer Application Performance

Author: Richard Murphy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Since the first vector supercomputers in the mid-1970’s, the largest scale applications have traditionally been float-ing point oriented numerical codes, which can be broadly characterized as the simulation of physics on a computer. Supercomputer architectures have evolved to meet the needs of those applications. Specifically, the computational work of the application tends to be floating point oriented, and the decomposition of the problem two or three dimensional. To-day, an emerging class of critical applications may change those assumptions: they are combinatorial in nature, in-teger oriented, and irregular. The performance of both classes of applications is dominated by the performance of the memory system. This paper compares the memory per-formance sensitivity of both traditional and emerging HPC applications, and shows that the new codes are significantly more sensitive to memory latency and bandwidth than their traditional counterparts. Additionally, these codes exhibit lower base-line performance, which only exacerbates the problem. As a result, the construction of future supercom-puter architectures to support these applications will most likely be different from those used to support traditional codes. Quantitatively understanding the difference between the two workloads will form the basis for future design choices. 1

CiteSeerX

Crossref

Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

Author: Gioiosa Roberto
Kestor Gokcen
Laure Erwin
Markidis Stefano
Peng Ivy Bo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/04/2017
Field of study

Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide 5x higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on the application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3x performance when compared to the performance obtained using only DRAM. On the contrary, applications with random memory access pattern are latency-bound and may suffer from performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM

arXiv.org e-Print Archive

Crossref

Performance analysis of direct N-body algorithms for astrophysical simulations on distributed systems

Author: Alessia Gualandris
Alfredo Tirado-Ramos
Cohn
Dorband
Foster
Foster
Giersz
Louis
Makino
Makino
Makino
Makino
Makino
Murphy
Plummer
Simon Portegies Zwart
Publication venue: 'Elsevier BV'
Publication date: 09/12/2004
Field of study

We discuss the performance of direct summation codes used in the simulation of astrophysical stellar systems on highly distributed architectures. These codes compute the gravitational interaction among stars in an exact way and have an O(N^2) scaling with the number of particles. They can be applied to a variety of astrophysical problems, like the evolution of star clusters, the dynamics of black holes, the formation of planetary systems, and cosmological simulations. The simulation of realistic star clusters with sufficiently high accuracy cannot be performed on a single workstation but may be possible on parallel computers or grids. We have implemented two parallel schemes for a direct N-body code and we study their performance on general purpose parallel computers and large computational grids. We present the results of timing analyzes conducted on the different architectures and compare them with the predictions from theoretical models. We conclude that the simulation of star clusters with up to a million particles will be possible on large distributed computers in the next decade. Simulating entire galaxies however will in addition require new hybrid methods to speedup the calculation.Comment: 22 pages, 8 figures, accepted for publication in Parallel Computin

arXiv.org e-Print Archive

Crossref

Surrey Research Insight

International Migration, Integration and Social Cohesion online publications

Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks

Author: Anderson Thomas E.
Katz Randy H.
Ousterhout John K.
Patterson David A.
Publication venue
Publication date
Field of study

Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are on the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a protototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a Cornerstone of any coherent program in high performance computing and communications

NASA Technical Reports Server