6 research outputs found

Towards Loosely-Coupled Programming on Petascale Systems

Author: Beckman Pete
Clifford Ben
Foster Ian
Iskra Kamil
Raicu Ioan
Wilde Mike
Zhang Zhao
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/08/2008

We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying this approach to enable the use of petascale systems by a broader user community, and with greater ease. Our work enables the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications. This approach allows a new, and potentially far larger, class of applications to leverage petascale systems, such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O performance encountered in making this model practical, and show results using both microbenchmarks and real applications from two domains: economic energy modeling and molecular dynamics. Our benchmarks show that we can scale up to 160K processor-cores with high efficiency, and can achieve sustained execution rates of thousands of tasks per second. Comment: IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SuperComputing/SC) 200

arXiv.org e-Print Archive

Crossref

Achieving strong scaling with NAMD on Blue Gene/L

Author: Chao Huang
Gheorghe Almasi
Laxmikant V. Kalé
Sameer Kumar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

NAMD is a scalable molecular dynamics application, which has demonstrated its performance on several parallel computer architectures. Strong scaling is necessary for molecular dynamics because the problem size is fixed, and a large number of iterations need to be executed to understand interesting biological phenomena. The Blue Gene/L machine is a massive source of compute power. It consists of tens of thousands of embedded PowerPC 440 processors. In this paper, we present several techniques to scale NAMD to 8192 processors of Blue Gene/L. These include topology-specific optimizations, new messaging protocols, load-balancing, and overlap of computation and communication. We were able to achieve 1.2 TF of peak performance for cutoff simulations and 0.99 TF with PME.

CiteSeerX

Crossref

A system level view of Petascale I/O on IBM Blue Gene/P

Author: A Gara
F Schmuck
IBM Blue Gene team
J Moreira
Michael Hennecke
O Mextorf
P Coteus
W Frings
W Gropp
Wolfgang Frings
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Computing the fast Fourier transform on SIMD microprocessors

Author: Blake Anthony Martin
Publication venue: 'University of Waikato'
Publication date: 18/06/2012

This thesis describes how to compute the fast Fourier transform (FFT) of a power-of-two length signal on single-instruction, multiple-data (SIMD) microprocessors at speeds faster than, or very close to, those of state-of-the-art libraries such as FFTW (“Fastest Fourier Transform in the West”), SPIRAL and Intel Integrated Performance Primitives (IPP). The conjugate-pair algorithm has advantages in terms of memory bandwidth, and three implementations of this algorithm, which incorporate latency and spatial locality optimizations, are automatically vectorized at the algorithm level of abstraction. Performance results on 2-way, 4-way and 8-way SIMD machines show that the performance scales much better than that of FFTW or SPIRAL. The implementations presented in this thesis are compiled into a high-performance FFT library called SFFT (“Streaming Fast Fourier Transform”), and benchmarked against FFTW, SPIRAL, Intel IPP and Apple Accelerate on sixteen x86 machines and two ARM NEON machines, and shown to be, in many cases, faster than these state-of-the-art libraries, but without having to perform extensive machine-specific calibration, thus demonstrating that there are good heuristics for predicting the performance of the FFT on SIMD microprocessors (i.e., the need for empirical optimization may be overstated).

Research Commons@Waikato

Blue Gene/L programming and operating environment

Publication venue: 'IBM'

Crossref