Multilayered Heterogeneous Parallelism Applied to Atmospheric Constituent Transport Simulation
Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Using the parallelism of these new chipsets effectively is the central challenge facing a new generation of large-scale scientific computing applications. This study examines methods for improving the performance of two-dimensional (2D) and three-dimensional (3D) atmospheric constituent transport simulation on the Cell Broadband Engine Architecture (CBEA). A function offloading approach is used in a 2D transport module, and a vector stream processing approach is used in a 3D transport module. Two methods for transferring noncontiguous data between main memory and accelerator local storage are compared. By leveraging the heterogeneous parallelism of the CBEA, the 3D transport module achieves, on a single PowerXCell 8i chip, performance comparable to two nodes of an IBM BlueGene/P or to eight Intel Xeon cores. Module performance is reported for two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation.
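The abstract does not say which two transfer methods are compared. With that caveat, the Python below is only an illustrative sketch of two generic strategies for moving a noncontiguous rectangular sub-block of a row-major array into a local buffer: describing the block as a list of per-row (offset, length) runs, in the spirit of a DMA list, versus packing it into a contiguous staging buffer so that a single contiguous copy suffices. The function names `transfer_list`, `gather`, and `pack` are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def transfer_list(nx: int, r0: int, c0: int, h: int, w: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: describe the h-by-w sub-block starting at (r0, c0)
    of a row-major array with nx columns as one (offset, length) entry per row.
    Each entry is a contiguous run; consecutive runs are nx elements apart."""
    return [((r0 + r) * nx + c0, w) for r in range(h)]


def gather(memory: Sequence[int], descriptors: List[Tuple[int, int]]) -> List[int]:
    """Strategy 1 (sketch): execute the descriptor list, issuing one small
    contiguous copy per entry into the local buffer."""
    local: List[int] = []
    for offset, length in descriptors:
        local.extend(memory[offset:offset + length])
    return local


def pack(memory: Sequence[int], nx: int, r0: int, c0: int, h: int, w: int) -> List[int]:
    """Strategy 2 (sketch): copy the sub-block element by element into a
    contiguous staging buffer, which can then be moved in one transfer."""
    return [memory[(r0 + r) * nx + c0 + c] for r in range(h) for c in range(w)]


if __name__ == "__main__":
    memory = list(range(24))              # a 4 x 6 array stored row-major
    desc = transfer_list(6, 1, 2, 2, 3)   # 2 x 3 block at row 1, column 2
    print(desc)
    print(gather(memory, desc))
    print(pack(memory, 6, 1, 2, 2, 3))
```

For a 4×6 array stored as `list(range(24))`, the 2×3 block starting at row 1, column 2 needs two descriptors, `[(8, 3), (14, 3)]`, and both strategies yield `[8, 9, 10, 14, 15, 16]`. The trade-off being illustrated is many small transfers versus one packing pass followed by a single larger transfer.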
Investigation of hadron matter using lattice QCD and implementation of lattice QCD applications on heterogeneous multicore acceleration processors
Observables relevant to understanding the structure of baryons were determined by means of Monte Carlo simulations of Lattice Quantum Chromodynamics (QCD) with 2+1 dynamical quark flavours. Particular emphasis was placed on how these observables change when flavour symmetry is broken, compared with choosing equal masses for the two light quarks and the strange quark. The first two moments of the unpolarised, longitudinally polarised, and transversely polarised parton distribution functions were calculated for the nucleon and the hyperons. Hyperons are baryons that contain a strange quark.
Lattice QCD simulations tend to be extremely expensive, requiring petaflop-scale computing and beyond, a regime of computing power that is only now being reached. Heterogeneous multicore computing is becoming increasingly important in high-performance scientific computing. The strategy of deploying multiple types of processing elements within a single workflow, and allowing each to perform the tasks to which it is best suited, is likely to be part of the roadmap to exascale. In this work, new design concepts were developed for an active library (QDP++) that harnesses the compute power of a heterogeneous multicore processor (the IBM PowerXCell 8i). Beyond providing a proof of concept, it was possible to run a QDP++-based physics application (Chroma) with reasonable performance on the IBM BladeCenter QS22.
The QPACE Supercomputer: Applications of Random Matrix Theory in Two-Colour Quantum Chromodynamics
QPACE is a massively parallel and scalable supercomputer designed to meet the requirements of applications in Lattice Quantum Chromodynamics. The project was carried out by several academic institutions in collaboration with IBM Germany and other industrial partners. In November 2009 and June 2010, QPACE was the leading architecture on the Green500 list of the most energy-efficient supercomputers in the world.
Solving Hyperbolic PDEs using Accelerator Architectures
Accelerator architectures are used to speed up the simulation of nonlinear hyperbolic PDEs. Three different architectures are investigated: a multicore CPU using threading, IBM's Cell processor, and Nvidia's Tesla GPUs. Speed-ups of 40 to 75× relative to a single CPU core in single precision are obtained with the Cell processor and the GPU. The three implementations are extended to parallel computing clusters using the Message Passing Interface (MPI). The resulting hybrid-parallel code is evaluated for performance and scalability on both a GPU cluster and a Cell cluster.