3,292 research outputs found

A block-asynchronous relaxation method for graphics processing units

Author: Ashby; Bagnara; Bai; Bridges; Cappello; Chazan; Dongarra; Du; Frommer; Frommer; Hartwig Anzt; Henze; Hubbard; Jack Dongarra; Karniadakis; Kelley; Meier Yang; Powell; Saad; Stanimire Tomov; Strikwerda; Vincent Heuveline; Üeresin
Publication venue: 'Elsevier BV'

Crossref

A Block-Asynchronous Relaxation Method for Graphics Processing Units

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Asynchronous and Multiprecision Linear Solvers - Scalable and Fault-Tolerant Numerics for Energy Efficient High Performance Computing

Author: Anzt Hartwig
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2012

Asynchronous methods minimize idle times by removing synchronization barriers, and therefore allow the efficient usage of computer systems. The implied high tolerance with respect to communication latencies improves the fault tolerance. As asynchronous methods also enable the usage of the power and energy saving mechanisms provided by the hardware, they are suitable candidates for the highly parallel and heterogeneous hardware platforms that are expected for the near future.

KITopen

GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

Author: Anzt H.; Dongarra J.; Heuveline Vincent; Luszczek P.
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011

In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that may be used to leverage the computing power of SIMD accelerators like GPUs in the iterative solution of linear equation systems. Although they take very different approaches to this purpose, they share the basic idea of compensating for the convergence properties of an inferior numerical algorithm by a more efficient usage of the provided computing power. In this paper, we analyze the potential of combining both techniques. To this end, we derive a mixed precision iterative refinement algorithm using a block-asynchronous iteration as an error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and an iterative refinement method using double precision for the error correction solver. For matrices from the University of Florida Matrix Collection, we report the convergence behaviour and provide the total solver runtime using different GPU architectures.
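
The structure described in this abstract (a high-precision outer loop that computes residuals, with a low-precision solver for the error correction) can be illustrated with a minimal sketch. It is not the authors' implementation: it runs on the CPU in pure Python, emulates single precision by rounding through `struct`, and uses an ordinary synchronous Jacobi iteration as a stand-in for the block-asynchronous error correction solver. All function names and parameters (`to_f32`, `jacobi_f32`, `mixed_precision_refinement`, `sweeps`, `tol`) are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_f32(A, r, sweeps):
    """Approximately solve A c = r with Jacobi sweeps.

    Every updated component is rounded to float32, which emulates storing the
    correction in low precision. Assumption: a plain synchronous Jacobi
    iteration stands in for the block-asynchronous solver.
    """
    n = len(A)
    c = [0.0] * n
    for _ in range(sweeps):
        c = [
            to_f32((r[i] - sum(A[i][j] * c[j] for j in range(n) if j != i)) / A[i][i])
            for i in range(n)
        ]
    return c


def mixed_precision_refinement(A, b, tol=1e-12, max_outer=50, sweeps=5):
    """Iterative refinement: double-precision residual, low-precision correction."""
    n = len(A)
    x = [0.0] * n
    for outer in range(max_outer):
        # Residual r = b - A x, computed in double precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) < tol:
            return x, outer
        # Error correction: solve A c = r approximately in emulated float32.
        c = jacobi_f32(A, [to_f32(v) for v in r], sweeps)
        # Solution update, accumulated in double precision.
        x = [x[i] + c[i] for i in range(n)]
    return x, max_outer


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    x, outer_iterations = mixed_precision_refinement(A, b)
    print(x, outer_iterations)
```

The example matrix is diagonally dominant, so the Jacobi sweeps contract the error; for other matrices the inner solver may need more sweeps or may not converge at all.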

KITopen

Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU

Author: Fang Ye; Feng Sheng; Jarrell Mark; Moreno Juana; Ramanujam J.; Tam Ka-Ming; Yun Zhifeng
Publication venue: 'Elsevier BV'
Publication date: 21/11/2013

Monte Carlo simulations of the Ising model play an important role in the field of computational statistical physics, and they have revealed many properties of the model over the past few decades. However, the effect of frustration due to random disorder, in particular the possible spin glass phase, remains a crucial but poorly understood problem. One of the obstacles in the Monte Carlo simulation of random frustrated systems is their long relaxation time, making an efficient parallel implementation on state-of-the-art computation platforms highly desirable. The Graphics Processing Unit (GPU) is such a platform that provides an opportunity to significantly enhance the computational performance and thus gain new insight into this problem. In this paper, we present optimization and tuning approaches for the CUDA implementation of the spin glass simulation on GPUs. We discuss the integration of various design alternatives, such as GPU kernel construction with minimal communication, memory tiling, and look-up tables. We present a binary data format, Compact Asynchronous Multispin Coding (CAMSC), which provides an additional 28.4% speedup compared with the traditionally used Asynchronous Multispin Coding (AMSC). Our overall design sustains a performance of 33.5 picoseconds per spin flip attempt for simulating the three-dimensional Edwards-Anderson model with parallel tempering, which significantly improves the performance over existing GPU implementations. Comment: 15 pages, 18 figures
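
The general idea behind multispin coding, storing one spin per bit so that many bonds are evaluated with a few bitwise operations, can be sketched as follows. This is a generic illustration and does not reproduce the AMSC or CAMSC bit layouts from the paper, which the abstract does not specify. The helper names (`pack_signs`, `unpack_signs`, `bond_energy`) and the convention that bit value 1 encodes +1 are assumptions made for this example.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int; bit k is 1 when values[k] == +1."""
    word = 0
    for k, v in enumerate(values):
        if v == 1:
            word |= 1 << k
    return word


def unpack_signs(word, n):
    """Inverse of pack_signs for the lowest n bits."""
    return [1 if (word >> k) & 1 else -1 for k in range(n)]


def bond_energy(spins_a, spins_b, couplings, n):
    """Return -sum_k J_k * a_k * b_k for n packed spin pairs with J_k = +1 or -1.

    A bond is unsatisfied (J_k * a_k * b_k == -1) exactly when the XOR of the
    two spin bits equals the coupling bit, so a single XOR chain plus a
    population count handles all n bonds at once.
    """
    mask = (1 << n) - 1
    unsatisfied = ~(spins_a ^ spins_b ^ couplings) & mask
    u = bin(unsatisfied).count("1")
    return 2 * u - n


if __name__ == "__main__":
    a = pack_signs([1, -1, 1, 1])
    b = pack_signs([1, 1, -1, 1])
    j = pack_signs([1, -1, 1, -1])  # random +1/-1 couplings, as in a spin glass
    print(bond_energy(a, b, j, 4))
```

Packing the couplings in the same bit positions as the spins keeps the energy evaluation free of per-bond branches, which is the property that makes this kind of encoding attractive on SIMD hardware.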

arXiv.org e-Print Archive

Louisiana State University